The interesting architecture of crt.sh
======================================
---
date: "2018-02-09"
---

A while back I wrote myself a little dashboard for monitoring TLS
certificates for my domains.  Right now it works by talking to
<https://crt.sh/>.  Sometimes this works great, but sometimes crt.sh
is really slow.  Plus, it's another thing that could be compromised.

So, I started looking at how crt.sh works.  It's kinda cool.

There are only 3 separate processes:

 - Cron
   - [`ct_monitor`](https://github.com/crtsh/ct_monitor) is a program
     that uses libcurl to get CT log changes and libpq to put them
     into the database.
 - PostgreSQL
   - [`certwatch_db`](https://github.com/crtsh/certwatch_db) is the
     core web application, written in PL/pgSQL.  It even includes the
     HTML templating and query parameter handling.  Of course, there
     are a couple of things not entirely done in pgSQL...
   - [`libx509pq`](https://github.com/crtsh/libx509pq) adds a set of
     `x509_*` functions callable from pgSQL for parsing X.509
     certificates.
   - [`libcablintpq`](https://github.com/crtsh/libcablintpq) adds the
     `cablint_embedded(bytea)` function to pgSQL.
   - [`libx509lintpq`](https://github.com/crtsh/libx509lintpq) adds the
     `x509lint_embedded(bytea,integer)` function to pgSQL.
 - Apache HTTPD
   - [`mod_certwatch`](https://github.com/crtsh/mod_certwatch) is a
     pretty thin wrapper that turns every HTTP request into an SQL
     statement sent to PostgreSQL, via...
   - [`mod_pgconn`](https://github.com/crtsh/mod_pgconn), which
     manages PostgreSQL connections.
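
To make the `mod_certwatch` idea concrete, here is a rough Python
sketch of what "turn every HTTP request into an SQL statement" might
look like.  This is not the module's actual code (the real thing lives
inside Apache); `handle_request` is a made-up name standing in for
whatever PL/pgSQL entry point the database exposes, and passing the
path plus two parallel arrays is my assumption about a reasonable
shape, not something I checked against the source.

```python
from urllib.parse import urlsplit, parse_qsl


def request_to_sql(url: str) -> tuple[str, tuple]:
    """Map one request URL onto one parameterized SQL call.

    `handle_request` is a hypothetical PL/pgSQL function; the web
    server does no routing or templating of its own, it only hands
    the path and the query parameters to the database.
    """
    parts = urlsplit(url)
    # Keep blank values so "?q=" still reaches the database as a parameter.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    names = [name for name, _ in pairs]
    values = [value for _, value in pairs]
    # %s is the placeholder style used by Python PostgreSQL drivers
    # such as psycopg; values are never spliced into the SQL text.
    sql = "SELECT handle_request(%s, %s, %s)"
    return sql, (parts.path, names, values)
```

Everything after that (parsing parameters, running the search,
rendering the page) would happen inside the database function, which
is why the shim on the web server side can stay so thin.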

The interface exposes HTML, Atom, and JSON.  All from code written in
SQL.
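
For a dashboard like mine, the output format is just a query
parameter.  A minimal sketch, assuming the `q` and `output` parameters
behave the way I have seen them behave on the public site (I did not
check every option against the source); `crtsh_url` and `fetch_json`
are my own helper names, not part of crt.sh:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Formats mentioned above; HTML is what you get with no `output` parameter.
FORMATS = {"html": None, "atom": "atom", "json": "json"}


def crtsh_url(query: str, fmt: str = "html") -> str:
    """Build a crt.sh search URL for `query` in the requested format."""
    params = {"q": query}
    if FORMATS[fmt] is not None:
        params["output"] = FORMATS[fmt]
    return "https://crt.sh/?" + urlencode(params)


def fetch_json(query: str, timeout: float = 60.0):
    """Fetch and decode the JSON results for `query`.

    The generous timeout is deliberate: crt.sh is sometimes really slow.
    """
    with urlopen(crtsh_url(query, "json"), timeout=timeout) as response:
        return json.load(response)
```

Calling `crtsh_url("example.com", "json")` gives
`https://crt.sh/?q=example.com&output=json`.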

And then I guess it's behind an nginx-based load balancer or somesuch
(based on the 504 Gateway Timeout messages it's given me).  But that's
not interesting.

The actual website is [run from a read-only slave][slave-post] of the
master DB that the `ct_monitor` cron job updates, which makes several
security considerations go away and makes horizontal scaling easy.

[slave-post]: https://groups.google.com/d/msg/mozilla.dev.security.policy/EPv_u9V06n0/gPJY5T7ILlQJ

Anyway, I thought it was neat that so much of it runs inside the
database; you don't see that terribly often.  I also thought the
little shims to make that possible were neat.  I didn't get deep
enough into it to end up running my own instance or clone, but I
thought my notes on it were worth sharing.