This collection of tools is designed to assemble a cascading bloom filter containing all TLS certificate revocations, as described in this CRLite paper.
These tools were built from scratch, using the original CRLite research code as a design reference and closely following the documentation in their paper.
- A Censys Researcher Account (for downloading certificates)
- About 3 terabytes of space to store certificates and associated data
- Node
- Python 2 & Python 3 (default is to use Python 2 except when explicitly noted)
- aria2c (or wget or curl)
- pyopenssl (at least version 16.1.0)
- Lots of patience, as many of the scripts take several hours even with multiprocessing
- After obtaining a researcher account on Censys, perform the following Data export query to collect all valid NSS-trusted certificates. Be sure to request the results in JSON format, and select the "nested" option to prevent collisions when flattening the data entries. The compression option is also recommended (see screenshot below). If you just want to try the tools on a small sample subset of certificates (no Censys account required), use this file instead and skip to step 3.
SELECT parsed.*
FROM certificates.certificates
WHERE validation.nss.valid = TRUE
- Download the exported certificates, which will be provided in several hundred files. The recommended method is to copy and paste the provided download URLs into a file on your target machine, then run `wget -i URL_FILE` to download all of the certificate files.
- Unzip the certificate files and place their contents in a single, unified file. Unzip with `gzip -d *.gz`, then unify the files with `cat *.json > certificates.json`. You can then delete all files except for `certificates.json`. (If you're using the sample file, just unzip it and rename it to `certificates.json`.)
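
If you would rather not keep both the compressed and the uncompressed copies on disk, the unzip-and-unify step above can also be done in one streaming pass. The following is only a minimal Python 3 sketch, not part of the original tooling; the function name `unify_exports` and the assumption that every export file ends in `.gz` are mine.

```python
import glob
import gzip
import os
import shutil


def unify_exports(export_dir, output_path):
    """Decompress every *.gz file in export_dir and append it to output_path.

    Files are processed in sorted order and streamed in chunks, so no single
    export has to fit in memory. Returns the number of files processed.
    """
    paths = sorted(glob.glob(os.path.join(export_dir, "*.gz")))
    with open(output_path, "wb") as out:
        for path in paths:
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(paths)
```

For example, `unify_exports(".", "certificates.json")` would play the role of the `gzip` and `cat` commands, leaving the original `.gz` files in place.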
- Set `get_CRL_revocations` as the working directory. This folder contains all scripts for Part B.
- Extract the CRL distribution points by running `python extract_crls.py`. This script outputs three files: a file of all certificates that list a CRL (`../certs_using_crl.json`), a file of all certificates that do not list a CRL (`../certs_without_crl.json`), and a list of all CRL distribution points (`CRL_servers`).
- Sort and eliminate duplicate entries in `CRL_servers` using the command `sort -u CRL_servers > CRL_servers_final`. You can compare your `CRL_servers_final` to my reference CRL list to see that the replication results are similar up to this point.
- Download all of the CRLs listed in `CRL_servers_final`. First create a new subdirectory `raw_CRLs`, set it as the working directory, then run `aria2c -i ../CRL_servers_final -j 16`.
- Set the working directory back one level up (to `get_CRL_revocations` again). Create a catalogue, or "megaCRL," of all revocations by running `python3 build_megaCRL.py` (note that this script requires Python 3 and pyopenssl version 16.1.0 or above). This outputs `megaCRL`, which contains all revocation serial numbers organized by CRL.
- Use `python count_serials.py` to see the total number of revocation serials contained in the megaCRL. You can compare your results against mine by running the same script on my reference megaCRL file.
- Make a new subdirectory `revokedCRLCerts`, then match the revocation serial numbers to known certificates using `python build_CRL_revoked.py`. This script uses multiprocessing to get around the I/O bottleneck, and you may need to adjust the number of "worker" processes to get optimal speed on your machine. Each worker has a dedicated output file, so after the script finishes you will need to combine the output files into a single, final result using `cat revokedCRLCerts/certs* > ../final_CRL_revoked.json`.
- Count the number of actual revoked certificates using `wc -l ../final_CRL_revoked.json`.
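
To make the matching step above more concrete, here is a rough Python 3 sketch of the idea: a certificate counts as revoked when its serial number appears in one of the CRLs it lists. This is an illustration only, not the logic of `build_CRL_revoked.py`. The field names `extensions.crl_distribution_points` and `serial_number`, the in-memory `mega_crl` mapping from CRL URL to a set of serials, and the assumption that both sides use the same serial string format are all mine.

```python
import json


def crl_urls(cert):
    """Return the CRL distribution points listed by a certificate record."""
    return cert.get("extensions", {}).get("crl_distribution_points", [])


def is_crl_revoked(cert, mega_crl):
    """True if the certificate's serial appears in any CRL that it lists.

    mega_crl is a hypothetical dict: CRL URL -> set of revoked serial strings.
    """
    serial = cert.get("serial_number")
    return any(serial in mega_crl.get(url, ()) for url in crl_urls(cert))


def filter_revoked(lines, mega_crl):
    """Yield the parsed certificates (one JSON object per line) that are revoked."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cert = json.loads(line)
        if is_crl_revoked(cert, mega_crl):
            yield cert
```

Keeping the revoked serials in sets makes each lookup roughly constant time, which matters when the certificate file has many millions of lines.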
- Set `get_OCSP_revocations` as the working directory. This folder contains all scripts for Part C. Make a subdirectory called `OCSP_revoked`.
- Use `python build_OCSP_revoked.py` to determine all Let's Encrypt revocations. This tooling replicates the process of the CRLite authors. I believe they chose to include OCSP only for Let's Encrypt because the vast majority of OCSP-only certificates are issued by Let's Encrypt. After the script completes, combine the results of each worker into a final output file with `cat OCSP_revoked/certs* > ../final_OCSP_revoked.json`.
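
Restricting the OCSP step to Let's Encrypt amounts to a filter on the issuer. The Python 3 sketch below shows one way such a filter could be written. It is not taken from `build_OCSP_revoked.py`, and both the `issuer.organization` field and the exact organization string are assumptions about the exported JSON.

```python
import json

# Assumed value of the issuer organization field for Let's Encrypt certificates.
LETS_ENCRYPT_ORG = "Let's Encrypt"


def is_lets_encrypt(cert):
    """True if the certificate's issuer organization is Let's Encrypt."""
    orgs = cert.get("issuer", {}).get("organization", [])
    if isinstance(orgs, str):  # tolerate a single string as well as a list
        orgs = [orgs]
    return LETS_ENCRYPT_ORG in orgs


def select_lets_encrypt(lines):
    """Yield parsed certificates (one JSON object per line) issued by Let's Encrypt."""
    for line in lines:
        line = line.strip()
        if line:
            cert = json.loads(line)
            if is_lets_encrypt(cert):
                yield cert
```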
- Set `build_filter` as the working directory. This folder contains all scripts for Part D. Make subdirectories `final_unrevoked` and `final_revoked`.
- Use `python build_final_sets.py` to convert the data created in the steps above into a single set of all revoked certificates and a single set of all valid certificates. This script uses multiprocessing, so after running it you will need to use `cat final_unrevoked/*.json > ../final_unrevoked.json` and `cat final_revoked/*.json > ../final_revoked.json` to combine the results of the individual workers into single files. You can see how your results match against mine by comparing against this file.
- Use the command `node --max_old_space_size=32768 ./build_filter.js > filter` to assemble the final filter. Be sure to change the `REVOKED` and `UNREVOKED` constants in the script to accurately reflect your own data. (Acknowledgements to James Larisch for the `build_filter.js` code.)
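
For readers who want to see what a cascading Bloom filter does before running the full build, here is a small, self-contained Python 3 sketch of the general technique. It is not a port of `build_filter.js`: the hashing scheme (salted SHA-256), the sizing formulas, and the single `error_rate` used for every level are simplifying assumptions of mine. Level 0 stores the revoked set, level 1 stores the unrevoked items that level 0 wrongly matches, level 2 stores the revoked items that level 1 wrongly matches, and so on until a level produces no false positives.

```python
import hashlib
import math


class BloomFilter:
    """A basic Bloom filter; each instance gets its own salt."""

    def __init__(self, capacity, error_rate, salt):
        capacity = max(capacity, 1)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / capacity * math.log(2)))
        self.salt = salt
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{self.salt}:{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_cascade(revoked, unrevoked, error_rate=0.5):
    """Build filter levels until a level has no false positives."""
    levels = []
    include, exclude = set(revoked), set(unrevoked)
    while include:
        bloom = BloomFilter(len(include), error_rate, salt=len(levels))
        for item in include:
            bloom.add(item)
        levels.append(bloom)
        # Items from the other set that this level wrongly matches
        # become the contents of the next level.
        false_positives = {item for item in exclude if item in bloom}
        include, exclude = false_positives, include
    return levels


def is_revoked(levels, item):
    """Query the cascade. Only meaningful for items that were in one of the
    two sets used to build it."""
    for depth, bloom in enumerate(levels):
        if item not in bloom:
            # Missing from an even level: not revoked; from an odd level: revoked.
            return depth % 2 == 1
    return len(levels) % 2 == 1
```

Because every level is corrected by the one after it, the cascade answers exactly for every certificate it was built from, which is why the build needs both the revoked and the unrevoked sets. Results for items outside those two sets are undefined.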
Thanks to Eric Rescorla, J.C. Jones, James Larisch, the CRLite research team and the Mozilla Cryptography Engineering team.