Stars
Implementation of the ECPP algorithm by Atkin and Morain
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Locality Sensitive Hashing for Go (Multi-probe LSH, LSH Forest, basic LSH)
sensible.vim: Defaults everyone can agree on
A simple, in-browser, markdown-driven slideshow tool.
I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this
Default English stopword lists from many different sources
Jekyll based framework for minimal exhibitions with IIIF 🐝
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box.
Bash script to download mp3s from the OverDrive audiobook service
From the basics to slightly more interesting applications of TensorFlow
Modeling free indirect discourse in literature, using AI.
Utility tasks for processing collection data with Wax 🐝
Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc.
Scripts to sort, access, and analyze the ToI archive
⚫ A spaCy pipeline and model for NLP on unstructured legal text.
Humanities Entity Recognition: robust, practical, efficient Named Entity Recognition for today's digital humanist
Django web application to display, annotate, and export digitized books.
Princeton-CDH / django-annotator-store
Forked from ecds/readux. Django application to act as an annotator.js 2.x annotator-store backend
A selection of 28 classic British novels from the 19th century (including a few late 18th-century items). Full text versions, in plain text format, harvested from trustworthy public domain sites.
A reverse proxy that provides authentication with Google, Azure, OpenID Connect and many more identity providers.