Stars
💫 Industrial-strength Natural Language Processing (NLP) in Python
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Pyodide is a Python distribution for the browser and Node.js based on WebAssembly
Low-code framework for building custom LLMs, neural networks, and other AI models
Skulpt is a JavaScript implementation of the Python programming language
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
A Python implementation of nearest neighbor descent for approximate nearest neighbor search
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detector that works out of the box.
⚫ A spaCy pipeline and model for NLP on unstructured legal text.
Default English stopword lists from many different sources
A Python framework for creating interactive Twitter bots
I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this
Python/Flask-based website for text analysis workflow. Previous (stable) release is live at:
Humanities Entity Recognition: robust, practical, efficient Named Entity Recognition for today's digital humanist
Django web application to display, annotate, and export digitized books.
Jupyter notebook extension for exporting a notebook as an MS Word document
A DH box for Miriam Posner and Ben Schmidt's 2016 workshops in Bethesda
Implementation of the ECPP algorithm by Atkin and Morain
Scripts to sort, access, and analyze the ToI archive