Starred repositories
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
Colormaps is a collection of colormaps or color palettes for Python.
A modern cookiecutter template for Python projects that use uv for dependency management
✂️ Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models
Curated list of datasets and tools for post-training.
cuVS - a library for vector search and clustering on the GPU
Lightweight Nearest Neighbors with Flexible Backends
Automatic Generation of Visualizations and Infographics using Large Language Models
The book every data scientist needs on their desk.
Delta Chat Rust Core library, used by Android/iOS/desktop apps, bindings and bots 📧
Muon optimizer: +>30% sample efficiency with <3% wallclock overhead
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
Everything about the SmolLM2 and SmolVLM family of models
Official implementation of "GPT or BERT: why not both?"
Awesome Active Learning Paper List
Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models (EMNLP 2024)
A collection of optimizer-related papers, data, and repositories
Blazingly fast cleaning of swear words (and their leetspeak variants) in strings
A curated list for awesome discrete diffusion models resources.
The official repository for Toxic Commons and Celadon. Toxicity Classification for public domain data.
Software for humanities scholars using quantitative or computational methods.
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
git-bob uses AI to solve GitHub issues. It runs inside the GitHub CI, so there is no need to install anything on your computer.