Stars
Agent Framework / shim to use Pydantic with LLMs
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
A programming framework for agentic AI 🤖 (PyPI: autogen-agentchat)
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
🦜🔗 Build context-aware reasoning applications
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphic…
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
The Open Source Feature Store for Machine Learning
Collection of papers and resources for data augmentation for NLP.
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.
Tevatron - A flexible toolkit for neural retrieval research and development.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Allows you to maintain all the necessary cruft for packaging and building projects separate from the code you intentionally write. Built on top of, and fully compatible with, CookieCutter.
Generic template to bootstrap your PyTorch project.
State-of-the-Art Text Embeddings
Implementation of a Transformer, but completely in Triton
Implementation of Memorizing Transformers (ICLR 2022), an attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in PyTorch
ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
Hydra is a framework for elegantly configuring complex applications
Data & AI Notebook templates catalog organized by tools, following the IMO (input, model, output) framework for easy usage and discovery.
A Python library for user-friendly forecasting and anomaly detection on time series.
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).