Stars
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
🚀 [NeurIPS'24] Make Vision Matter in Visual Question Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark that challenges vision-language models with simple questions.
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
Official implementation of the paper: "A deeper look at depth pruning of LLMs"
A sparse attention kernel supporting mixed sparse patterns
Helpful tools and examples for working with flex-attention
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Unified KV Cache Compression Methods for Auto-Regressive Models
[NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLM inference, approximately and dynamically compute sparse attention, reducing inference latency by up to 10x for pre-filling on an …
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
SGLang is a fast serving framework for large language models and vision language models.
FlashInfer: Kernel Library for LLM Serving
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Accelerating the development of large multimodal models (LMMs) with lmms-eval, a one-click evaluation module.
An extremely fast Python package and project manager, written in Rust.
A PyTorch native library for large model training
[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Large World Model -- Modeling Text and Video with Millions of Tokens of Context
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
Efficient Triton Kernels for LLM Training
A curated list of awesome open-source libraries for production LLMs
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
MINT-1T: A one trillion token multimodal interleaved dataset.
Run PyTorch LLMs locally on servers, desktop and mobile