Stars
verl: Volcano Engine Reinforcement Learning for LLMs
🔥 A minimal training framework for scaling FLA models
📚 200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️HGEMM with WMMA, MMA and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
APOLLO: SGD-like Memory, AdamW-level Performance
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
The Prodigy optimizer and its variants for training neural networks.
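As a quick illustration of this entry, a minimal sketch of a Prodigy training step, assuming the `prodigyopt` package from this repository; the toy model and data are placeholders:

```python
# Minimal sketch: one optimization step with Prodigy (pip install prodigyopt).
# lr=1.0 is the recommended setting, since Prodigy estimates the step size itself.
import torch
from prodigyopt import Prodigy

model = torch.nn.Linear(10, 1)             # placeholder model for illustration
opt = Prodigy(model.parameters(), lr=1.0)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
opt.zero_grad()
```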
"MiniRAG: Making RAG Simpler with Small and Free Language Models"
Prodigy and ScheduleFree, together at last.
Scalable RL solution for advanced reasoning of language models
Official PyTorch implementation of the paper "No More Adam: Learning Rate Scaling at Initialization is All You Need"
Efficient Triton Kernels for LLM Training
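For context on how these kernels are applied, a hedged sketch using Liger Kernel's monkey-patching entry point for Llama models; the checkpoint name is only an example:

```python
# Hedged sketch: patch Hugging Face Llama modules with Liger's Triton kernels.
# apply_liger_kernel_to_llama must run before the model is instantiated.
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_llama()  # swaps in fused RMSNorm, RoPE, SwiGLU, CrossEntropy kernels
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # example checkpoint
```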
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
A curated summary of the knowledge an NLP engineer needs to accumulate, covering interview questions, fundamentals, engineering skills, and more, to strengthen core competitiveness.
An extremely fast Python package and project manager, written in Rust.
How to optimize common algorithms in CUDA.
SGLang is a fast serving framework for large language models and vision language models.
Schedule-Free Optimization in PyTorch
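A minimal sketch of the schedule-free pattern, assuming the `schedulefree` package's AdamWScheduleFree; note that the optimizer itself, not just the model, is toggled between train and eval modes:

```python
# Sketch: Schedule-Free AdamW needs no LR schedule, but requires mode switching.
import torch
import schedulefree

model = torch.nn.Linear(10, 1)  # toy model for illustration
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=2.5e-3)

optimizer.train()  # switch the optimizer to training mode before stepping
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

optimizer.eval()   # required before evaluation or checkpointing
```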
CLiB Chinese LLM capability leaderboard (continuously updated): currently covers 195 models, spanning commercial models such as ChatGPT, GPT-4o, o3-mini, Google Gemini, Claude 3.5, Zhipu GLM-Zero, ERNIE Bot (文心一言), qwen-max, Baichuan, iFlytek Spark (讯飞星火), SenseTime SenseChat, and MiniMax, as well as DeepSeek-R1, deepseek-v3, qwen2.5, llama3.3, phi-4, glm4, 书生int…
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
FlashInfer: Kernel Library for LLM Serving
📖 A repository for organizing papers, code, and other resources related to unified multimodal models.
AI chat and search for text, news, images and videos using the DuckDuckGo.com search engine.
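As a usage sketch for this last entry, a minimal text search with the `duckduckgo_search` package; the query string is arbitrary, and the result keys (title/href/body) follow the library's documented output:

```python
# Minimal sketch: a DuckDuckGo text search via the duckduckgo_search package.
from duckduckgo_search import DDGS

with DDGS() as ddgs:
    for r in ddgs.text("linear attention kernels", max_results=5):
        print(r["title"], "->", r["href"])
```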