Stars
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
MoBA: Mixture of Block Attention for Long-Context LLMs
A PyTorch native library for large model training
This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding code links.
HunyuanVideo: A Systematic Framework For Large Video Generation Models
A method for calculating scaling laws for LLMs from publicly available models
Modeling, training, eval, and inference code for OLMo
Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache papers with code.
Official inference repo for FLUX.1 models
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
A description of the recent long-context large language model Jamba.
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
Some preliminary explorations of Mamba's context scaling.
Doing simple retrieval from LLMs at various context lengths to measure accuracy
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
Example UI implementing the RTVI web client
recursal / GoldFinch-paper
Forked from SmerkyG/GoldFinch-paper
GoldFinch and other hybrid transformer components
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
A simple and easily understandable version of RWKV
BlinkDL / nanoRWKV
Forked from karpathy/nanoGPT
RWKV in nanoGPT style
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM )