Stars
MagicPIG: LSH Sampling for Efficient LLM Generation
Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization
Sirius: an efficient correction mechanism that significantly boosts contextual sparsity models on reasoning tasks while maintaining their efficiency gains.
FlashInfer: Kernel Library for LLM Serving
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Scalable and robust tree-based speculative decoding algorithm
Codebase for "SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems"
MSCCL++: A GPU-driven communication stack for scalable AI applications
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training