Stars: 7 repositories written in CUDA
FlashInfer: Kernel Library for LLM Serving
A throughput-oriented high-performance serving framework for LLMs
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving