AI infra
6 repositories
SGLang is a fast serving framework for large language models and vision language models.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Fast and memory-efficient exact attention
Development repository for the Triton language and compiler
A high-throughput and memory-efficient inference and serving engine for LLMs