Starred repositories
Running large language models on a single GPU for throughput-oriented scenarios.
My learning notes and code for ML systems.
How to optimize some algorithms in CUDA.
ASCII generator (image to text, image to image, video to video)
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attention-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
Making large AI models cheaper, faster and more accessible
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
SGLang is a fast serving framework for large language models and vision language models.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant
NoSQL data store using the Seastar framework, compatible with Redis
NoSQL data store using the Seastar framework, compatible with Apache Cassandra
A high-throughput and memory-efficient inference and serving engine for LLMs
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A collection of modern/faster/saner alternatives to common Unix commands.
KubeBlocks is open-source control plane software that runs and manages databases, message queues, and other stateful applications on K8s.
[SIGMOD 2023] High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations