Starred repositories
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache papers with code.
A Debian-based shell environment designed for Android and adb
BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Run llama and other large language models offline on iOS and macOS using the GGML library.
A GPU-accelerated error-bounded lossy compressor for scientific data.
Error-bounded Lossy Data Compressor (for floating-point/integer datasets)
[NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
Cross-platform, customizable ML solutions for live and streaming media.
Automated upstream mirror for libbpf stand-alone build.
Minimal and opinionated eBPF tooling for the Rust ecosystem
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
📖 A curated list of Awesome LLM/VLM Inference Papers with code, such as FlashAttention, PagedAttention, and parallelism. 🎉🎉
📚 200+ Tensor/CUDA Core kernels: ⚡️flash-attn-mma and ⚡️hgemm with WMMA, MMA, and CuTe (achieving 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
The official repo of Qwen (通义千问), the chat & pretrained large language model proposed by Alibaba Cloud.
Shares the technical principles behind large language models along with hands-on experience (LLM engineering and real-world application deployment).
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
On-device AI across mobile, embedded and edge for PyTorch
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.