Stars
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
End-to-end speech synthesis system with knowledge distillation
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
Official implementation of Half-Quadratic Quantization (HQQ)
[ICML 2024] CLLMs: Consistency Large Language Models
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization (https://arxiv.org/pdf/2401.06118.pdf) and PV-Tuning: Beyond Straight-Through Estimation for Ext…
[ICLR 2024] The Need for Speed: Pruning Transformers with One Recipe
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
A fast inference library for running LLMs locally on modern consumer-class GPUs
Simple implementation of Speculative Sampling in NumPy for GPT-2.
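The repo above implements speculative sampling's accept/reject rule in NumPy for GPT-2. A minimal sketch of that rule alone, using toy probability vectors in place of real model outputs (the function name `speculative_step` is mine, not the repo's), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft, proposed_token):
    """Accept or reject one draft-model token.

    p_target, q_draft: probability vectors over the vocabulary from the
    target and draft models (toy stand-ins for real GPT-2 outputs here).
    proposed_token: token index sampled from q_draft.
    Returns (token, accepted).
    """
    # Accept the draft token with probability min(1, p/q) at that token.
    accept_prob = min(1.0, p_target[proposed_token] / q_draft[proposed_token])
    if rng.random() < accept_prob:
        return proposed_token, True
    # On rejection, resample from the residual max(0, p - q), renormalized.
    # This correction keeps the final sample distributed exactly as p_target.
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p_target), p=residual)), False
```

For example, with `p_target = [0.5, 0.5, 0, 0]` and a uniform draft `q_draft = [0.25, 0.25, 0.25, 0.25]`, a proposed token 0 is always accepted (ratio 0.5/0.25 ≥ 1), while a proposed token 2 is always rejected and resampled from the residual over tokens 0 and 1.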
A curated list for Efficient Large Language Models
Implementation of the 2023 CVPR Award Candidate: On Distillation of Guided Diffusion Models
A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
General technology for enabling AI capabilities with LLMs and MLLMs
Universal LLM Deployment Engine with ML Compilation