Stars
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
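A minimal sketch of what these ops enable: swapping fp32 `nn.Linear` layers for bitsandbytes 4-bit FP4 layers, following the conversion pattern from the bitsandbytes docs (assumes a CUDA device; the 64x64 sizes are illustrative).

```python
# Minimal sketch: replace fp32 Linear layers with bitsandbytes 4-bit FP4
# layers. Assumes a CUDA GPU; layer sizes here are illustrative only.
import torch
import torch.nn as nn
import bitsandbytes as bnb

fp32_model = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 64))

fp4_model = nn.Sequential(
    bnb.nn.Linear4bit(64, 64, compute_dtype=torch.float16, quant_type="fp4"),
    bnb.nn.Linear4bit(64, 64, compute_dtype=torch.float16, quant_type="fp4"),
)
fp4_model.load_state_dict(fp32_model.state_dict())
fp4_model = fp4_model.to("cuda")  # weights are quantized on device transfer

x = torch.randn(1, 64, dtype=torch.float16, device="cuda")
with torch.no_grad():
    y = fp4_model(x)
```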
LLM knowledge sharing that anyone can understand; a must-read before LLM interviews in spring/autumn campus recruiting, so you can talk confidently with your interviewer
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
SGLang is a fast serving framework for large language models and vision language models.
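For flavor, a minimal sketch of SGLang's frontend language, assuming an SGLang server is already serving a model locally (the endpoint URL and question are illustrative):

```python
# Minimal sketch of SGLang's frontend DSL; assumes an SGLang server
# is already running at localhost:30000 (illustrative endpoint).
import sglang as sgl

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is 4-bit quantization?")
print(state["answer"])
```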
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
Official implementation of the ICLR 2024 paper AffineQuant
Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference for large language models.
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Applied AI experiments and examples for PyTorch
Development repository for the Triton language and compiler
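As a taste of the language, the canonical vector-add kernel, adapted from Triton's own tutorials (block size is illustrative):

```python
# The canonical Triton vector-add kernel, adapted from Triton's tutorials.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # each program handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(98432, device="cuda")
y = torch.rand_like(x)
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```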
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
Compress your input to ChatGPT or other LLMs so they can process 2x more content, saving 40% of memory and GPU time.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization (https://arxiv.org/pdf/2401.06118.pdf) and PV-Tuning: Beyond Straight-Through Estimation for Ext…
A fast inference library for running LLMs locally on modern consumer-class GPUs
Large World Model -- modeling text and video with million-token context
FlashInfer: Kernel Library for LLM Serving
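A minimal sketch of a single-request decode-attention call, following the shapes in FlashInfer's quickstart (head counts, KV length, and head_dim are illustrative):

```python
# Minimal sketch of FlashInfer's single-request decode attention.
# Head counts, KV length, and head_dim below are illustrative.
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Attention for one new query token against the cached keys/values.
o = flashinfer.single_decode_with_kv_cache(q, k, v)  # [num_qo_heads, head_dim]
```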
[EMNLP'23, ACL'24] To speed up LLM inference and sharpen the model's focus on key information, compress the prompt and KV cache, achieving up to 20x compression with minimal performance loss.
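A minimal sketch of prompt compression with this library's PromptCompressor (the default compression model downloads on first use; the context string and token budget are illustrative):

```python
# Minimal sketch of LLMLingua-style prompt compression; the context,
# question, and token budget are illustrative.
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # downloads the default compression model
result = compressor.compress_prompt(
    ["Long retrieved document text goes here ..."],
    question="What does the document say about quantization?",
    target_token=200,  # rough budget for the compressed prompt
)
print(result["compressed_prompt"])
```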
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
High-speed Large Language Model Serving for Local Deployment
The official implementation of the EMNLP 2023 paper LLM-FP4
Sparsity-aware deep learning inference runtime for CPUs
ChatGLM3 series: open-source bilingual (Chinese-English) chat LLMs
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
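From memory of this repo's README (so treat the exact knobs as assumptions, they may differ by version), lookahead decoding is enabled by patching Hugging Face transformers before generation:

```python
# Sketch of enabling lookahead decoding on top of Hugging Face transformers;
# the configuration values below are from memory and may differ by version.
import os
os.environ["USE_LADE"] = "1"

import lade
lade.augment_all()  # monkey-patch transformers' decoding loop
lade.config_lade(LEVEL=5, WINDOW_SIZE=7, GUESS_SET_SIZE=7, DEBUG=0)

# ...then load a model with transformers and call model.generate() as usual;
# greedy decoding now runs with lookahead speedups.
```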