Stars
Heterogeneous Accelerated Compute Cluster (HACC) Resources Page
A framework for few-shot evaluation of language models.
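As a rough illustration of the few-shot evaluation workflow, here is a minimal sketch using the harness's Python entry point; the backend name, checkpoint, task, and shot count are placeholder choices, not recommendations.

```python
# Sketch: run a 5-shot evaluation with lm-evaluation-harness (values are placeholders).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face causal-LM backend
    model_args="pretrained=gpt2",  # any causal LM checkpoint
    tasks=["hellaswag"],
    num_fewshot=5,                 # 5-shot prompting
    batch_size=8,
)
print(results["results"])          # per-task metrics
```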
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…
VPTQ: a flexible and extreme low-bit quantization algorithm
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
RapidStream TAPA compiles task-parallel HLS programs into high-frequency FPGA accelerators.
SGLang is a fast serving framework for large language models and vision language models.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
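A condensed sketch of the typical AutoAWQ quantize-and-save flow; the model path, output directory, and quant_config values below are assumptions for illustration.

```python
# Sketch: quantize a causal LM to 4-bit with AutoAWQ and save the result.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-125m"   # placeholder checkpoint
quant_path = "opt-125m-awq"        # placeholder output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # calibrate and quantize weights
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```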
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores via the WMMA API and MMA PTX instructions.
A collection of benchmarks to measure basic GPU capabilities
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
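A minimal sketch of wrapping a PyTorch model with DeepSpeed's engine; the toy model and the config values (batch size, ZeRO stage, fp16, optimizer) are illustrative assumptions, and a real run would normally go through the deepspeed launcher.

```python
# Sketch: wrap a toy model with deepspeed.initialize and run one training step.
import torch
import deepspeed

model = torch.nn.Linear(512, 512)   # placeholder model
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO-2: shard optimizer state and gradients
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 512).half().to(engine.device)
loss = engine(x).float().pow(2).mean()  # dummy loss for the sketch
engine.backward(loss)                   # loss scaling and communication handled by the engine
engine.step()
```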
High-speed Large Language Model Serving for Local Deployment
Llama Chinese community. The Llama 3 online demo and fine-tuned models are now open, the latest Llama 3 learning materials are aggregated in real time, and all code has been updated for Llama 3. Building the best Chinese Llama large model, fully open source and available for commercial use.
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.
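To make the roofline idea concrete, a back-of-the-envelope sketch: in single-batch decoding, each FP16 weight matrix contributes roughly 2·d² FLOPs but also about 2·d² bytes of weight traffic, so arithmetic intensity sits near 1 FLOP/byte, far left of a typical GPU ridge point. The peak compute and bandwidth numbers below are illustrative assumptions.

```python
# Sketch: classify one decode-step matmul as compute- or memory-bound via a roofline.
peak_flops = 312e12      # FP16 tensor-core peak, FLOP/s (assumed)
peak_bw    = 2.0e12      # HBM bandwidth, bytes/s (assumed)

d = 4096                 # hidden size
flops = 2 * d * d        # one [1, d] x [d, d] matmul during decoding
bytes_moved = 2 * d * d  # FP16 weights read once (activation traffic ignored)

intensity  = flops / bytes_moved                    # ~1 FLOP/byte
attainable = min(peak_flops, intensity * peak_bw)   # roofline: min(compute roof, memory roof)
ridge      = peak_flops / peak_bw                   # ~156 FLOP/byte for these peaks

print(f"intensity = {intensity:.1f} FLOP/B, ridge = {ridge:.0f} FLOP/B")
print("memory-bound" if intensity < ridge else "compute-bound")
```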
A collection of extensions for Vitis and Intel FPGA OpenCL to improve developer quality of life.
A library for efficient similarity search and clustering of dense vectors.
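For reference, the basic Faiss pattern for exact L2 nearest-neighbor search; the dimensionality and dataset sizes are arbitrary, and approximate index types can be swapped in to trade accuracy for speed.

```python
# Sketch: build a flat (exact) L2 index and query its nearest neighbors.
import numpy as np
import faiss

d = 128                                          # vector dimensionality
xb = np.random.rand(10000, d).astype("float32")  # database vectors
xq = np.random.rand(5, d).astype("float32")      # query vectors

index = faiss.IndexFlatL2(d)          # exact L2 search (brute force)
index.add(xb)                         # add database vectors
distances, ids = index.search(xq, 4)  # 4 nearest neighbors per query
print(ids)
```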
《开源大模型食用指南》 (A Hands-On Guide to Open-Source Large Models): tutorials tailored for Chinese beginners on quickly fine-tuning (full-parameter / LoRA) and deploying open-source large language models (LLMs) and multimodal large models (MLLMs), both domestic and international, in a Linux environment.