Stars
VPTQ: a flexible, extremely low-bit quantization algorithm
[NeurIPS'24 Spotlight] To speed up long-context LLMs' inference, attention is computed approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
Fast Hadamard transform in CUDA, with a PyTorch interface
Code for the NeurIPS'24 paper "QuaRot": end-to-end 4-bit inference of large language models.
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…
Official implementation of Half-Quadratic Quantization (HQQ)
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Official inference library for Mistral models
Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models"
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
A curated list for Efficient Large Language Models
[ICLR'24 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Fast and memory-efficient exact attention
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Implementation of "Attention Is Off By One" by Evan Miller
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
Flexible simulator for mixed precision and format simulation of LLMs and vision transformers.
Awesome LLM compression research papers and tools.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
4-bit quantization of LLaMA using GPTQ (see the sketch below for the basic quantize/dequantize step such methods build on)
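
Several of the repositories above (GPTQ, OmniQuant, HQQ, EfficientQAT) center on low-bit weight quantization. As a point of reference, here is a minimal sketch of plain per-channel 4-bit round-to-nearest (RTN) quantization in PyTorch. This is not GPTQ itself (GPTQ additionally compensates rounding error with second-order information) and the function names are illustrative only; it just shows the quantize/dequantize step that 4-bit weight-only methods refine.

```python
# Minimal sketch: per-channel 4-bit round-to-nearest (RTN) weight quantization.
# Illustrative only; not the GPTQ algorithm.
import torch

def quantize_rtn_4bit(w: torch.Tensor):
    """Quantize a 2-D weight matrix to 4-bit codes, one scale/zero-point per output row."""
    qmin, qmax = 0, 15                                  # 4-bit unsigned range
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero = torch.round(-w_min / scale)                  # zero-point maps w_min to qmin
    q = torch.clamp(torch.round(w / scale) + zero, qmin, qmax)
    return q.to(torch.uint8), scale, zero

def dequantize(q: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor):
    """Reconstruct an approximate float weight matrix from the 4-bit codes."""
    return (q.float() - zero) * scale

if __name__ == "__main__":
    w = torch.randn(4096, 4096)
    q, scale, zero = quantize_rtn_4bit(w)
    w_hat = dequantize(q, scale, zero)
    print("mean abs reconstruction error:", (w - w_hat).abs().mean().item())
```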