Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).

C++ 2,110 859 Updated Feb 25, 2025

spcl / QuaRot

Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models.

Python 351 32 Updated Nov 26, 2024

Ceelog / DictionaryByGPT4

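A vocabulary book generated by GPT-4 📚, with analyses of more than 8,000 words covering definitions, example sentences, roots and affixes, inflections, cultural background, memory techniques, and short stories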

HTML 4,079 270 Updated Oct 14, 2024

amusi / Deep-Learning-Interview-Book

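Deep learning interview handbook (covering mathematics, machine learning, deep learning, computer vision, natural language processing, SLAM, and related areas)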

7,970 1,345 Updated Apr 24, 2024

jy-yuan / KIVI

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 277 27 Updated Jan 19, 2025

ClubieDong / QAQ-KVCacheQuantization

QAQ: Quality Adaptive Quantization for LLM KV Cache

Python 47 7 Updated Mar 27, 2024

SqueezeAILab / KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 335 30 Updated Aug 13, 2024

NVIDIA / cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 7,049 1,945 Updated Mar 4, 2025

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 42,968 5,251 Updated Mar 3, 2025

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,692 2,382 Updated Aug 12, 2024

yuhuixu1993 / qa-lora

Official PyTorch implementation of QA-LoRA

Python 127 11 Updated Mar 13, 2024

Aaronhuang-778 / BiLLM

[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Python 206 14 Updated Jan 11, 2025

Zhen-Dong / Awesome-Quantization-Papers

List of papers related to neural network quantization in recent AI conferences and journals.

544 45 Updated Dec 16, 2024

kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Python 1,765 159 Updated Jan 27, 2025

microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications

Python 411 48 Updated Jan 16, 2025

paperswithcode / ai-deadlines

⏰ AI conference deadline countdowns

JavaScript 5,781 1,000 Updated Sep 15, 2024

Ming Ming0310

Stars

Akimoto-Cris / RD_PRUNE

hvarfner / vanilla_bo_in_highdim

rees-c / PyREMBO

zjq0455 / PTQ1.61

nanowell / Q-Sparse-LLM

EleutherAI / lm-evaluation-harness

Cornell-RelaxML / QuIP

OpenBMB / UltraEval

thu-nics / qllm-eval

FasterDecoding / TEAL

Raincleared-Song / sparse_gpu_operator

kelseyhightower / nocode

facebookresearch / SpinQuant

DD-DuDa / BitDistiller

tensorflow / tflite-micro