

[ICCV 2023] Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks

Python 23 Updated Oct 31, 2023

Python implementation of REMBO built on GPyTorch.

Python 17 2 Updated Jul 11, 2020
Python 6 Updated Jan 27, 2025

My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Python 31 2 Updated Aug 14, 2024

A framework for few-shot evaluation of language models.

Python 8,103 2,166 Updated Mar 5, 2025

Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"

Python 362 33 Updated Feb 24, 2024

[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.

Python 233 21 Updated Oct 30, 2024

Code Repository of Evaluating Quantized Large Language Models

Python 116 6 Updated Sep 8, 2024
Python 115 5 Updated Feb 15, 2025

GPU operators for sparse tensor operations

Python 30 1 Updated Mar 11, 2024

The best way to write secure and reliable applications. Write nothing; deploy nowhere.

Dockerfile 61,670 4,723 Updated Aug 7, 2024

Code repo for the paper "SpinQuant: LLM quantization with learned rotations"

Python 221 29 Updated Feb 14, 2025

[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.

Python 102 15 Updated May 16, 2024

Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).

C++ 2,110 859 Updated Feb 25, 2025

Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference of large language models.

Python 351 32 Updated Nov 26, 2024

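A vocabulary book generated by GPT-4 📚, with analysis of more than 8,000 words covering meanings, example sentences, roots and affixes, inflections, cultural background, memory techniques, and short stories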

HTML 4,079 270 Updated Oct 14, 2024

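Deep learning interview handbook (covering mathematics, machine learning, deep learning, computer vision, natural language processing, SLAM, and other areas)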

7,970 1,345 Updated Apr 24, 2024

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 277 27 Updated Jan 19, 2025

QAQ: Quality Adaptive Quantization for LLM KV Cache

Python 47 7 Updated Mar 27, 2024

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 335 30 Updated Aug 13, 2024

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 7,049 1,945 Updated Mar 4, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 42,968 5,251 Updated Mar 3, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,692 2,382 Updated Aug 12, 2024

Official PyTorch implementation of QA-LoRA

Python 127 11 Updated Mar 13, 2024

[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Python 206 14 Updated Jan 11, 2025

List of papers related to neural network quantization in recent AI conferences and journals.

544 45 Updated Dec 16, 2024

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Python 1,765 159 Updated Jan 27, 2025

For releasing code related to compression methods for transformers, accompanying our publications

Python 411 48 Updated Jan 16, 2025

⏰ AI conference deadline countdowns

JavaScript 5,781 1,000 Updated Sep 15, 2024