[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

Python 908 106 Updated Oct 7, 2024

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 742 56 Updated Oct 8, 2024

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

C++ 232 25 Updated Sep 30, 2024

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python 746 98 Updated Aug 20, 2024

[NeurIPS 2023] "The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter", Ajay Jaiswal, Shiwei Liu, Tianlong Chen, and Zhangyang Wang

Python 8 Updated Jul 23, 2023

PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity

Cuda 101 27 Updated Dec 16, 2024
Cuda 6 Updated Dec 3, 2021
Python 57 12 Updated Oct 23, 2024

Code for the NeurIPS 2023 paper: "ZipLM: Inference-Aware Structured Pruning of Language Models".

2 Updated Oct 20, 2023

Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX

Jupyter Notebook 2,349 431 Updated Nov 20, 2024

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Python 1,016 124 Updated Apr 17, 2024

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 15,075 2,965 Updated Dec 23, 2024
Python 87 8 Updated Dec 10, 2024

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Python 1,841 222 Updated Dec 13, 2024

cuML - RAPIDS Machine Learning Library

C++ 4,303 539 Updated Dec 21, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1,970 156 Updated Mar 27, 2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Python 92 12 Updated Oct 15, 2024

A SQL query engine in Go

Go 151 5 Updated Oct 8, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 466 28 Updated Nov 9, 2024

This repository contains integer operators on GPUs for PyTorch.

Python 186 50 Updated Sep 29, 2023

Model Compression Toolbox for Large Language Models and Diffusion Models

Python 275 22 Updated Nov 10, 2024

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,294 151 Updated Jul 12, 2024

Image forgery recognition algorithm

Python 594 75 Updated Sep 9, 2024
Python 3 Updated Oct 19, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,606 215 Updated Dec 20, 2024

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Python 26 1 Updated Aug 9, 2024

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python 1,193 180 Updated Dec 21, 2024

Charlie Mnemonic: The First Personal Assistant with Long-Term Memory

Python 175 21 Updated Oct 16, 2024

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Python 46,074 5,477 Updated Dec 18, 2024

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 20,881 2,045 Updated Dec 20, 2024