[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
[NeurIPS 2023] "The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter", Ajay Jaiswal, Shiwei Liu, Tianlong Chen, and Zhangyang Wang
PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity
Code for the NeurIPS 2023 paper: "ZipLM: Inference-Aware Structured Pruning of Language Models".
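The entries above (LLM-Pruner, SparseGPT, ZipLM, essential sparsity) all revolve around pruning pre-trained weights. As a rough illustration of the shared idea, and not any specific repository's method, here is a minimal unstructured magnitude-pruning sketch: zero out the smallest-magnitude fraction of a weight matrix.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude entries of w (unstructured pruning).

    A toy sketch of the pruning theme; real methods (e.g. second-order or
    structured pruning) use far more sophisticated saliency criteria.
    """
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute value; prune everything at or below it.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > thresh
    return w * mask

# Usage: prune 30% of a random weight matrix.
w = np.random.randn(10, 10)
w_pruned = magnitude_prune(w, sparsity=0.3)
```

Surviving entries keep their original values; the pruned positions are exactly zero, which is what sparse kernels exploit for speedups.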
Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
This repository contains integer operators on GPUs for PyTorch.
Model Compression Toolbox for Large Language Models and Diffusion Models
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
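SmoothQuant's central trick is to migrate activation outliers into the weights: scale each input channel by s_j = max|X_j|^α / max|W_j|^(1-α), so that (X / s) @ (s · W) = X @ W exactly, but both factors become easier to quantize. A hedged sketch of that equivalence (not the official implementation, which operates inside transformer layers):

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """Per-input-channel smoothing: X @ W == (X / s) @ (s[:, None] * W).

    Sketch of the SmoothQuant scale-migration identity; X is [tokens, in],
    W is [in, out]. alpha balances difficulty between activations and weights.
    """
    act_max = np.abs(X).max(axis=0)                 # per-channel activation range
    w_max = np.abs(W).max(axis=1)                   # per-channel weight range
    s = act_max ** alpha / np.maximum(w_max, 1e-8) ** (1 - alpha)
    s = np.maximum(s, 1e-8)                         # guard against dead channels
    return X / s, W * s[:, None]

# The product is unchanged, but the smoothed activations have flatter ranges.
X = np.random.randn(6, 4)
W = np.random.randn(4, 3)
X_s, W_s = smooth(X, W)
```

After smoothing, both X_s and W_s can be quantized with simple per-tensor scales, which is what makes W8A8 inference practical.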
Image forgery recognition algorithm
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…
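Many of the quantization repositories above (GPTQ, AWQ, OmniQuant, AQLM) refine the same baseline: post-training weight quantization to low bit-widths. As a minimal point of reference, here is a symmetric round-to-nearest 4-bit quantizer with per-row scales; the listed methods improve on this naive scheme via error compensation, activation-aware scaling, or learned codebooks.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric per-row round-to-nearest quantization (naive RTN baseline)."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                         # avoid division by zero
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Recover an approximate float matrix from integer codes and scales."""
    return q.astype(np.float32) * scale

# Round-trip: error per element is bounded by half a quantization step.
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_rtn(w)
w_hat = dequantize(q, s)
```

With per-row scales the worst-case reconstruction error is scale/2, which is the gap that GPTQ-style error correction and AWQ-style scaling shrink further.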
Charlie Mnemonic: The First Personal Assistant with Long-Term Memory
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
A modular graph-based Retrieval-Augmented Generation (RAG) system