Note: This repository is currently under heavy development. If you have suggestions on the API or use cases you'd like covered, please open a GitHub issue.
torchao is a PyTorch-native library for optimizing your models with lower-precision dtypes, techniques like quantization and sparsity, and performant kernels.
To try out our APIs, check out the API examples in quantization (including autoquant), sparsity, and dtypes.
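As a quick taste of the quantization flow, here is a minimal sketch of autoquant. It assumes a CUDA device, a bfloat16 model, and the top-level `torchao.autoquant` entry point; the toy model and shapes are placeholders for your own:

```python
import torch
import torchao

# toy model standing in for your own; autoquant targets the nn.Linear layers
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to("cuda", torch.bfloat16)
example_input = torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda")

# wrap the compiled model; on first run autoquant benchmarks candidate
# quantized kernels per layer and keeps the fastest option
model = torchao.autoquant(torch.compile(model, mode="max-autotune"))
model(example_input)
```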
Note: this library makes liberal use of several new features in PyTorch, so it's recommended to use it with the current nightly or the latest stable release of PyTorch.
- From PyPI:

```
pip install torchao
```

- From source:

```
git clone https://github.com/pytorch-labs/ao
cd ao
pip install -e .
```
The library provides:
- Support for lower-precision dtypes such as nf4 and uint4 that are torch.compile-friendly
- Quantization algorithms such as dynamic quantization, SmoothQuant, and GPTQ that run on CPU, GPU, and mobile (see the sketch after this list):
- Int8 dynamic activation quantization
- Int8 and int4 weight-only quantization
- Int8 dynamic activation quantization with int4 weight quantization
- GPTQ and SmoothQuant
- A high-level autoquant API and kernel autotuner targeting SOTA performance across varying model shapes on consumer and enterprise GPUs
- Sparsity algorithms such as Wanda that help improve the accuracy of sparse networks
- Integration with other PyTorch native libraries like torchtune and ExecuTorch
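To make the quantization bullets above concrete, here is a minimal sketch of int8 weight-only quantization. It assumes the `change_linear_weights_to_int8_woqtensors` helper in `torchao.quantization.quant_api`; the toy model is a placeholder:

```python
import torch
from torchao.quantization import quant_api

# toy model; the API swaps the weight of every nn.Linear in place
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to("cuda", torch.bfloat16)

# replace each linear weight with an int8 weight-only quantized tensor
# subclass; the module structure is unchanged, so torch.compile still applies
quant_api.change_linear_weights_to_int8_woqtensors(model)

out = model(torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda"))
```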
torchao embodies PyTorch's design philosophy, especially "usability over everything else". Our vision for this repository is the following:
- Composability: Native solutions for optimization techniques that compose with both torch.compile and FSDP; for example, new dtype support for QLoRA (see the sketch after this list)
- Interoperability: Work with the rest of the PyTorch ecosystem such as torchtune, gpt-fast and ExecuTorch
- Transparent Benchmarks: Regularly run performance benchmarking of our APIs across a suite of Torchbench models and across hardware backends
- Heterogeneous Hardware: Efficient kernels that can run on CPU/GPU-based servers (w/ torch.compile) and mobile backends (w/ ExecuTorch)
- Infrastructure Support: A release packaging solution for kernels and a CI/CD setup that runs these kernels on different backends
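As an illustration of the composability goal, a quantized model should pass straight through torch.compile. A minimal sketch, assuming the `change_linear_weights_to_int8_dqtensors` helper (int8 dynamic-activation quantization) from `torchao.quantization.quant_api`:

```python
import torch
from torchao.quantization import quant_api

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to("cuda", torch.bfloat16)

# int8 dynamic-activation quantization implemented as a tensor subclass ...
quant_api.change_linear_weights_to_int8_dqtensors(model)

# ... which composes with torch.compile: the subclass ops trace and fuse
model = torch.compile(model, mode="max-autotune")
model(torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda"))
```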
torchao has been integrated with other repositories to ease usage:
- torchtune is integrated with our 8-bit and 4-bit weight-only quantization techniques, with and without GPTQ.
- ExecuTorch is integrated with GPTQ for both 8da4w (int8 dynamic activation with int4 weight) and int4 weight-only quantization.
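As an illustration of the 8da4w flow, here is a minimal sketch assuming the `Int8DynActInt4WeightQuantizer` class in `torchao.quantization.quant_api`; the group size and toy model are placeholders:

```python
import torch
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024))

# 8da4w: int8 dynamically quantized activations with int4 grouped weights,
# the scheme ExecuTorch consumes for on-device inference
quantizer = Int8DynActInt4WeightQuantizer(groupsize=256)
model = quantizer.quantize(model)
```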
Our kernels have been used to achieve SOTA inference performance.
torchao is released under the BSD 3 license.