Note: This repository is currently under heavy development. If you have suggestions on the API or use cases you'd like covered, please open a GitHub issue.
torchao is a PyTorch-native library for optimizing your models with lower-precision dtypes, techniques like quantization and sparsity, and performant kernels.
To try out our APIs, check out the API examples in quantization (including autoquant), sparsity, and dtypes.
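As a quick taste of the quantization flow, here is a minimal sketch of autoquant. It assumes a CUDA device, a bfloat16 model, and the top-level `torchao.autoquant` entry point; the toy model and shapes are placeholders for your own:

```python
import torch
import torchao

# toy model standing in for your own; autoquant targets the nn.Linear layers
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to("cuda", torch.bfloat16)
example_input = torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda")

# wrap the compiled model; on first run autoquant benchmarks candidate
# quantized kernels per layer and keeps the fastest option
model = torchao.autoquant(torch.compile(model, mode="max-autotune"))
model(example_input)
```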
Note: this library makes liberal use of several new features in PyTorch, so it's recommended to use it with the current nightly or the latest stable release of PyTorch.
- From PyPI:

```
pip install torchao
```

- From source:

```
git clone https://github.com/pytorch-labs/ao
cd ao
pip install -e .
```
The library provides:
- Support for lower-precision dtypes such as nf4 and uint4 that are torch.compile-friendly
- Quantization algorithms such as dynamic quantization, SmoothQuant, and GPTQ that run on CPU, GPU, and mobile (see the sketch after this list):
- Int8 dynamic activation quantization
- Int8 and int4 weight-only quantization
- Int8 dynamic activation quantization with int4 weight quantization
- GPTQ and SmoothQuant
- A high-level autoquant API and kernel autotuner targeting SOTA performance across varying model shapes on consumer and enterprise GPUs
- Sparsity algorithms such as Wanda that help improve the accuracy of sparse networks
- Integration with other PyTorch native libraries like torchtune and ExecuTorch
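To make the quantization bullets above concrete, here is a minimal sketch of int8 weight-only quantization. It assumes the `change_linear_weights_to_int8_woqtensors` helper in `torchao.quantization.quant_api`; the toy model is a placeholder:

```python
import torch
from torchao.quantization import quant_api

# toy model; the API swaps the weight of every nn.Linear in place
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to("cuda", torch.bfloat16)

# replace each linear weight with an int8 weight-only quantized tensor
# subclass; the module structure is unchanged, so torch.compile still applies
quant_api.change_linear_weights_to_int8_woqtensors(model)

out = model(torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda"))
```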
torchao embodies PyTorch's design philosophy, especially "usability over everything else". Our vision for this repository is the following:
- Composability: Native solutions for optimization techniques that compose with both torch.compile and FSDP; for example, new dtype support for QLoRA (see the sketch after this list)
- Interoperability: Work with the rest of the PyTorch ecosystem such as torchtune, gpt-fast and ExecuTorch
- Transparent Benchmarks: Regularly run performance benchmarking of our APIs across a suite of Torchbench models and across hardware backends
- Heterogeneous Hardware: Efficient kernels that can run on CPU/GPU-based servers (w/ torch.compile) and mobile backends (w/ ExecuTorch)
- Infrastructure Support: A release packaging solution for kernels and a CI/CD setup that runs these kernels on different backends
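As an illustration of the composability goal, a quantized model should pass straight through torch.compile. A minimal sketch, assuming the `change_linear_weights_to_int8_dqtensors` helper (int8 dynamic-activation quantization) from `torchao.quantization.quant_api`:

```python
import torch
from torchao.quantization import quant_api

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to("cuda", torch.bfloat16)

# int8 dynamic-activation quantization implemented as a tensor subclass ...
quant_api.change_linear_weights_to_int8_dqtensors(model)

# ... which composes with torch.compile: the subclass ops trace and fuse
model = torch.compile(model, mode="max-autotune")
model(torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda"))
```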
torchao has been integrated with other repositories to ease usage:
- torchtune is integrated with our 8-bit and 4-bit weight-only quantization techniques, with and without GPTQ.
- ExecuTorch is integrated with GPTQ for both 8da4w (int8 dynamic activation with int4 weight) and int4 weight-only quantization.
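As an illustration of the 8da4w flow, here is a minimal sketch assuming the `Int8DynActInt4WeightQuantizer` class in `torchao.quantization.quant_api`; the group size and toy model are placeholders:

```python
import torch
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024))

# 8da4w: int8 dynamically quantized activations with int4 grouped weights,
# the scheme ExecuTorch consumes for on-device inference
quantizer = Int8DynActInt4WeightQuantizer(groupsize=256)
model = quantizer.quantize(model)
```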
Our kernels have been used to achieve SOTA inference performance.
torchao is released under the BSD 3 license.