FasterViT: Fast Vision Transformers with Hierarchical Attention

Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention.

Ali Hatamizadeh, Greg Heinrich, Hongxu (Danny) Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

FasterViT achieves a new SOTA Pareto front in terms of accuracy vs. image throughput (no extra training data!).

We introduce a new self-attention mechanism, denoted as Hierarchical Attention (HAT), that captures both short- and long-range information by learning cross-window carrier tokens.
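To make the carrier-token idea concrete, here is a heavily simplified NumPy sketch; it is not the repository's HAT implementation. It assumes one carrier token per window obtained by average pooling and single-head attention with identity query/key/value projections, and it omits MLPs, normalization, multiple heads, and positional biases. The function names `self_attention` and `hat_sketch` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # tokens: (N, C); identity Q/K/V projections, for illustration only
    scale = 1.0 / np.sqrt(tokens.shape[-1])
    weights = softmax(tokens @ tokens.T * scale, axis=-1)
    return weights @ tokens


def hat_sketch(feature_map, window):
    # feature_map: (H, W, C); H and W must be divisible by the window size
    H, W, C = feature_map.shape
    assert H % window == 0 and W % window == 0
    nh, nw = H // window, W // window
    # partition into non-overlapping windows: (num_windows, window*window, C)
    windows = (feature_map.reshape(nh, window, nw, window, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(nh * nw, window * window, C))
    # assumption: one carrier token per window, via average pooling
    carriers = windows.mean(axis=1)
    # global attention among carrier tokens mixes information across windows
    carriers = carriers + self_attention(carriers)
    out = np.empty_like(windows)
    for i in range(windows.shape[0]):
        # local attention over the window's tokens plus its own carrier token
        joint = np.concatenate([carriers[i:i + 1], windows[i]], axis=0)
        joint = joint + self_attention(joint)
        out[i] = joint[1:]  # keep only the window tokens
    # undo the window partition back to (H, W, C)
    return (out.reshape(nh, nw, window, window, C)
            .transpose(0, 2, 1, 3, 4)
            .reshape(H, W, C))
```

With plain window attention, perturbing the top-left window of an 8x8 map would leave the bottom-right window untouched; in this sketch the change reaches it through the carrier tokens, which is the short- plus long-range behavior the paragraph above describes.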

💥 News 💥

[06.09.2023] 🔥🔥 We have released the source code and the ImageNet-1K FasterViT models!

Catalog

ImageNet-1K training code
ImageNet-1K pre-trained models
ImageNet-21K pre-trained models
ImageNet-21K fine-tune scripts
Any-resolution FasterViT
Detection code (DINO) + models
Segmentation code + models

Results + Pretrained Models

ImageNet-1K

FasterViT ImageNet-1K Pretrained Models

Name	Acc@1 (%)	Acc@5 (%)	Throughput (img/s)	Resolution	#Params (M)	FLOPs (G)	Download
FasterViT-0	82.1	95.9	5802	224x224	31.4	3.3	model
FasterViT-1	83.2	96.5	4188	224x224	53.4	5.3	model
FasterViT-2	84.2	96.8	3161	224x224	75.9	8.7	model
FasterViT-3	84.9	97.2	1780	224x224	159.5	18.2	model
FasterViT-4	85.4	97.3	849	224x224	424.6	36.6	model
FasterViT-5	85.6	97.4	449	224x224	975.5	113.0	model
FasterViT-6	85.8	97.4	352	224x224	1360.0	142.0	model
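As a small sanity check on the accuracy vs. throughput trade-off, the sketch below takes the Acc@1 and throughput columns from the table above and computes which variants are Pareto-optimal among themselves. It only compares FasterViT variants with each other; the SOTA claim against other architectures comes from the paper and is not reproduced here. The names `MODELS` and `pareto_front` are hypothetical.

```python
# (name, Acc@1 in %, throughput in img/s), copied from the ImageNet-1K table
MODELS = [
    ("FasterViT-0", 82.1, 5802),
    ("FasterViT-1", 83.2, 4188),
    ("FasterViT-2", 84.2, 3161),
    ("FasterViT-3", 84.9, 1780),
    ("FasterViT-4", 85.4, 849),
    ("FasterViT-5", 85.6, 449),
    ("FasterViT-6", 85.8, 352),
]


def pareto_front(models):
    # a model is dominated if some other model is at least as accurate and at
    # least as fast, and strictly better on at least one of the two axes
    front = []
    for name, acc, tput in models:
        dominated = any(
            a >= acc and t >= tput and (a > acc or t > tput)
            for other, a, t in models
            if other != name
        )
        if not dominated:
            front.append(name)
    return front
```

Because accuracy rises strictly as throughput falls across the seven rows, no variant dominates another, so every FasterViT model lies on this (intra-family) front.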

ImageNet-A - ImageNet-R - ImageNet-V2

All models use crop_pct=0.875. Results are obtained by running inference with the ImageNet-1K pretrained models without fine-tuning.

Name	A-Acc@1(%)	A-Acc@5(%)	R-Acc@1(%)	R-Acc@5(%)	V2-Acc@1(%)	V2-Acc@5(%)
FasterViT-0	23.9	57.6	45.9	60.4	70.9	90.0
FasterViT-1	31.2	63.3	47.5	61.9	72.6	91.0
FasterViT-2	38.2	68.9	49.6	63.4	73.7	91.6
FasterViT-3	44.2	73.0	51.9	65.6	75.0	92.2
FasterViT-4	49.0	75.4	56.0	69.6	75.7	92.7
FasterViT-5	52.7	77.6	56.9	70.0	76.0	93.0
FasterViT-6	53.7	78.4	57.1	70.1	76.1	93.0

A, R and V2 denote ImageNet-A, ImageNet-R and ImageNet-V2 respectively.
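For readers unfamiliar with crop_pct: in the common timm evaluation convention (assumed here; the exact preprocessing in validate.py may differ), the shorter image side is first resized to floor(img_size / crop_pct), which for 224 / 0.875 gives 256, and a centered img_size x img_size crop is then taken. The helper `eval_resize_and_crop` below is a hypothetical sketch of that arithmetic; crop offsets may differ by a pixel from torchvision's rounding.

```python
import math


def eval_resize_and_crop(width, height, img_size=224, crop_pct=0.875):
    # assumed timm-style convention: resize target derived from crop_pct
    scale_size = math.floor(img_size / crop_pct)  # 224 / 0.875 -> 256
    # resize so the shorter side equals scale_size, keeping the aspect ratio
    if width <= height:
        new_w, new_h = scale_size, int(height * scale_size / width)
    else:
        new_w, new_h = int(width * scale_size / height), scale_size
    # centered crop box (left, top, right, bottom) of size img_size x img_size
    left = (new_w - img_size) // 2
    top = (new_h - img_size) // 2
    return (new_w, new_h), (left, top, left + img_size, top + img_size)
```

For a 500x375 image this resizes to 341x256 and crops the box (58, 16, 282, 240), so roughly 87.5% of the shorter side is kept.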

Training

Please see TRAINING.md for detailed training instructions for all models.

Evaluation

FasterViT models can be evaluated on the ImageNet-1K validation set as follows:

python validate.py \
--model <model-name> \
--checkpoint <checkpoint-path> \
--data_dir <imagenet-path> \
--batch-size <batch-size-per-gpu>

Here, --model is the FasterViT variant (e.g. faster_vit_0_224_1k), --checkpoint is the path to the pretrained model weights, --data_dir is the path to the ImageNet-1K validation set, and --batch-size is the batch size per GPU. A sample script is also provided in validate.sh.
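Acc@1 and Acc@5 in the tables above are top-1 and top-5 accuracy. As a rough illustration of what these numbers measure (a generic NumPy sketch, not the code path inside validate.py), a sample counts as a top-k hit when its true label is among the k highest-scoring classes. The function name `topk_accuracy` is hypothetical.

```python
import numpy as np


def topk_accuracy(logits, targets, ks=(1, 5)):
    # logits: (num_samples, num_classes); targets: (num_samples,) class indices
    maxk = max(ks)
    # indices of the maxk highest-scoring classes per sample, best first
    topk = np.argsort(-logits, axis=1)[:, :maxk]
    hits = topk == targets[:, None]
    # percentage of samples whose label appears within the first k predictions
    return {k: 100.0 * hits[:, :k].any(axis=1).mean() for k in ks}
```

A model can therefore gain Acc@5 without changing Acc@1, when the correct class moves from, say, rank 7 to rank 3.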

Installation

The dependencies can be installed by running:

pip install -r requirements.txt

Data Preparation

Please download the ImageNet dataset from its official website. The training and validation images must be organized into one sub-folder per class, with the following structure:

  imagenet
  ├── train
  │   ├── class1
  │   │   ├── img1.jpeg
  │   │   ├── img2.jpeg
  │   │   └── ...
  │   ├── class2
  │   │   ├── img3.jpeg
  │   │   └── ...
  │   └── ...
  └── val
      ├── class1
      │   ├── img4.jpeg
      │   ├── img5.jpeg
      │   └── ...
      ├── class2
      │   ├── img6.jpeg
      │   └── ...
      └── ...
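Before launching training or evaluation, it can help to verify the layout above. The sketch below assumes the torchvision ImageFolder convention (class folders sorted by name and mapped to indices 0, 1, 2, ...); the loaders used by train.py and validate.py may differ in details such as accepted file extensions. The names `index_split`, `check_layout`, and `IMG_EXTENSIONS` are hypothetical.

```python
from pathlib import Path

# assumption: a minimal set of image extensions to accept
IMG_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def index_split(split_dir):
    # map sorted class-folder names to indices and list (path, label) pairs
    split_dir = Path(split_dir)
    classes = sorted(d.name for d in split_dir.iterdir() if d.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for img in sorted((split_dir / name).iterdir()):
            if img.suffix.lower() in IMG_EXTENSIONS:
                samples.append((str(img), class_to_idx[name]))
    return class_to_idx, samples


def check_layout(root):
    # confirm train/ and val/ exist and contain the same class sub-folders
    train_map, train_samples = index_split(Path(root) / "train")
    val_map, val_samples = index_split(Path(root) / "val")
    if train_map != val_map:
        raise ValueError("train/ and val/ must contain the same class sub-folders")
    return len(train_map), len(train_samples), len(val_samples)
```

Running check_layout on the example tree above would report 2 classes with 3 training and 3 validation images; on the full ImageNet-1K dataset the class count should be 1000.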

Acknowledgement

This repository is built on top of the timm repository. We thank Ross Wightman for creating and maintaining this high-quality library.

Licenses

This work is made available under the NVIDIA Source Code License-NC. A copy of this license is included in the LICENSE file.

For license information regarding the timm repository, please refer to its repository.

For license information regarding the ImageNet dataset, please see the ImageNet official website.
