VFormer

A modular PyTorch library for Vision Transformers

Library Features

Contains implementations of prominent ViT architectures broken down into modular components like encoder, attention mechanism, and decoder.
Makes it easy to develop custom models by composing components of different architectures.

Installation

git clone https://github.com/SforAiDl/vformer.git
cd vformer/
python setup.py install

Models supported

Example usage

To instantiate and use a Swin Transformer model -

import torch
from vformer.models.classification import SwinTransformer

image = torch.randn(1, 3, 224, 224)       # Example data
model = SwinTransformer(
        img_size=224,
        patch_size=4,
        in_channels=3,
        n_classes=10,
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        drop_rate=0.2,
    )
logits = model(image)

VFormer has a modular design and allows for easy experimentation using blocks/modules of different architectures. For example, if desired, you can use just the encoder or the windowed attention layer of the Swin Transformer model.

from vformer.attention import WindowAttention

window_attn = WindowAttention(
        dim=128,
        window_size=7,
        num_heads=2,
        **kwargs,
    )

from vformer.encoder import SwinEncoder

swin_encoder = SwinEncoder(
        dim=128,
        input_resolution=(224, 224),
        depth=2,
        num_heads=2,
        window_size=7,
        **kwargs,
    )

References

Citations (click to expand)

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and  Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

@misc{wang2021pyramid,
      title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2102.12122},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

@inproceedings{chen2021crossvit,
    title={{CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification}},
    author={Chun-Fu (Richard) Chen and Quanfu Fan and Rameswar Panda},
    booktitle={International Conference on Computer Vision (ICCV)},
    year={2021}
}

Escaping the Big Data Paradigm with Compact Transformers

@article{hassani2021escaping,
	title        = {Escaping the Big Data Paradigm with Compact Transformers},
	author       = {Ali Hassani and Steven Walton and Nikhil Shah and Abulikemu Abuduweili and Jiachen Li and Humphrey Shi},
	year         = 2021,
	url          = {https://arxiv.org/abs/2104.05704},
	eprint       = {2104.05704},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
.github/workflows		.github/workflows
docs		docs
tests		tests
tools		tools
vformer		vformer
.codecov.yml		.codecov.yml
.coveragerc		.coveragerc
.editorconfig		.editorconfig
.flake8		.flake8
.gitignore		.gitignore
.isort.cfg		.isort.cfg
.pre-commit-config.yaml		.pre-commit-config.yaml
AUTHORS.rst		AUTHORS.rst
CONTRIBUTING.rst		CONTRIBUTING.rst
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
environment.yml		environment.yml
generate_dir_structure.py		generate_dir_structure.py
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VFormer

A modular PyTorch library for Vision Transformers

Library Features

Installation

Models supported

Example usage

References

About

Releases

Packages

Languages

License

rmhoward/vformer

Folders and files

Latest commit

History

Repository files navigation

VFormer

A modular PyTorch library for Vision Transformers

Library Features

Installation

Models supported

Example usage

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages