A PyTorch implementation of the Vision Transformer (ViT), a model that achieves state-of-the-art (SOTA) results in image classification using transformer-style encoders. Associated blog article.
Currently supported:
- Vanilla ViT
- Hybrid ViT (with support for BiTResNets as backbone)
- Hybrid ViT (with support for AxialResNets as backbone)
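To give a rough idea of what the vanilla variant computes, here is a minimal sketch. It is not this repository's API: the class name `MinimalViT`, its parameter names, and the small default sizes are all hypothetical, and PyTorch's built-in `nn.TransformerEncoder` (post-norm by default) stands in for the encoder blocks, so it will differ in detail from the paper's architecture. It assumes PyTorch 1.9 or newer for `batch_first=True`.

```python
import torch
import torch.nn as nn


class MinimalViT(nn.Module):
    """Illustrative vanilla ViT classifier (hypothetical, not this repo's API)."""

    def __init__(self, image_size=32, patch_size=8, in_channels=3,
                 num_classes=10, dim=64, depth=2, heads=4, mlp_dim=128):
        super().__init__()
        if image_size % patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        num_patches = (image_size // patch_size) ** 2

        # A strided convolution cuts the image into non-overlapping patches
        # and linearly projects each patch to a `dim`-sized token.
        self.patch_embed = nn.Conv2d(in_channels, dim,
                                     kernel_size=patch_size, stride=patch_size)

        # Learnable classification token and position embeddings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

        # Assumption: the stock PyTorch encoder layer is used as a stand-in.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=mlp_dim,
                                           activation="gelu",
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)              # (B, dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)     # (B, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend the class token
        x = x + self.pos_embed               # add position information
        x = self.encoder(x)
        return self.head(x[:, 0])            # classify from the class token


if __name__ == "__main__":
    model = MinimalViT().eval()
    logits = model(torch.randn(2, 3, 32, 32))
    print(logits.shape)  # one row of class logits per image: (2, 10)
```

In a hybrid variant, the strided-convolution patch embedding in this sketch would be replaced by the feature map of a CNN backbone, with the rest of the pipeline unchanged.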
To Do:
- Training Script
- Full Axial-ViT
@inproceedings{
anonymous2021an,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Anonymous},
booktitle={Submitted to International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=YicbFdNTTy},
note={under review}
}