This is an attempt to implement the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" in PyTorch.
Currently supported:
- Vanilla ViT
- Hybrid ViT (with support for BiT-style ResNets)
To Do:
- Axial ViT
- Training Script
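
For orientation, here is a minimal sketch of the patch-embedding step that a vanilla ViT starts with: the image is cut into 16x16 patches, each patch is linearly projected to a token, a class token is prepended, and learned position embeddings are added. The class name `PatchEmbedding`, its parameters, and the default sizes (224x224 input, 768-dimensional tokens) are assumptions chosen for illustration and are not taken from this repository's code.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Illustrative helper (hypothetical name, not this repository's API)."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, dim=768):
        super().__init__()
        assert image_size % patch_size == 0, "image size must be divisible by patch size"
        self.num_patches = (image_size // patch_size) ** 2
        # A convolution with kernel == stride == patch_size is equivalent to
        # flattening each non-overlapping patch and applying a shared linear layer.
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)
        # Learnable class token and position embeddings (zero-initialised here for simplicity).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embedding = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, images):
        x = self.proj(images)             # (batch, dim, H / patch, W / patch)
        x = x.flatten(2).transpose(1, 2)  # (batch, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)    # prepend the class token
        return x + self.pos_embedding     # broadcast over the batch dimension


embed = PatchEmbedding()
tokens = embed(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 197, 768])
```

With these assumed sizes, a 224x224 image yields 14 x 14 = 196 patch tokens plus one class token. In a hybrid variant, the projection would be applied to a ResNet feature map rather than to raw pixels.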