ViT

This is an attempt to implement "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" in PyTorch.

Features

Current support for:
- Vanilla ViT
- Hybrid ViT (with support for BiT-style ResNets)

To do:
- Axial ViT
- Training script
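
For readers new to the paper, the "16x16 words" in its title refers to cutting the image into fixed-size patches that the Transformer treats as tokens. The snippet below is a minimal sketch of that arithmetic only. It is not taken from this repository: the function name `vit_sequence_shape` and its parameters are hypothetical, and it assumes a square image, square non-overlapping patches, and an optional single class token.

```python
def vit_sequence_shape(image_size=224, patch_size=16, channels=3, class_token=True):
    """Return (sequence_length, flattened_patch_dim) for a square image.

    Hypothetical helper for illustration; not part of this repository's API.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    # Number of non-overlapping patches along one side, then in total.
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2

    # One extra token if a class token is prepended to the patch sequence.
    seq_len = num_patches + (1 if class_token else 0)

    # Each patch is flattened to channels * height * width values
    # before the linear projection to the model dimension.
    patch_dim = channels * patch_size * patch_size
    return seq_len, patch_dim


if __name__ == "__main__":
    print(vit_sequence_shape())  # (197, 768)
```

With the defaults above, a 224x224 RGB image yields 14 x 14 = 196 patches, plus one class token, and each 16x16x3 patch flattens to 768 values.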