ViT

This is an attempt to implement the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" in PyTorch.

Features

Currently supported:

Vanilla ViT
Hybrid ViT (with support for BiT-style resnets)
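
For readers new to the architecture, here is a minimal sketch of how a vanilla ViT turns an image into a token sequence. It is an illustration only and is not taken from this repository: the class name `PatchEmbedding` and its parameters (`image_size`, `patch_size`, `in_channels`, `dim`) are hypothetical, and the defaults assume 224x224 RGB inputs with 16x16 patches and an embedding width of 768.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Hypothetical helper for illustration; not this repository's API."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, dim=768):
        super().__init__()
        if image_size % patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        self.num_patches = (image_size // patch_size) ** 2
        # A convolution with kernel == stride == patch_size is equivalent to
        # flattening each non-overlapping patch and applying a shared linear layer.
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)
        # Learnable class token and position embeddings (zero-initialised here for simplicity).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embedding = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x):
        batch = x.shape[0]
        x = self.proj(x)                          # (B, dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)          # (B, num_patches, dim)
        cls = self.cls_token.expand(batch, -1, -1)
        x = torch.cat([cls, x], dim=1)            # (B, num_patches + 1, dim)
        return x + self.pos_embedding


embed = PatchEmbedding()
tokens = embed(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 197, 768])
```

With these assumed defaults, a 224x224 image yields 14 x 14 = 196 patch tokens plus one class token. In a hybrid variant, the same flatten-and-project step would typically be applied to a CNN feature map rather than to raw pixels.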

To Do:

Axial ViT
Training Script