This directory contains several notebooks that illustrate how to use Google's ViT, both for fine-tuning on custom data and for inference. It currently includes the following notebooks:
- performing inference with ViT to illustrate image classification
- fine-tuning ViT on CIFAR-10 using HuggingFace's Trainer
- fine-tuning ViT on CIFAR-10 using PyTorch Lightning
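To give a flavor of the inference notebook, here's a minimal sketch of image classification with a pretrained ViT checkpoint via the `pipeline` API. The dummy image is just a stand-in; replace it with your own photo.

```python
from PIL import Image
from transformers import pipeline

# Load an image-classification pipeline with a pretrained ViT checkpoint.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Dummy 224x224 image as a placeholder input; use a real image in practice.
image = Image.new("RGB", (224, 224), color="red")

# Returns a list of {"label": ..., "score": ...} dicts, highest score first.
predictions = classifier(image)
print(predictions[0]["label"])
```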
There's also the official HuggingFace image classification notebook, which can be found here.
Note that these notebooks work for any vision model in the library (i.e. any model supported by the AutoModelForImageClassification API). You can just replace the checkpoint name (like google/vit-base-patch16-224) with another one (like facebook/convnext-tiny-224).
Just pick your favorite vision model from the hub and start fine-tuning it :)
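A quick sketch of why this works: the Auto classes resolve the right architecture from the checkpoint name, so swapping models is a one-line change. The dummy image below is illustrative only.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Swapping this single string for e.g. "facebook/convnext-tiny-224"
# is all that's needed to switch architectures.
checkpoint = "google/vit-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

# Dummy input image; use your own in practice.
image = Image.new("RGB", (224, 224))
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index back to a human-readable label.
predicted_label = model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)
```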
Below, I list some great blog posts explaining how to use ViT:
PyTorch:
- Fine-Tune ViT for Image Classification with 🤗 Transformers
- A complete Hugging Face tutorial: how to build and train a vision transformer
- How to Train the Hugging Face Vision Transformer On a Custom Dataset
TensorFlow/Keras: