This notebook is Assignment 1 of the CS-503 Visual Intelligence course at EPFL, taught by Prof. Amir Zamir.
The goals of this assignment are to:
- Implement a Vision Transformer for MNIST classification
- Implement a GPT-style decoder-only model for image generation
Topics covered in this assignment:
- Self-attention
- Basic tokenization
- Basic positional encodings
- Transformer encoder-only (e.g. ViT) and decoder-only (e.g. GPT) models
- Vision Transformer (ViT)
- Supervised training
- Autoregressive modelling
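
As a rough preview of how the encoder-only and decoder-only topics above differ, the sketch below computes single-head self-attention weights with and without a causal mask. It is a simplified illustration rather than the assignment's reference implementation: it assumes no learned projections (queries, keys, and values are all the raw token vectors), uses plain Python lists instead of tensors, and the names `softmax` and `self_attention` are hypothetical helpers introduced only for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    x: list of n token vectors, each a list of d floats.
    causal: if True, token i may only attend to tokens j <= i
            (decoder-only / autoregressive setting); if False, every
            token attends to every token (encoder-only setting).
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    n, d = len(x), len(x[0])
    outputs, weights = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if causal and j > i:
                # Assumption for this sketch: mask future positions with -inf
                # so they receive exactly zero weight after the softmax.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(x[i], x[j]))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        # Output for token i is the attention-weighted sum of all token vectors.
        outputs.append([sum(w[j] * x[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights


# Toy sequence of three 2-dimensional "tokens" (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, w_full = self_attention(tokens, causal=False)
_, w_causal = self_attention(tokens, causal=True)
```

With `causal=False` every row of `w_full` spreads weight over all three tokens, as in a ViT encoder. With `causal=True` the first token can only attend to itself and each later token only sees earlier positions, which is the property autoregressive generation relies on.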