Lesson 1: Introduction to attention and language models (1h)
- 1.1 A brief history of NLP (15 min)
- 1.2 Paying attention with attention (15 min)
- 1.3 Encoder-decoder architectures (15 min)
- 1.4 How language models look at text (15 min)
Lesson 2: How transformers use attention to process text (1h)
- 2.1 Introduction to transformers (10 min)
- 2.2 Scaled dot-product attention (30 min)
- 2.3 Multi-headed attention (20 min)
Lesson 3: Transfer Learning (45m)
- 3.1 Introduction to Transfer Learning (15 min)
- 3.2 Introduction to PyTorch (15 min)
- 3.3 Fine-tuning transformers with PyTorch (15 min)
Lesson 4: Natural Language Understanding with BERT (1h)
- 4.1 Introduction to BERT (15 min)
- 4.2 Encoders need only apply: BERT’s architecture (15 min)
- 4.3 WordPiece tokenization (15 min)
- 4.4 The many embeddings of BERT (15 min)
Lesson 5: Pre-training and fine-tuning BERT (45m)
- 5.1 The Masked Language Modeling Task (15 min)
- 5.2 The Next Sentence Prediction Task (15 min)
- 5.3 Fine-tuning BERT to solve NLP tasks (15 min)
Lesson 6: Hands-on BERT (1h 15m)
- 6.1 Flavors of BERT (15 min)
- 6.2 BERT for sequence classification (20 min)
- 6.3 BERT for token classification (20 min)
- 6.4 BERT for question answering (20 min)
Lesson 7: Natural Language Generation with GPT (1h 10m)
- 7.1 Introduction to the GPT family (10 min)
- 7.2 Decoders need only apply: GPT’s architecture (15 min)
- 7.3 Masked multi-headed attention (15 min)
- 7.4 Pre-training GPT (10 min)
- 7.5 Few-shot learning (10 min)
- 7.6 Multi-task learning (10 min)
Lesson 8: Hands-on GPT (1h)
- 8.1 Off-the-shelf GPT results using few-shot learning (20 min)
- 8.2 GPT for style completion (20 min)
- 8.3 GPT for code dictation (20 min)
Lesson 9: Further applications of BERT + GPT (1h)
- 9.1 Siamese BERT-networks for semantic searching (30 min)
- 9.2 Teaching GPT multiple tasks at once with prompt engineering (30 min)
Lesson 10: T5: back to basics (35m)
- 10.1 Encoders and decoders welcome: T5’s architecture (15 min)
- 10.2 Cross-attention (20 min)
Lesson 11: Hands-on T5 (50m)
- 11.1 Off-the-shelf results with T5 (20 min)
- 11.2 Using T5 for abstractive summarization (30 min)
Lesson 12: The vision transformer (1h)
- 12.1 Introduction to the Vision Transformer (ViT) (15 min)
- 12.2 Combining ViT and GPT to caption images (15 min)
- 12.3 Fine-tuning an image captioning system (30 min)
Lesson 13: Deploying Transformer models (1h)
- 13.1 Introduction to MLOps (20 min)
- 13.2 Sharing our models on Hugging Face (15 min)
- 13.3 Deploying a fine-tuned BERT model using FastAPI (25 min)
Lesson 14: Using Massively Large Language Models (1h)
- 14.1 Modern Large Language Models (20 min)
- 14.2 GPT-3 + ChatGPT (15 min)
- 14.3 Other LLMs + Semantic Search with OpenAI Embeddings (25 min)