This repository is dedicated to all Trailblazers embarking on the journey to build Large Language Models from the ground up and apply them to their projects. Here you will find a series of Jupyter notebooks that guide you through the process of building a Generative Pre-trained Transformer model from scratch.
.
├── data/
├── helpers/
├── .gitignore
├── 1_Setup.ipynb
├── 2_Tokenization.ipynb
├── 3_Attention.ipynb
├── 4_GPT.ipynb
├── 5_Training.ipynb
└── requirements.txt
The notebooks are designed to be completed in order, each building on the concepts introduced in the previous ones:
| Notebook | Description | Open in Colab |
|---|---|---|
| 🏁 Setup | Introduction to the project; importing DistilGPT2 as a baseline model. | Open In Colab |
| ✂️ Tokenization | Overview of tokenization techniques and implementation of a custom dataloader. | Open In Colab |
| 🧠 Attention | Deep dive into attention mechanisms: dot-product, scaled dot-product, and multi-head attention (see the sketch below the table). | Open In Colab |
| 🏗️ GPT Architecture | Build the core GPT model: multi-head attention, layer normalization, the feed-forward network, and residual connections. | Open In Colab |
| 🎓 Training | Train, evaluate, and experiment with hyperparameters for the GPT model. | Open In Colab |
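The Attention and GPT Architecture notebooks revolve around scaled dot-product attention. As a rough sketch of the idea (not the notebooks' actual code; the function name, tensor shapes, and causal-mask handling below are my own assumptions), causal self-attention can be written as:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, causal=True):
    """Minimal scaled dot-product attention.

    q, k, v: tensors of shape (batch, num_heads, seq_len, head_dim).
    """
    d_k = q.size(-1)
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if causal:
        # Mask out future positions so each token only attends to the past
        seq_len = q.size(-2)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention weights sum to 1 over the keys
    return weights @ v                       # weighted sum of value vectors

# Toy usage: batch of 2 sequences, 4 heads, 8 tokens, 16-dim heads
q = k = v = torch.randn(2, 4, 8, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 4, 8, 16])
```

Multi-head attention runs several of these in parallel and concatenates the results; the notebooks walk through that step by step.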
Clone the repository and explore the notebooks to learn how to build and train your own LLMs!
Each notebook contains cells marked with TODO. These are points where you're encouraged to implement key components of the GPT architecture yourself, helping to reinforce your understanding of how the model works.
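As a purely hypothetical illustration of this pattern (the function and task below are not taken from the notebooks), a TODO cell might look roughly like this:

```python
import torch

def build_causal_mask(seq_len: int) -> torch.Tensor:
    """Return a boolean mask that is True for future (masked-out) positions."""
    # TODO: implement the causal mask.
    # In the notebooks, a body like this is left for you to fill in;
    # one possible solution is shown here.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

print(build_causal_mask(4))
```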
To get the most out of this tutorial:
- Clone the repository
- Install the required dependencies listed in requirements.txt (a quick sanity check is sketched below this list)
- Work through the notebooks in order, completing the TODO sections
- Experiment with the code and hyperparameters to deepen your understanding
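After installing the dependencies, a quick way to confirm your environment works is to load DistilGPT2 (used in the Setup notebook). This assumes torch and transformers are among the pinned dependencies, which I have not verified against requirements.txt:

```python
# Quick sanity check after `pip install -r requirements.txt`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tokenizer("Building a GPT from scratch is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If this prints a continuation of the prompt, PyTorch and the Hugging Face stack are installed correctly.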
- Basic understanding of Python and PyTorch
- Familiarity with neural network concepts
- Jupyter Notebook environment
This tutorial is designed to make understanding GPT accessible to a wider audience. While some mathematical concepts have been simplified, the core principles of the GPT architecture are preserved.
Happy learning, and enjoy building your own GPT model!
Contributions, issues, and feature requests are welcome! Feel free to check the issues page if you want to contribute.
This project is licensed under the MIT License - see the LICENSE file for details.
To deepen your understanding of LLMs and related technologies, I highly recommend exploring these foundational papers (I find something new in them every time!):
These papers provide valuable insights into the development, scaling, and optimization of large language models and related AI technologies.
- It All Starts Here
  - "Attention Is All You Need" (2017) - Introduces the Transformer architecture, the basis of GPT models.
  - "Improving Language Understanding by Generative Pre-Training" (2018) - Describes the original GPT model and its benefits.
- Evolution of GPT Models
  - "Language Models are Unsupervised Multitask Learners" (2019) - Describes GPT-2.
  - "Language Models are Few-Shot Learners" (2020) - Discusses GPT-3's capabilities.
  - "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2019) - Explores BERT's impact on the NLP field.
- Language Models and Scaling
- Instruction Fine-Tuning and Feedback Loops
  - "Training Language Models to Follow Instructions with Human Feedback" (2022) - Describes InstructGPT and fine-tuning with human feedback.
  - "LoRA: Low-Rank Adaptation of Large Language Models" (2021) - Introduces efficient fine-tuning by adding trainable low-rank matrices to the existing weights (see the sketch after this list).
  - "Training Compute-Optimal Large Language Models" (2022) - Revisits scaling laws, showing that for a fixed compute budget, smaller models trained on more data can outperform larger ones.
- Retrieval of Dynamically Changing Knowledge
- Image Synthesis
- System Optimizations
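Since the LoRA entry above explains the core idea (trainable low-rank matrices added to frozen pretrained weights), here is a minimal, illustrative PyTorch sketch. The class name, rank, and scaling are my own assumptions, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # Low-rank factors: A maps in_features -> rank, B maps rank -> out_features
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Original projection plus the scaled low-rank correction
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(768, 768))
# Only the low-rank factors are trainable, a small fraction of the full weight matrix
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```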