This repository is dedicated to all Trailblazers embarking on the journey to build Large Language Models from the ground up and apply them to their projects. Here you will find a series of Jupyter notebooks that guide you through the process of building a Generative Pre-trained Transformer model from scratch.
.
├── data/
├── helpers/
├── .gitignore
├── 1_Setup.ipynb
├── 2_Tokenization.ipynb
├── 3_Attention.ipynb
├── 4_GPT.ipynb
├── 5_Training.ipynb
└── requirements.txt
The notebooks are designed to be completed in order, each building on the concepts introduced in the previous ones:
| Notebook | Description | Open in Colab |
|---|---|---|
| 🏁 Setup | Introduction to the project; importing DistilGPT2 as a baseline model. | Open In Colab |
| ✂️ Tokenization | Overview of tokenization techniques and implementation of a custom dataloader. | Open In Colab |
| 🧠 Attention | Deep dive into attention mechanisms: dot-product, scaled dot-product, and multi-head attention (see the sketch below the table). | Open In Colab |
| 🏗️ GPT Architecture | Build the core GPT model: multi-head attention, layer normalization, the feed-forward network, and residual connections. | Open In Colab |
| 🎓 Training | Train, evaluate, and experiment with hyperparameters for the GPT model. | Open In Colab |
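The Attention and GPT Architecture notebooks revolve around scaled dot-product attention. As a rough sketch of the idea (not the notebooks' actual code; the function name, tensor shapes, and causal-mask handling below are my own assumptions), causal self-attention can be written as:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, causal=True):
    """Minimal scaled dot-product attention.

    q, k, v: tensors of shape (batch, num_heads, seq_len, head_dim).
    """
    d_k = q.size(-1)
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if causal:
        # Mask out future positions so each token only attends to the past
        seq_len = q.size(-2)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention weights sum to 1 over the keys
    return weights @ v                       # weighted sum of value vectors

# Toy usage: batch of 2 sequences, 4 heads, 8 tokens, 16-dim heads
q = k = v = torch.randn(2, 4, 8, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 4, 8, 16])
```

Multi-head attention runs several of these in parallel and concatenates the results; the notebooks walk through that step by step.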
Clone the repository and explore the notebooks to learn how to build and train your own LLMs!
Each notebook contains cells marked with TODO. These are points where you're encouraged to implement key components of the GPT architecture yourself, helping to reinforce your understanding of how the model works.
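As a purely hypothetical illustration of this pattern (the function and task below are not taken from the notebooks), a TODO cell might look roughly like this:

```python
import torch

def build_causal_mask(seq_len: int) -> torch.Tensor:
    """Return a boolean mask that is True for future (masked-out) positions."""
    # TODO: implement the causal mask.
    # In the notebooks, a body like this is left for you to fill in;
    # one possible solution is shown here.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

print(build_causal_mask(4))
```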
To get the most out of this tutorial:
- Clone the repository
- Install the required dependencies listed in requirements.txt (a quick sanity check is sketched below this list)
- Work through the notebooks in order, completing the TODO sections
- Experiment with the code and hyperparameters to deepen your understanding
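After installing the dependencies, a quick way to confirm your environment works is to load DistilGPT2 (used in the Setup notebook). This assumes torch and transformers are among the pinned dependencies, which I have not verified against requirements.txt:

```python
# Quick sanity check after `pip install -r requirements.txt`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tokenizer("Building a GPT from scratch is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If this prints a continuation of the prompt, PyTorch and the Hugging Face stack are installed correctly.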
- Basic understanding of Python and PyTorch
- Familiarity with neural network concepts
- Jupyter Notebook environment
This tutorial is designed to make understanding GPT accessible to a wider audience. While some mathematical concepts have been simplified, the core principles of the GPT architecture are preserved.
Happy learning, and enjoy building your own GPT model!
Contributions, issues, and feature requests are welcome! Feel free to check the issues page if you want to contribute.
This project is licensed under the MIT License - see the LICENSE file for details.
To deepen your understanding of LLMs and related technologies, I highly recommend exploring these foundational papers (I find something new in them every time!):
These papers provide valuable insights into the development, scaling, and optimization of large language models and related AI technologies.
- It All Starts Here
  - "Attention Is All You Need" (2017) - Introduces the Transformer architecture, the basis of GPT models.
  - "Improving Language Understanding by Generative Pre-Training" (2018) - Describes the original GPT model and its benefits.
- Evolution of GPT Models
  - "Language Models are Unsupervised Multitask Learners" (2019) - Describes GPT-2.
  - "Language Models are Few-Shot Learners" (2020) - Discusses GPT-3's capabilities.
  - "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2019) - Explores BERT's impact on the NLP field.
- Language Models and Scaling
- Instruction Fine-Tuning and Feedback Loops
  - "Training Language Models to Follow Instructions with Human Feedback" (2022) - Describes InstructGPT and fine-tuning with human feedback.
  - "LoRA: Low-Rank Adaptation of Large Language Models" (2021) - Introduces efficient fine-tuning by adding trainable low-rank matrices to the existing weights (see the sketch after this list).
  - "Training Compute-Optimal Large Language Models" (2022) - Revisits scaling laws, showing that for a fixed compute budget, smaller models trained on more data can outperform larger ones.
- Retrieval of Dynamically Changing Knowledge
- Image Synthesis
- System Optimizations
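Since the LoRA entry above explains the core idea (trainable low-rank matrices added to frozen pretrained weights), here is a minimal, illustrative PyTorch sketch. The class name, rank, and scaling are my own assumptions, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # Low-rank factors: A maps in_features -> rank, B maps rank -> out_features
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Original projection plus the scaled low-rank correction
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(768, 768))
# Only the low-rank factors are trainable, a small fraction of the full weight matrix
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```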