add note about duplicated cell
rasbt committed Aug 20, 2024
1 parent 01cb137 commit 9f0bda7
Showing 1 changed file with 34 additions and 18 deletions.
52 changes: 34 additions & 18 deletions ch05/01_main-chapter-code/exercise-solutions.ipynb
@@ -70,19 +70,20 @@
"id": "5860ba9f-2db3-4480-b96b-4be1c68981eb",
"metadata": {},
"source": [
"We can print the number of times the word \"pizza\" is sampled using the `print_sampled_tokens` function we defined in this section. Let's start with the code we defined in section 5.3.1.\n",
"- We can print the number of times the word \"pizza\" is sampled using the `print_sampled_tokens` function we defined in this section\n",
"- Let's start with the code we defined in section 5.3.1\n",
"\n",
"It is sampled 0x if the temperature is 0 or 0.1, and it is sampled 32x if the temperature is scaled up to 5. The estimated probability is 32/1000 * 100% = 3.2%.\n",
"- It is sampled 0x if the temperature is 0 or 0.1, and it is sampled 32x if the temperature is scaled up to 5. The estimated probability is 32/1000 * 100% = 3.2%\n",
"\n",
"The actual probability is 4.3% and contained in the rescaled softmax probability tensor (`scaled_probas[2][6]`)."
"- The actual probability is 4.3% and contained in the rescaled softmax probability tensor (`scaled_probas[2][6]`)"
]
},
{
"cell_type": "markdown",
"id": "9cba59c2-a8a3-4af3-add4-70230795225e",
"metadata": {},
"source": [
"Below is a self-contained example using code from chapter 5:"
"- Below is a self-contained example using code from chapter 5:"
]
},
{
@@ -133,7 +134,7 @@
"id": "1ee0f9f3-4132-42c7-8324-252fd8f59145",
"metadata": {},
"source": [
"Now, we can iterate over the `scaled_probas` and print the sampling frequencies in each case:"
"- Now, we can iterate over the `scaled_probas` and print the sampling frequencies in each case:"
]
},
{
@@ -194,9 +195,11 @@
"id": "fbf88c97-19c4-462c-924a-411c8c765d2c",
"metadata": {},
"source": [
"Note that sampling offers an approximation of the actual probabilities when the word \"pizza\" is sampled. E.g., if it is sampled 32/1000 times, the estimated probability is 3.2%. To obtain the actual probability, we can check the probabilities directly by accessing the corresponding entry in `scaled_probas`.\n",
"- Note that sampling offers an approximation of the actual probabilities when the word \"pizza\" is sampled\n",
"- E.g., if it is sampled 32/1000 times, the estimated probability is 3.2%\n",
"- To obtain the actual probability, we can check the probabilities directly by accessing the corresponding entry in `scaled_probas`\n",
"\n",
"Since \"pizza\" is the 7th entry in the vocabulary, for the temperature of 5, we obtain it as follows:"
"- Since \"pizza\" is the 7th entry in the vocabulary, for the temperature of 5, we obtain it as follows:"
]
},
{
@@ -228,7 +231,7 @@
"id": "d3dcb438-5f18-4332-9627-66009f30a1a4",
"metadata": {},
"source": [
"There is a 4.3% probability that the word \"pizza\" is sampled if the temperature is set to 5."
"There is a 4.3% probability that the word \"pizza\" is sampled if the temperature is set to 5"
]
},
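For reference, here is a minimal, self-contained sketch of this estimated-vs.-exact comparison. It assumes the toy vocabulary and illustrative next-token logits from the chapter's example (with "pizza" at index 6) and the temperatures 1, 0.1, and 5; the sampled counts vary with the random seed, but the exact probability at temperature 5 comes out to roughly 4.3%, as noted above.

```python
import torch

torch.manual_seed(123)

# Toy vocabulary; "pizza" is the 7th entry (index 6), as in the chapter's example
vocab = {"closer": 0, "every": 1, "effort": 2, "forward": 3,
         "inches": 4, "moves": 5, "pizza": 6, "toward": 7, "you": 8}

# Illustrative next-token logits resembling the chapter's example values
next_token_logits = torch.tensor(
    [4.51, 0.89, -1.90, 6.75, 1.63, -1.62, -1.89, 6.28, 1.79]
)

def softmax_with_temperature(logits, temperature):
    # Dividing by the temperature flattens (T > 1) or sharpens (T < 1) the distribution
    return torch.softmax(logits / temperature, dim=0)

temperatures = [1, 0.1, 5]
scaled_probas = [softmax_with_temperature(next_token_logits, T) for T in temperatures]

for T, probas in zip(temperatures, scaled_probas):
    # Empirical estimate from 1,000 draws ...
    samples = torch.multinomial(probas, num_samples=1000, replacement=True)
    freq = (samples == vocab["pizza"]).sum().item()
    # ... versus the exact probability read directly off the scaled distribution
    print(f"T={T}: sampled {freq}/1000 times, "
          f"exact probability {probas[vocab['pizza']].item():.1%}")
```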
{
@@ -379,6 +382,14 @@
"print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
]
},
{
"cell_type": "markdown",
"id": "c85b1f11-37a5-477d-9c2d-170a6865e669",
"metadata": {},
"source": [
"- Note that re-executing the previous code cell will produce the exact same generated text:"
]
},
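The reproducibility noted in this new cell relies on the decoding in that (collapsed) code cell being deterministic. As a general, toy illustration with made-up probabilities (not the chapter's actual generation settings): greedy argmax decoding always returns the same token, and sampling-based decoding repeats exactly only when the random seed is reset before each run.

```python
import torch

# Toy next-token distribution over four tokens (illustrative values only)
probas = torch.tensor([0.1, 0.2, 0.3, 0.4])

# Greedy (argmax) decoding is deterministic: every call picks the same token
print("argmax picks:", [torch.argmax(probas).item() for _ in range(3)])

# Sampling repeats exactly only if the seed is reset before each run
for run in range(2):
    torch.manual_seed(123)
    samples = torch.multinomial(probas, num_samples=5, replacement=True)
    print(f"run {run + 1}:", samples.tolist())
```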
{
"cell_type": "code",
"execution_count": 9,
@@ -422,9 +433,10 @@
"id": "f40044e8-a0f5-476c-99fd-489b999fd80a",
"metadata": {},
"source": [
"If we are still in the Python session where you first trained the model in chapter 5, to continue the pretraining for one more epoch, we just have to load the model and optimizer that we saved in the main chapter and call the `train_model_simple` function again.\n",
"- If we are still in the Python session where you first trained the model in chapter 5, to continue the pretraining for one more epoch, we just have to load the model and optimizer that we saved in the main chapter and call the `train_model_simple` function again\n",
"\n",
"It takes a couple more steps to make this reproducible in this new code environment. First, we load the tokenizer, model, and optimizer:"
"- It takes a couple more steps to make this reproducible in this new code environment\n",
"- First, we load the tokenizer, model, and optimizer:"
]
},
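A rough sketch of that loading step is shown below. It assumes the chapter's `GPTModel` class can be imported from the accompanying `previous_chapters.py` module, that the checkpoint was saved as `model_and_optimizer.pth` with `model_state_dict`/`optimizer_state_dict` keys as in the main chapter, and that the config and optimizer hyperparameters mirror the ones used there; adjust these if your setup differs.

```python
import tiktoken
import torch
from previous_chapters import GPTModel  # model class from the chapter's accompanying code

GPT_CONFIG_124M = {  # configuration as used in the main chapter (context length shortened to 256)
    "vocab_size": 50257, "context_length": 256, "emb_dim": 768,
    "n_heads": 12, "n_layers": 12, "drop_rate": 0.1, "qkv_bias": False,
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = tiktoken.get_encoding("gpt2")

# Restore the model and optimizer states saved at the end of the main chapter
checkpoint = torch.load("model_and_optimizer.pth", map_location=device)

model = GPTModel(GPT_CONFIG_124M)
model.load_state_dict(checkpoint["model_state_dict"])
model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])

model.train()  # back to training mode before continuing the pretraining
```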
{
@@ -468,7 +480,7 @@
"id": "688fce4a-9ab2-4d97-a95c-fef02c32b4f3",
"metadata": {},
"source": [
"Next, we initialize the data loader:"
"- Next, we initialize the data loader:"
]
},
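Below is a plain-PyTorch sketch of what such a loader provides: overlapping (input, target) windows in which the targets are the inputs shifted by one token. The chapter itself uses its `create_dataloader_v1` helper; the dummy token ids, short window size, batch size, and 90/10 train/validation split here are assumptions for illustration (the chapter uses the model's 256-token context length as window size and stride).

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SlidingWindowDataset(Dataset):
    """Stand-in for the chapter's loader: overlapping next-token prediction windows."""
    def __init__(self, token_ids, max_length, stride):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# Dummy token ids standing in for tokenizer.encode(text_data); 90/10 split
token_ids = list(range(2000))
split_idx = int(0.9 * len(token_ids))

train_loader = DataLoader(SlidingWindowDataset(token_ids[:split_idx], max_length=32, stride=32),
                          batch_size=2, shuffle=True, drop_last=True)
val_loader = DataLoader(SlidingWindowDataset(token_ids[split_idx:], max_length=32, stride=32),
                        batch_size=2, shuffle=False, drop_last=False)
```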
{
@@ -531,7 +543,7 @@
"id": "76598ef8-165c-4bcc-af5e-b6fe72398365",
"metadata": {},
"source": [
"Lastly, we use the `train_model_simple` function to train the model:"
"- Lastly, we use the `train_model_simple` function to train the model:"
]
},
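Conceptually, one epoch of this training boils down to the loop sketched below: compute next-token logits, take the cross-entropy against the targets (the inputs shifted by one position), and update the weights. This is only a minimal stand-in; the chapter's `train_model_simple` additionally tracks training/validation losses at intervals and prints sample generations.

```python
import torch

def train_one_epoch(model, train_loader, optimizer, device):
    # Minimal sketch of a train_model_simple-style epoch (no evaluation or logging)
    model.train()
    last_loss = float("nan")
    for input_batch, target_batch in train_loader:
        input_batch, target_batch = input_batch.to(device), target_batch.to(device)
        optimizer.zero_grad()
        logits = model(input_batch)  # shape: (batch_size, seq_len, vocab_size)
        loss = torch.nn.functional.cross_entropy(
            logits.flatten(0, 1), target_batch.flatten()
        )
        loss.backward()
        optimizer.step()
        last_loss = loss.item()
    return last_loss
```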
{
@@ -574,21 +586,22 @@
"id": "7cb1140b-2027-4156-8d19-600ac849edbe",
"metadata": {},
"source": [
"We can use the following code to calculate the training and validation set losses of the GPT model:\n",
"- We can use the following code to calculate the training and validation set losses of the GPT model:\n",
"\n",
"```python\n",
"train_loss = calc_loss_loader(train_loader, gpt, device)\n",
"val_loss = calc_loss_loader(val_loader, gpt, device)\n",
"```\n",
"\n",
"The resulting losses for the 124M parameter are as follows:\n",
"- The resulting losses for the 124M parameter are as follows:\n",
"\n",
"```\n",
"Training loss: 3.754748503367106\n",
"Validation loss: 3.559617757797241\n",
"```\n",
"\n",
"The main observation is that the training and validation set performances are in the same ballpark. This can have multiple explanations.\n",
"- The main observation is that the training and validation set performances are in the same ballpark\n",
"- This can have multiple explanations:\n",
"\n",
"1. The Verdict was not part of the pretraining dataset when OpenAI trained GPT-2. Hence, the model is not explicitly overfitting to the training set and performs similarly well on The Verdict's training and validation set portions. (The validation set loss is slightly lower than the training set loss, which is unusual in deep learning. However, it's likely due to random noise since the dataset is relatively small. In practice, if there is no overfitting, the training and validation set performances are expected to be roughly identical).\n",
"\n",
@@ -849,14 +862,17 @@
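Relating to the loss calculation discussed above, a `calc_loss_loader`-style evaluation can be sketched as follows. This is a minimal stand-in under the assumption that the model maps a batch of token ids to logits and the loader yields (input, target) batches, as in the chapter.

```python
import torch

def average_loss(model, data_loader, device, num_batches=None):
    # Mean cross-entropy over (a subset of) the loader, without gradient tracking
    model.eval()
    total_loss, count = 0.0, 0
    with torch.no_grad():
        for i, (input_batch, target_batch) in enumerate(data_loader):
            if num_batches is not None and i >= num_batches:
                break
            logits = model(input_batch.to(device))
            loss = torch.nn.functional.cross_entropy(
                logits.flatten(0, 1), target_batch.to(device).flatten()
            )
            total_loss += loss.item()
            count += 1
    return total_loss / max(count, 1)
```

A training loss and validation loss in the same ballpark, as reported above, is what one expects when the model is not overfitting to the training portion.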
"id": "b3d313f4-0038-4bc9-a340-84b3b55dc0e3",
"metadata": {},
"source": [
"In the main chapter, we experimented with the smallest GPT-2 model, which has only 124M parameters. The reason was to keep the resource requirements as low as possible. However, you can easily experiment with larger models with minimal code changes. For example, instead of loading the 1558M instead of 124M model in chapter 5, the only 2 lines of code that we have to change are\n",
"- In the main chapter, we experimented with the smallest GPT-2 model, which has only 124M parameters\n",
"- The reason was to keep the resource requirements as low as possible\n",
"- However, you can easily experiment with larger models with minimal code changes\n",
"- For example, instead of loading the 1558M instead of 124M model in chapter 5, the only 2 lines of code that we have to change are\n",
"\n",
"```python\n",
"settings, params = download_and_load_gpt2(model_size=\"124M\", models_dir=\"gpt2\")\n",
"model_name = \"gpt2-small (124M)\"\n",
"```\n",
"\n",
"The updated code becomes\n",
"- The updated code becomes\n",
"\n",
"\n",
"```python\n",
@@ -992,7 +1008,7 @@
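For reference, the two updated lines referenced above would presumably read as follows; the `"gpt2-xl (1558M)"` name follows the chapter's model-configuration naming scheme, and the import path is an assumption based on the chapter's accompanying code.

```python
from gpt_download import download_and_load_gpt2  # assumed helper module from chapter 5

settings, params = download_and_load_gpt2(model_size="1558M", models_dir="gpt2")
model_name = "gpt2-xl (1558M)"
```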
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.6"
}
},
"nbformat": 4,