Commit

Copyedits 10-12
sgugger committed May 15, 2020
1 parent a482265 commit a359960
Showing 6 changed files with 396 additions and 393 deletions.
288 changes: 143 additions & 145 deletions 10_nlp.ipynb

Large diffs are not rendered by default.

138 changes: 69 additions & 69 deletions 11_midlevel_data.ipynb

Large diffs are not rendered by default.

247 changes: 123 additions & 124 deletions 12_nlp_dive.ipynb

Large diffs are not rendered by default.

46 changes: 26 additions & 20 deletions clean/10_nlp.ipynb
@@ -412,14 +412,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Putting Our Texts Into Batches for a Language Model"
"### Putting Our Texts into Batches for a Language Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
"hide_input": false
},
"outputs": [
{
@@ -541,7 +541,6 @@
}
],
"source": [
"#hide\n",
"stream = \"In this chapter, we will go back over the example of classifying movie reviews we studied in chapter 1 and dig deeper under the surface. First we will look at the processing steps necessary to convert text into numbers and how to customize it. By doing this, we'll have another example of the PreProcessor used in the data block API.\\nThen we will study how we build a language model and train it for a while.\"\n",
"tokens = tkn(stream)\n",
"bs,seq_len = 6,15\n",
@@ -919,7 +918,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fine Tuning the Language Model"
"### Fine-Tuning the Language Model"
]
},
{
@@ -1305,7 +1304,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fine Tuning the Classifier"
"### Fine-Tuning the Classifier"
]
},
{
@@ -1507,28 +1506,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. What is self-supervised learning?\n",
"1. What is a language model?\n",
"1. Why is a language model considered self-supervised learning?\n",
"1. What is \"self-supervised learning\"?\n",
"1. What is a \"language model\"?\n",
"1. Why is a language model considered self-supervised?\n",
"1. What are self-supervised models usually used for?\n",
"1. Why do we fine-tune language models?\n",
"1. What are the three steps to create a state-of-the-art text classifier?\n",
"1. How do the 50,000 unlabeled movie reviews help create a better text classifier for the IMDb dataset?\n",
"1. How do the 50,000 unlabeled movie reviews help us create a better text classifier for the IMDb dataset?\n",
"1. What are the three steps to prepare your data for a language model?\n",
"1. What is tokenization? Why do we need it?\n",
"1. What is \"tokenization\"? Why do we need it?\n",
"1. Name three different approaches to tokenization.\n",
"1. What is 'xxbos'?\n",
"1. List 4 rules that fastai applies to text during tokenization.\n",
"1. Why are repeated characters replaced with a token showing the number of repetitions, and the character that's repeated?\n",
"1. What is numericalization?\n",
"1. What is `xxbos`?\n",
"1. List four rules that fastai applies to text during tokenization.\n",
"1. Why are repeated characters replaced with a token showing the number of repetitions and the character that's repeated?\n",
"1. What is \"numericalization\"?\n",
"1. Why might there be words that are replaced with the \"unknown word\" token?\n",
"1. With a batch size of 64, the first row of the tensor representing the first batch contains the first 64 tokens for the dataset. What does the second row of that tensor contain? What does the first row of the second batch contain? (Careful—students often get this one wrong! Be sure to check your answer against the book website.)\n",
"1. With a batch size of 64, the first row of the tensor representing the first batch contains the first 64 tokens for the dataset. What does the second row of that tensor contain? What does the first row of the second batch contain? (Careful—students often get this one wrong! Be sure to check your answer on the book's website.)\n",
"1. Why do we need padding for text classification? Why don't we need it for language modeling?\n",
"1. What does an embedding matrix for NLP contain? What is its shape?\n",
"1. What is perplexity?\n",
"1. What is \"perplexity\"?\n",
"1. Why do we have to pass the vocabulary of the language model to the classifier data block?\n",
"1. What is gradual unfreezing?\n",
"1. Why is text generation always likely to be ahead of automatic identification of machine generated texts?"
"1. What is \"gradual unfreezing\"?\n",
"1. Why is text generation always likely to be ahead of automatic identification of machine-generated texts?"
]
},
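The questionnaire above asks what numericalization is and why some words are replaced with the "unknown word" token. A minimal pure-Python sketch of that step, assuming a vocab built from the most frequent tokens (the helper names here are illustrative, not fastai's API — fastai's `Numericalize` transform does this for real, with `xxunk` at index 0):

```python
from collections import Counter

def make_vocab(tokens, max_vocab=1000, min_freq=1):
    # Keep only the most frequent tokens; index 0 is reserved for the
    # "unknown word" token (fastai calls it xxunk). Anything rarer than
    # min_freq, or beyond max_vocab, will map to that token.
    counts = Counter(tokens)
    vocab = ["xxunk"] + [t for t, c in counts.most_common(max_vocab)
                         if c >= min_freq]
    return vocab, {t: i for i, t in enumerate(vocab)}

toks = "in this chapter we will go back over the example of this chapter".split()
vocab, o2i = make_vocab(toks, max_vocab=5)
nums = [o2i.get(t, 0) for t in toks]  # tokens outside the vocab map to 0 (xxunk)
```

Capping the vocab is why unknown-word tokens appear at all: a rare word simply never earns a row in the embedding matrix.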
{
Expand All @@ -1542,9 +1541,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. See what you can learn about language models and disinformation. What are the best language models today? Have a look at some of their outputs. Do you find them convincing? How could a bad actor best use this to create conflict and uncertainty?\n",
"1. Given the limitation that models are unlikely to be able to consistently recognise machine generated texts, what other approaches may be needed to handle large-scale disinformation campaigns that leveraged deep learning?"
"1. See what you can learn about language models and disinformation. What are the best language models today? Take a look at some of their outputs. Do you find them convincing? How could a bad actor best use such a model to create conflict and uncertainty?\n",
"1. Given the limitation that models are unlikely to be able to consistently recognize machine-generated texts, what other approaches may be needed to handle large-scale disinformation campaigns that leverage deep learning?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
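The `stream`/`tokens` cell shown in the diff above batches a text with `bs,seq_len = 6,15`. A rough pure-Python sketch of the batching scheme from "Putting Our Texts into Batches for a Language Model" (names illustrative — fastai's `LMDataLoader` implements this properly): the stream is cut into `bs` contiguous pieces, and row *i* of each batch continues row *i* of the previous batch.

```python
def batchify(tokens, bs, seq_len):
    """Arrange a flat token stream into language-model batches.

    The stream is split into bs contiguous pieces; each batch row
    picks up exactly where the same row of the previous batch stopped,
    so an RNN's hidden state stays meaningful across batches.
    """
    n = len(tokens) // bs  # tokens per contiguous piece
    pieces = [tokens[i * n:(i + 1) * n] for i in range(bs)]
    batches = []
    for start in range(0, n - 1, seq_len):
        batches.append([p[start:start + seq_len] for p in pieces])
    return batches

toks = [f"t{i}" for i in range(180)]
bats = batchify(toks, bs=6, seq_len=15)
```

This is also the answer to the questionnaire item about batch contents: the second row of the first batch is the start of the *second* stream piece (not the next 15 tokens of the text), while the first row of the second batch continues the first piece.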
28 changes: 14 additions & 14 deletions clean/11_midlevel_data.ipynb
@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Munging With fastai's mid-Level API"
"# Data Munging with fastai's Mid-Level API"
]
},
{
@@ -599,7 +599,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applying the mid-Tier Data API: SiamesePair"
"## Applying the Mid-Level Data API: SiamesePair"
]
},
{
@@ -815,17 +815,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Why do we say that fastai has a layered API? What does it mean?\n",
"1. Why does a `Transform` have a decode method? What does it do?\n",
"1. Why does a `Transform` have a setup method? What does it do?\n",
"1. Why do we say that fastai has a \"layered\" API? What does it mean?\n",
"1. Why does a `Transform` have a `decode` method? What does it do?\n",
"1. Why does a `Transform` have a `setup` method? What does it do?\n",
"1. How does a `Transform` work when called on a tuple?\n",
"1. Which methods do you need to implement when writing your own `Transform`?\n",
"1. Write a `Normalize` transform that fully normalizes items (substract the mean and divide by the standard deviation of the dataset), and that can decode that behavior. Try not to peak!\n",
"1. Write a `Transform` that does the numericalization of tokenized texts (it should set its vocab automatically from the dataset seen and have a decode method). Look at the source code of fastai if you need help.\n",
"1. Write a `Normalize` transform that fully normalizes items (subtract the mean and divide by the standard deviation of the dataset), and that can decode that behavior. Try not to peek!\n",
"1. Write a `Transform` that does the numericalization of tokenized texts (it should set its vocab automatically from the dataset seen and have a `decode` method). Look at the source code of fastai if you need help.\n",
"1. What is a `Pipeline`?\n",
"1. What is a `TfmdLists`? \n",
"1. What is a `Datasets`? How is it different from `TfmdLists`?\n",
"1. Why are `TfmdLists` and `Datasets` named with an s?\n",
"1. What is a `Datasets`? How is it different from a `TfmdLists`?\n",
"1. Why are `TfmdLists` and `Datasets` named with an \"s\"?\n",
"1. How can you build a `DataLoaders` from a `TfmdLists` or a `Datasets`?\n",
"1. How do you pass `item_tfms` and `batch_tfms` when building a `DataLoaders` from a `TfmdLists` or a `Datasets`?\n",
"1. What do you need to do when you want to have your custom items work with methods like `show_batch` or `show_results`?\n",
Expand All @@ -843,8 +843,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Use the mid-level API to grab the data on the pets dataset. On the adult dataset (used in chapter 1).\n",
"1. Look at the siamese tutorial in the fastai documentation to learn how to customize the behavior of `show_batch` and `show_results` for new type of items. Implement it on your own project."
"1. Use the mid-level API to prepare the data in `DataLoaders` on the pets dataset, then on the adult dataset (used in chapter 1).\n",
"1. Look at the Siamese tutorial in the fastai documentation to learn how to customize the behavior of `show_batch` and `show_results` for new types of items. Implement it in your own project."
]
},
{
Expand All @@ -858,11 +858,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulationsyou've completed all of the chapters in this book which cover the key practical parts of training and using deep learning! You know how to use all of fastai's built in applications, and how to customise them using the data blocks API and loss functions. You even know how to create a neural network from scratch, and train it! (And hopefully you now know some of the questions to ask to help make sure your creations help improve society too.)\n",
"Congratulationsyou've completed all of the chapters in this book that cover the key practical parts of training models and using deep learning! You know how to use all of fastai's built-in applications, and how to customize them using the data block API and loss functions. You even know how to create a neural network from scratch, and train it! (And hopefully you now know some of the questions to ask to make sure your creations help improve society too.)\n",
"\n",
"The knowledge you already have is enough to create full working prototypes of many types of neural network application. More importantly, it will help you understand the capabilities and limitations of deep learning models, and how to design a system which best handles these capabilities and limitations.\n",
"The knowledge you already have is enough to create full working prototypes of many types of neural network application. More importantly, it will help you understand the capabilities and limitations of deep learning models, and how to design a system that's well adapted to them.\n",
"\n",
"In the rest of this book we will be pulling apart these applications, piece by piece, to understand all of the foundations they are built on. This is important knowledge for a deep learning practitioner, because it is the knowledge which allows you to inspect and debug models that you build, and to create new applications which are customised for your particular projects."
"In the rest of this book we will be pulling apart those applications, piece by piece, to understand the foundations they are built on. This is important knowledge for a deep learning practitioner, because it is what allows you to inspect and debug models that you build and create new applications that are customized for your particular projects."
]
},
{
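The chapter 11 questionnaire above asks you to write a `Normalize` transform with working `setup`, `encodes`, and `decodes`. A minimal stand-in sketch of that three-method pattern, in plain Python rather than fastai (a real solution subclasses fastai's `Transform`, which adds type dispatch; the class and method names here just mirror the convention described in the chapter):

```python
class NormalizeTfm:
    """Sketch of the fastai Transform pattern: setup computes state
    from the data, encodes applies the transform, decodes reverses it."""

    def setup(self, items):
        # Learn the dataset statistics once, up front.
        n = len(items)
        self.mean = sum(items) / n
        var = sum((x - self.mean) ** 2 for x in items) / n
        self.std = var ** 0.5

    def encodes(self, x):
        # Subtract the mean and divide by the standard deviation.
        return (x - self.mean) / self.std

    def decodes(self, x):
        # Invert the normalization so results can be displayed.
        return x * self.std + self.mean

tfm = NormalizeTfm()
tfm.setup([2.0, 4.0, 6.0])
enc = tfm.encodes(6.0)
```

The split matters: `setup` sees the whole (training) dataset, while `encodes`/`decodes` act on one item at a time — which is exactly why `Transform` needs both a `setup` and a `decode` method.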
42 changes: 21 additions & 21 deletions clean/12_nlp_dive.ipynb
@@ -860,7 +860,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Model"
"### The Model"
]
},
{
@@ -1086,7 +1086,7 @@
"\n",
" def forward(self, input, state):\n",
" h,c = state\n",
" #One big multiplication for all the gates is better than 4 smaller ones\n",
" # One big multiplication for all the gates is better than 4 smaller ones\n",
" gates = (self.ih(input) + self.hh(h)).chunk(4, 1)\n",
" ingate,forgetgate,outgate = map(torch.sigmoid, gates[:3])\n",
" cellgate = gates[3].tanh()\n",
@@ -1339,7 +1339,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### AR and TAR Regularization"
"### Activation Regularization and Temporal Activation Regularization"
]
},
{
Expand Down Expand Up @@ -1554,16 +1554,16 @@
"source": [
"1. If the dataset for your project is so big and complicated that working with it takes a significant amount of time, what should you do?\n",
"1. Why do we concatenate the documents in our dataset before creating a language model?\n",
"1. To use a standard fully connected network to predict the fourth word given the previous three words, what two tweaks do we need to make?\n",
"1. To use a standard fully connected network to predict the fourth word given the previous three words, what two tweaks do we need to make to our model?\n",
"1. How can we share a weight matrix across multiple layers in PyTorch?\n",
"1. Write a module which predicts the third word given the previous two words of a sentence, without peeking.\n",
"1. Write a module that predicts the third word given the previous two words of a sentence, without peeking.\n",
"1. What is a recurrent neural network?\n",
"1. What is hidden state?\n",
"1. What is \"hidden state\"?\n",
"1. What is the equivalent of hidden state in `LMModel1`?\n",
"1. To maintain the state in an RNN why is it important to pass the text to the model in order?\n",
"1. What is an unrolled representation of an RNN?\n",
"1. To maintain the state in an RNN, why is it important to pass the text to the model in order?\n",
"1. What is an \"unrolled\" representation of an RNN?\n",
"1. Why can maintaining the hidden state in an RNN lead to memory and performance problems? How do we fix this problem?\n",
"1. What is BPTT?\n",
"1. What is \"BPTT\"?\n",
"1. Write code to print out the first few batches of the validation set, including converting the token IDs back into English strings, as we showed for batches of IMDb data in <<chapter_nlp>>.\n",
"1. What does the `ModelResetter` callback do? Why do we need it?\n",
"1. What are the downsides of predicting just one output word for each three input words?\n",
Expand All @@ -1573,23 +1573,23 @@
"1. Draw a representation of a stacked (multilayer) RNN.\n",
"1. Why should we get better results in an RNN if we call `detach` less often? Why might this not happen in practice with a simple RNN?\n",
"1. Why can a deep network result in very large or very small activations? Why does this matter?\n",
"1. In a computer's floating point representation of numbers, which numbers are the most precise?\n",
"1. In a computer's floating-point representation of numbers, which numbers are the most precise?\n",
"1. Why do vanishing gradients prevent training?\n",
"1. Why does it help to have two hidden states in the LSTM architecture? What is the purpose of each one?\n",
"1. What are these two states called in an LSTM?\n",
"1. What is tanh, and how is it related to sigmoid?\n",
"1. What is the purpose of this code in `LSTMCell`?: `h = torch.stack([h, input], dim=1)`\n",
"1. What does `chunk` to in PyTorch?\n",
"1. What is the purpose of this code in `LSTMCell`: `h = torch.stack([h, input], dim=1)`\n",
"1. What does `chunk` do in PyTorch?\n",
"1. Study the refactored version of `LSTMCell` carefully to ensure you understand how and why it does the same thing as the non-refactored version.\n",
"1. Why can we use a higher learning rate for `LMModel6`?\n",
"1. What are the three regularisation techniques used in an AWD-LSTM model?\n",
"1. What is dropout?\n",
"1. What are the three regularization techniques used in an AWD-LSTM model?\n",
"1. What is \"dropout\"?\n",
"1. Why do we scale the weights with dropout? Is this applied during training, inference, or both?\n",
"1. What is the purpose of this line from `Dropout`?: `if not self.training: return x`\n",
"1. What is the purpose of this line from `Dropout`: `if not self.training: return x`\n",
"1. Experiment with `bernoulli_` to understand how it works.\n",
"1. How do you set your model in training mode in PyTorch? In evaluation mode?\n",
"1. Write the equation for activation regularization (in maths or code, as you prefer). How is it different to weight decay?\n",
"1. Write the equation for temporal activation regularization (in maths or code, as you prefer). Why wouldn't we use this for computer vision problems?\n",
"1. Write the equation for activation regularization (in math or code, as you prefer). How is it different from weight decay?\n",
"1. Write the equation for temporal activation regularization (in math or code, as you prefer). Why wouldn't we use this for computer vision problems?\n",
"1. What is \"weight tying\" in a language model?"
]
},
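Several questionnaire items above concern dropout: why survivors are scaled, and what `if not self.training: return x` is for. A minimal sketch of inverted dropout in plain Python (illustrative only — the chapter's `Dropout` module uses `bernoulli_` on tensors):

```python
import random

def dropout(xs, p, training=True):
    """Inverted dropout: during training each activation is zeroed with
    probability p and survivors are scaled by 1/(1-p), so the expected
    activation is unchanged; at inference the input passes through
    untouched -- the role of `if not self.training: return x`."""
    if not training:
        return xs
    return [0.0 if random.random() < p else x / (1 - p) for x in xs]

random.seed(42)
acts = [1.0] * 8
train_out = dropout(acts, p=0.5)            # mix of 0.0 and 2.0
infer_out = dropout(acts, p=0.5, training=False)  # identical to acts
```

The 1/(1-p) scaling is what lets you drop the mask entirely at inference time instead of rescaling there.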
Expand All @@ -1604,10 +1604,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. In ` LMModel2` why can `forward` start with `h=0`? Why don't we need to say `h=torch.zeros()`?\n",
"1. Write the code for an LSTM from scratch (but you may refer to <<lstm>>).\n",
"1. Search on the Internet for the GRU architecture and implement it from scratch, and try training a model. See if you can get the similar results as we saw in this chapter. Compare it to the results of PyTorch's built in GRU module.\n",
"1. Have a look at the source code for AWD-LSTM in fastai, and try to map each of the lines of code to the concepts shown in this chapter."
"1. In `LMModel2`, why can `forward` start with `h=0`? Why don't we need to say `h=torch.zeros(...)`?\n",
"1. Write the code for an LSTM from scratch (you may refer to <<lstm>>).\n",
"1. Search the internet for the GRU architecture and implement it from scratch, and try training a model. See if you can get results similar to those we saw in this chapter. Compare your results to the results of PyTorch's built-in `GRU` module.\n",
"1. Take a look at the source code for AWD-LSTM in fastai, and try to map each of the lines of code to the concepts shown in this chapter."
]
},
{
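The `LSTMCell` fragment in the diff above computes all four gates with one big multiplication and then splits the result with `chunk(4, 1)`. A NumPy sketch of that same step (NumPy rather than the book's PyTorch, and weight/argument names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_cell(x, h, c, W_ih, W_hh, b):
    """One LSTM step: a single matmul produces all four gates at once
    (the 'one big multiplication' from the diff), then the result is
    split -- NumPy's counterpart of PyTorch's chunk(4, 1)."""
    gates = x @ W_ih + h @ W_hh + b          # shape (bs, 4*nh)
    i, f, o, g = np.split(gates, 4, axis=1)  # input/forget/output/cell gates
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_new = f * c + i * g                    # cell state: gated update
    h_new = o * np.tanh(c_new)               # hidden state: gated output
    return h_new, c_new

rng = np.random.default_rng(0)
bs, ni, nh = 2, 3, 4
x = rng.normal(size=(bs, ni))
h, c = np.zeros((bs, nh)), np.zeros((bs, nh))
W_ih = rng.normal(size=(ni, 4 * nh))
W_hh = rng.normal(size=(nh, 4 * nh))
h2, c2 = lstm_cell(x, h, c, W_ih, W_hh, np.zeros(4 * nh))
print(h2.shape)  # (2, 4)
```

Because the output gate is a sigmoid and the cell passes through `tanh`, every hidden-state activation is strictly inside (-1, 1) — one reason the LSTM's gradients are better behaved than a plain RNN's.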
