Fix markdown cells in chapters 12,16 (fastai#354)
* Fix grammar

* Fix typos/grammar

* Fix markdown cells

* Fix markdown

Co-authored-by: Jeremy Howard <[email protected]>
joe-bender and jph00 authored Jan 17, 2021
1 parent b138512 commit a60fc09
Showing 2 changed files with 12 additions and 12 deletions.
8 changes: 4 additions & 4 deletions 12_nlp_dive.ipynb
@@ -1557,7 +1557,7 @@
"\n",
"Consider, for example, the sentences \"Henry has a dog and he likes his dog very much\" and \"Sophie has a dog and she likes her dog very much.\" It's very clear that the RNN needs to remember the name at the beginning of the sentence to be able to predict *he/she* or *his/her*. \n",
"\n",
-"In practice, RNNs are really bad at retaining memory of what happened much earlier in the sentence, which is the motivation to have another hidden state (called *cell state*) in the LSTM. The cell state will be responsible for keeping *long short-term memory*, while the hidden state will focus on the next token to predict. Let's take a closer look and how this is achieved and build an LSTM from scratch."
+"In practice, RNNs are really bad at retaining memory of what happened much earlier in the sentence, which is the motivation to have another hidden state (called *cell state*) in the LSTM. The cell state will be responsible for keeping *long short-term memory*, while the hidden state will focus on the next token to predict. Let's take a closer look at how this is achieved and build an LSTM from scratch."
]
},
{
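An aside on the cell above: the LSTM it introduces maintains a cell state for long-term memory alongside the hidden state. A framework-free sketch of one LSTM step with scalar states (plain Python for illustration only; the book builds the real module with PyTorch, and all weight values here are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    # One LSTM time step, scalar version, to show the data flow.
    # w holds a weight on the input, a weight on the hidden state,
    # and a bias for each of the four gates.
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # cell candidate
    c_new = f * c + i * g          # cell state: the long short-term memory
    h_new = o * math.tanh(c_new)   # hidden state: focused on the next token
    return h_new, c_new

# Arbitrary shared weights, just to run the step a few times.
w = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):
    h, c = lstm_step(x, h, c, w)
```

Note how the cell state is only ever rescaled (by the forget gate) and added to, which is what lets gradients flow across many time steps.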
@@ -2014,7 +2014,7 @@
"loss += beta * (activations[:,1:] - activations[:,:-1]).pow(2).mean()\n",
"```\n",
"\n",
-"`alpha` and `beta` are then two hyperparameters to tune. To make this work, we need our model with dropout to return three things: the proper output, the activations of the LSTM pre-dropout, and the activations of the LSTM post-dropout. AR is often applied on the dropped-out activations (to not penalize the activations we turned in zeros afterward) while TAR is applied on the non-dropped-out activations (because those zeros create big differences between two consecutive time steps). There is then a callback called `RNNRegularizer` that will apply this regularization for us."
+"`alpha` and `beta` are then two hyperparameters to tune. To make this work, we need our model with dropout to return three things: the proper output, the activations of the LSTM pre-dropout, and the activations of the LSTM post-dropout. AR is often applied on the dropped-out activations (to not penalize the activations we turned into zeros afterward) while TAR is applied on the non-dropped-out activations (because those zeros create big differences between two consecutive time steps). There is then a callback called `RNNRegularizer` that will apply this regularization for us."
]
},
{
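The AR and TAR penalties discussed in the hunk above can be written out framework-free; a toy sketch with nested lists standing in for activation tensors (the helper names are hypothetical, and the real version uses the tensor ops shown in the notebook):

```python
def ar_penalty(dropped_acts, alpha):
    # Activation regularization: mean of squared (dropped-out) activations.
    flat = [a for row in dropped_acts for a in row]
    return alpha * sum(a * a for a in flat) / len(flat)

def tar_penalty(acts, beta):
    # Temporal activation regularization: mean squared difference between
    # consecutive time steps of the non-dropped-out activations.
    diffs = [acts[t + 1][j] - acts[t][j]
             for t in range(len(acts) - 1)
             for j in range(len(acts[0]))]
    return beta * sum(d * d for d in diffs) / len(diffs)

# Toy activations: time steps x features.
acts = [[1.0, 2.0], [1.5, 2.5], [1.0, 2.0]]
loss = 0.0
loss += ar_penalty(acts, alpha=2.0)
loss += tar_penalty(acts, beta=1.0)
```

AR pushes activations toward zero; TAR pushes consecutive time steps toward each other, which is why it must see the activations before dropout zeroes some of them out.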
@@ -2034,7 +2034,7 @@
"\n",
" self.h_o.weight = self.i_h.weight\n",
"\n",
-"In `LMMModel7`, we include these final tweaks:"
+"In `LMModel7`, we include these final tweaks:"
]
},
{
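The weight tying shown in the hunk above (`self.h_o.weight = self.i_h.weight`) works because the output layer is made to reuse the embedding's weight object rather than a copy. A minimal framework-free illustration (toy classes, with a shared list standing in for a shared parameter tensor):

```python
class Embedding:
    def __init__(self, weight):
        self.weight = weight  # vocab_size x n_hidden matrix

class Linear:
    def __init__(self, weight):
        self.weight = weight  # here: reused, not copied

# i_h maps tokens to hidden activations; h_o maps hidden back to token scores.
shared = [[0.1, 0.2], [0.3, 0.4]]
i_h = Embedding(shared)
h_o = Linear(i_h.weight)   # the tie: the same object serves both layers

# Any update to the embedding weights is seen by the output layer too.
i_h.weight[0][0] = 9.9
```

With a real framework, tying also halves the number of parameters those two layers would otherwise need.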
@@ -2338,7 +2338,7 @@
"source": [
"1. In ` LMModel2`, why can `forward` start with `h=0`? Why don't we need to say `h=torch.zeros(...)`?\n",
"1. Write the code for an LSTM from scratch (you may refer to <<lstm>>).\n",
-"1. Search the internet for the GRU architecture and implement it from scratch, and try training a model. See if you can get results similar to those we saw in this chapter. Compare you results to the results of PyTorch's built in `GRU` module.\n",
+"1. Search the internet for the GRU architecture and implement it from scratch, and try training a model. See if you can get results similar to those we saw in this chapter. Compare your results to the results of PyTorch's built in `GRU` module.\n",
"1. Take a look at the source code for AWD-LSTM in fastai, and try to map each of the lines of code to the concepts shown in this chapter."
]
},
16 changes: 8 additions & 8 deletions 16_accel_sgd.ipynb
@@ -42,7 +42,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"You now know how to create state-of-the-art architectures for computer vision, natural image processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we're done, right? Not quite yet. We still have to explore a little bit more the training process.\n",
+"You now know how to create state-of-the-art architectures for computer vision, natural language processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we're done, right? Not quite yet. We still have to explore a little bit more the training process.\n",
"\n",
"We explained in <<chapter_mnist_basics>> the basis of stochastic gradient descent: pass a mini-batch to the model, compare it to our target with the loss function, then compute the gradients of this loss function with regard to each weight before updating the weights with the formula:\n",
"\n",
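The SGD update that the cell above refers to is the standard one: each weight moves against its gradient, scaled by the learning rate. A framework-free sketch (plain floats standing in for parameter tensors; illustrative only):

```python
def sgd_step(weights, grads, lr):
    # Basic SGD: new_weight = weight - lr * gradient, element by element.
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [1.0, -2.0, 0.5]
grads = [0.2, -0.4, 0.0]   # pretend these came from a backward pass
weights = sgd_step(weights, grads, lr=0.1)
```

The optimizer variants the chapter goes on to build (momentum, RMSProp, Adam) all keep this shape and only change how the step is computed from the gradients.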
@@ -1011,7 +1011,7 @@
"\n",
"The calls of the form `self('...')` are where the callbacks are called. As you see, this happens after every step. The callback will receive the entire state of training, and can also modify it. For instance, the input data and target labels are in `self.xb` and `self.yb`, respectively; a callback can modify these to alter the data the training loop sees. It can also modify `self.loss`, or even the gradients.\n",
"\n",
-"Let's see how this work in practice by writing a callback."
+"Let's see how this works in practice by writing a callback."
]
},
{
@@ -1035,14 +1035,14 @@
"- `after_loss`:: called after the loss has been computed, but before the backward pass. It can be used to add penalty to the loss (AR or TAR in RNN training, for instance).\n",
"- `after_backward`:: called after the backward pass, but before the update of the parameters. It can be used to make changes to the gradients before said update (via gradient clipping, for instance).\n",
"- `after_step`:: called after the step and before the gradients are zeroed.\n",
-"- `after_batch`:: called at the end of a batch, for to perform any required cleanup before the next one.\n",
+"- `after_batch`:: called at the end of a batch, to perform any required cleanup before the next one.\n",
"- `after_train`:: called at the end of the training phase of an epoch.\n",
"- `begin_validate`:: called at the beginning of the validation phase of an epoch; useful for any setup needed specifically for validation.\n",
"- `after_validate`:: called at the end of the validation part of an epoch.\n",
"- `after_epoch`:: called at the end of an epoch, for any cleanup before the next one.\n",
"- `after_fit`:: called at the end of training, for final cleanup.\n",
"\n",
-"This elements of that list are available as attributes of the special variable `event`, so you can just type `event.` and hit Tab in your notebook to see a list of all the options"
+"The elements of this list are available as attributes of the special variable `event`, so you can just type `event.` and hit Tab in your notebook to see a list of all the options."
]
},
{
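The event hooks listed in the hunk above can be pictured with a toy training loop; a framework-free sketch of the callback pattern (this is an illustration, not fastai's actual `Learner` — the `after_*` names are taken from the list above, while the `begin_*` counterparts here are assumed for symmetry):

```python
class Tracker:
    # A toy callback that records which events fire, in order.
    def __init__(self):
        self.events = []
    def __call__(self, event_name):
        self.events.append(event_name)

def fit(n_batches, callback):
    # The loop announces each stage; a real callback could inspect or
    # modify the training state at any of these points.
    callback("begin_fit")
    callback("begin_epoch")
    for _ in range(n_batches):
        callback("begin_batch")
        callback("after_pred")
        callback("after_loss")
        callback("after_backward")
        callback("after_step")
        callback("after_batch")
    callback("after_train")
    callback("begin_validate")
    callback("after_validate")
    callback("after_epoch")
    callback("after_fit")

cb = Tracker()
fit(2, cb)
```

The design point is that the loop itself never changes; all customization lives in the callbacks it announces events to.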
@@ -1221,7 +1221,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"In this chapter we took a close look at the training loop, exploring different variants of SGD and why they can be more powerful. At the time of writing developing new optimizers is a very active area of research, so by the time you read this chapter there may be an addendum on the book's website that presents new variants. Be sure to check out how our general optimizer framework can help you implement new optimizers very quickly.\n",
+"In this chapter we took a close look at the training loop, exploring different variants of SGD and why they can be more powerful. At the time of writing, developing new optimizers is a very active area of research, so by the time you read this chapter there may be an addendum on the book's website that presents new variants. Be sure to check out how our general optimizer framework can help you implement new optimizers very quickly.\n",
"\n",
"We also examined the powerful callback system that allows you to customize every bit of the training loop by enabling you to inspect and modify any parameter you like between each step."
]
@@ -1261,7 +1261,7 @@
"1. How can you get the list of events available to you when writing a callback?\n",
"1. Write the `ModelResetter` callback (without peeking).\n",
"1. How can you access the necessary attributes of the training loop inside a callback? When can you use or not use the shortcuts that go with them?\n",
-"1. How can a callback influence the control flow of the training loop.\n",
+"1. How can a callback influence the control flow of the training loop?\n",
"1. Write the `TerminateOnNaN` callback (without peeking, if possible).\n",
"1. How do you make sure your callback runs after or before another callback?"
]
@@ -1279,7 +1279,7 @@
"source": [
"1. Look up the \"Rectified Adam\" paper, implement it using the general optimizer framework, and try it out. Search for other recent optimizers that work well in practice, and pick one to implement.\n",
"1. Look at the mixed-precision callback with the documentation. Try to understand what each event and line of code does.\n",
-"1. Implement your own version of ther learning rate finder from scratch. Compare it with fastai's version.\n",
+"1. Implement your own version of the learning rate finder from scratch. Compare it with fastai's version.\n",
"1. Look at the source code of the callbacks that ship with fastai. See if you can find one that's similar to what you're looking to do, to get some inspiration."
]
},
@@ -1321,4 +1321,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
