
Made a correction in Normalizing activation
The activations of layer `l` affect the training of the weights and bias of the next layer `l+1`. See this screenshot of the corresponding lecture: https://imgur.com/MG2OVt9
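The forward pass makes this dependence explicit in the notes' notation: `Z[l+1] = W[l+1] A[l] + b[l+1]`, so the scale of `A[l]` enters directly into the gradients of `W[l+1]` and `b[l+1]`.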
Kaushal28 authored Jul 31, 2019
1 parent fa68c0a commit 9a778c8
Showing 1 changed file with 1 addition and 1 deletion: 2- Improving Deep Neural Networks/Readme.md
```diff
@@ -651,7 +651,7 @@ Implications of L2-regularization on:
 - In the rise of deep learning, one of the most important ideas has been an algorithm called **batch normalization**, created by two researchers, Sergey Ioffe and Christian Szegedy.
 - Batch Normalization speeds up learning.
 - Before we normalized input by subtracting the mean and dividing by variance. This helped a lot for the shape of the cost function and for reaching the minimum point faster.
-- The question is: *for any hidden layer can we normalize `A[l]` to train `W[l]`, `b[l]` faster?* This is what batch normalization is about.
+- The question is: *for any hidden layer can we normalize `A[l]` to train `W[l+1]`, `b[l+1]` faster?* This is what batch normalization is about.
 - There are some debates in the deep learning literature about whether you should normalize values before the activation function `Z[l]` or after applying the activation function `A[l]`. In practice, normalizing `Z[l]` is done much more often and that is what Andrew Ng presents.
 - Algorithm:
   - Given `Z[l] = [z(1), ..., z(m)]`, i = 1 to m (for each input)
```
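For reference, here is a minimal NumPy sketch of the batch normalization step described in the changed section: normalize `Z[l]` over the mini-batch, then apply a learned scale `gamma` and shift `beta`. The function and variable names are illustrative, not taken from the repository.

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize pre-activations Z[l] over a mini-batch (columns = examples),
    then scale and shift with the learned parameters gamma and beta."""
    mu = np.mean(Z, axis=1, keepdims=True)    # per-unit mean over the batch
    var = np.var(Z, axis=1, keepdims=True)    # per-unit variance over the batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    Z_tilde = gamma * Z_norm + beta           # learnable mean and variance
    return Z_tilde

# Example: 4 hidden units, mini-batch of 3 examples
Z = np.random.randn(4, 3)
gamma = np.ones((4, 1))
beta = np.zeros((4, 1))
print(batch_norm_forward(Z, gamma, beta))
```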
