
Made a correction in Normalizing activation
The activations of layer `l` affect the training of the weights and bias of the next layer `l+1`. See this screenshot of the corresponding lecture: https://imgur.com/MG2OVt9
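The forward pass makes this dependence explicit in the notes' notation: `Z[l+1] = W[l+1] A[l] + b[l+1]`, so the scale of `A[l]` enters directly into the gradients of `W[l+1]` and `b[l+1]`.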
Kaushal28 authored Jul 31, 2019
1 parent fa68c0a commit 9a778c8
Showing 1 changed file with 1 addition and 1 deletion: 2- Improving Deep Neural Networks/Readme.md
```diff
@@ -651,7 +651,7 @@ Implications of L2-regularization on:
 - In the rise of deep learning, one of the most important ideas has been an algorithm called **batch normalization**, created by two researchers, Sergey Ioffe and Christian Szegedy.
 - Batch Normalization speeds up learning.
 - Before we normalized input by subtracting the mean and dividing by variance. This helped a lot for the shape of the cost function and for reaching the minimum point faster.
-- The question is: *for any hidden layer can we normalize `A[l]` to train `W[l]`, `b[l]` faster?* This is what batch normalization is about.
+- The question is: *for any hidden layer can we normalize `A[l]` to train `W[l+1]`, `b[l+1]` faster?* This is what batch normalization is about.
 - There are some debates in the deep learning literature about whether you should normalize values before the activation function `Z[l]` or after applying the activation function `A[l]`. In practice, normalizing `Z[l]` is done much more often and that is what Andrew Ng presents.
 - Algorithm:
   - Given `Z[l] = [z(1), ..., z(m)]`, i = 1 to m (for each input)
```
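For reference, here is a minimal NumPy sketch of the batch normalization step described in the changed section: normalize `Z[l]` over the mini-batch, then apply a learned scale `gamma` and shift `beta`. The function and variable names are illustrative, not taken from the repository.

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize pre-activations Z[l] over a mini-batch (columns = examples),
    then scale and shift with the learned parameters gamma and beta."""
    mu = np.mean(Z, axis=1, keepdims=True)    # per-unit mean over the batch
    var = np.var(Z, axis=1, keepdims=True)    # per-unit variance over the batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    Z_tilde = gamma * Z_norm + beta           # learnable mean and variance
    return Z_tilde

# Example: 4 hidden units, mini-batch of 3 examples
Z = np.random.randn(4, 3)
gamma = np.ones((4, 1))
beta = np.zeros((4, 1))
print(batch_norm_forward(Z, gamma, beta))
```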
