
Commit

Merge pull request mbadry1#167 from dotslash21/master
 Fixed minor formatting issues
mbadry1 authored Aug 3, 2019
2 parents cdcadc0 + e02a3cb commit fe6dafe
Showing 2 changed files with 6 additions and 6 deletions.
8 changes: 4 additions & 4 deletions 1- Neural Networks and Deep Learning/Readme.md
@@ -227,13 +227,13 @@
- Let's say we have these variables:

```
X1 First input feature.
X2 Second input feature.
W1 Weight of the first feature.
W2 Weight of the second feature.
B Logistic Regression parameter (the bias).
M Number of training examples.
Y(i) Expected output of example i.
```

- So we have:
@@ -246,7 +246,7 @@
```
d(z) = d(l)/d(z) = a - y
d(W1) = X1 * d(z)
d(W2) = X2 * d(z)
d(B) = d(z)
```

- From the above we can conclude the logistic regression pseudocode; one possible NumPy rendering of a single gradient step is sketched below.
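The sketch assumes vectorized inputs `X` (features by examples), labels `Y`, a weight column `w` (W1, W2 stacked), a scalar bias `b`, and a learning rate `alpha`; it is an illustration under those assumptions, not the course's exact code.

```
import numpy as np

def sigmoid(z):
    # a = g(z) = 1 / (1 + np.exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, Y, w, b, alpha=0.01):
    # X: (n_features, m) inputs, Y: (1, m) expected outputs,
    # w: (n_features, 1) weights, b: scalar bias (B above)
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)   # a = g(w.T x + b), shape (1, m)
    dz = A - Y                        # d(z) = a - y
    dw = np.dot(X, dz.T) / m          # d(W1), d(W2): x * d(z), averaged over m examples
    db = np.sum(dz) / m               # d(B) = d(z), averaged over m examples
    return w - alpha * dw, b - alpha * db
```

Looping this step for a fixed number of iterations gives the training procedure the notes go on to describe.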
@@ -472,7 +472,7 @@
- Derivation of the Sigmoid activation function's derivative (a numerical sanity check is sketched below):

```
g(z) = 1 / (1 + np.exp(-z))
g'(z) = (1 / (1 + np.exp(-z))) * (1 - (1 / (1 + np.exp(-z))))
g'(z) = g(z) * (1 - g(z))
```
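As a quick sanity check of the identity `g'(z) = g(z) * (1 - g(z))`, the sketch below compares it with a central finite-difference approximation; the test points and the step size `eps` are arbitrary choices for illustration.

```
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    g = sigmoid(z)        # analytical form from the derivation above
    return g * (1 - g)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(sigmoid_derivative(z), numeric))   # expected: True
```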
4 changes: 2 additions & 2 deletions 2- Improving Deep Neural Networks/Readme.md
Original file line number Diff line number Diff line change
@@ -299,7 +299,7 @@
```
np.random.randn(shape) * np.sqrt(2 / n[l-1])
```
- The 1 or 2 in the numerator can also be a hyperparameter to tune (but not the first one to start with).
- This (ReLU plus weight initialization with this variance) is one of the best partial solutions to vanishing / exploding gradients; it helps gradients avoid vanishing or exploding too quickly.
- The initialization in this video is called "He Initialization / Xavier Initialization" and was published in a 2015 paper; a minimal NumPy sketch of it follows below.
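The sketch below applies this style of initialization to a whole network; the helper name `initialize_parameters_he` and the layer sizes are assumptions for the example, and replacing the factor 2 with 1 gives the Xavier-style variant mentioned above.

```
import numpy as np

def initialize_parameters_he(layer_dims):
    # layer_dims is a list of layer sizes, e.g. [n_x, n_h1, n_h2, n_y]
    parameters = {}
    for l in range(1, len(layer_dims)):
        # zero-mean Gaussian weights scaled so Var(W[l]) = 2 / n[l-1]
        parameters["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                    * np.sqrt(2.0 / layer_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

params = initialize_parameters_he([5, 4, 3, 1])
print(params["W1"].shape)   # (4, 5)
```

Scaling only the weights and zeroing the biases keeps the variance of each layer's activations roughly constant, which is what keeps gradients from vanishing or exploding too quickly.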
@@ -605,7 +605,7 @@
6. Learning rate decay.
7. Regularization lambda.
8. Activation functions.
9. Adam `beta1`, `beta2` & `epsilon`.
- It's hard to decide which hyperparameter is the most important; it depends a lot on your problem.
- One way to tune is to sample a grid of `N` hyperparameter settings and then try all of the combinations on your problem.
- Try random values instead: don't use a grid (a sampling sketch follows below).
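To make the random-search advice concrete, here is a small sketch that samples the learning rate on a log scale and the momentum `beta` through `1 - 10^r`; the ranges, the mini-batch choices, and the number of trials are assumptions for illustration.

```
import numpy as np

def sample_hyperparameters(num_trials=25):
    # Random sampling instead of a grid: every trial draws fresh values for each hyperparameter.
    settings = []
    for _ in range(num_trials):
        r = -4 * np.random.rand()             # r in (-4, 0]
        alpha = 10 ** r                       # learning rate on a log scale, roughly 1e-4 .. 1
        s = -3 + 2 * np.random.rand()         # s in [-3, -1)
        beta = 1 - 10 ** s                    # momentum beta, roughly 0.9 .. 0.999
        batch_size = int(np.random.choice([32, 64, 128, 256]))
        settings.append({"alpha": alpha, "beta": beta, "mini_batch_size": batch_size})
    return settings

for cfg in sample_hyperparameters(3):
    print(cfg)
```

Sampling on a log scale matters because drawing `alpha` uniformly between 0.0001 and 1 would spend almost all trials between 0.1 and 1.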
