Content changes in exercises
dennybritz committed Sep 2, 2015
1 parent 1748414 commit 56351f1
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions nn-from-scratch.ipynb
@@ -584,7 +584,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that while a hidden layer of low dimensionality seems to nicely capture the general trend of our data while higher dimensionalities are becoming more prone to overfitting. They are \"memorizing\" the data as opposed to finding a general trend. We could counteract this with stronger regularization, but picking the a correct size for hidden layer is the mist \"economical\" solution."
"We can see that while a hidden layer of low dimensionality seems to nicely capture the general trend of our data while higher dimensionalities are more prone to overfitting. They are \"memorizing\" the data as opposed to fitting the general shape. We could counteract this with stronger regularization, but picking the a correct size for hidden layer is a much more \"economical\" solution."
]
},
{
@@ -597,12 +597,21 @@
"\n",
"Here are some things you can try out to become more familiar with the code:\n",
"\n",
"1. Instead of using batch gradient descent use minibatch gradient descent ([more info](http://cs231n.github.io/optimization-1/#gd)) to train the network. Minibatch gradient descent typically performs well in practice. \n",
"2. Implement an annealing schedule for the gradient descent learning rate ([more info](http://cs231n.github.io/neural-networks-3/#anneal)). \n",
"1. Instead of using batch gradient descent use minibatch gradient descent ([more info](http://cs231n.github.io/optimization-1/#gd)) to train the network. Minibatch gradient descent typically performs much better in practice. \n",
"2. We used a fixed learning rate $\\epsilon$ for gradient descent. Implement an annealing schedule for the gradient descent learning rate ([more info](http://cs231n.github.io/neural-networks-3/#anneal)). \n",
"3. We used a $\\tanh$ activation function for our hidden layer. Experiment with other activation functions (some are mentioned above). Note that changing the activation function also means changing the backpropagation derivative.\n",
"4. Extend the network above to three classes instead of two. You will also need to generate an appropriate dataset for this.\n",
"5. Extend the network to four layers. Experiment with the layer size. Adding another hidden layer means you will need to adjust both the forward propagation as well as the backpropagation code.\n"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ }
],
"metadata": {
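Exercise 1 in the updated notebook text suggests switching from batch to minibatch gradient descent. Below is a minimal, self-contained sketch of what such a training loop could look like for a one-hidden-layer tanh network; the toy dataset, layer sizes, batch size, and learning rate are invented for illustration, regularization is omitted for brevity, and none of the notebook's own variable or function names are assumed here.

```python
import numpy as np

# Minimal sketch of minibatch gradient descent for a one-hidden-layer
# network (tanh hidden layer, softmax output). All constants and the
# toy dataset are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # toy 2-D inputs
y = (X[:, 0] * X[:, 1] > 0).astype(int)          # toy binary labels

nn_input_dim, nn_hdim, nn_output_dim = 2, 3, 2
epsilon = 0.01                                   # learning rate
batch_size = 32
num_epochs = 200

W1 = rng.normal(size=(nn_input_dim, nn_hdim)) / np.sqrt(nn_input_dim)
b1 = np.zeros((1, nn_hdim))
W2 = rng.normal(size=(nn_hdim, nn_output_dim)) / np.sqrt(nn_hdim)
b2 = np.zeros((1, nn_output_dim))

for epoch in range(num_epochs):
    perm = rng.permutation(X.shape[0])           # reshuffle every epoch
    for start in range(0, X.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]

        # Forward pass on the minibatch only
        a1 = np.tanh(Xb.dot(W1) + b1)
        scores = a1.dot(W2) + b2
        exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

        # Backpropagation on the minibatch
        delta3 = probs.copy()
        delta3[np.arange(len(Xb)), yb] -= 1       # softmax + cross-entropy gradient
        delta3 /= len(Xb)                         # average over the minibatch
        dW2 = a1.T.dot(delta3)
        db2 = delta3.sum(axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - a1 ** 2) # tanh'(z) = 1 - tanh(z)^2
        dW1 = Xb.T.dot(delta2)
        db1 = delta2.sum(axis=0, keepdims=True)

        # Parameter update uses only this minibatch's gradients
        W1 -= epsilon * dW1
        b1 -= epsilon * db1
        W2 -= epsilon * dW2
        b2 -= epsilon * db2
```

Because each update uses only a small random subset of the data, the parameters are updated many times per pass over the dataset, which is usually what makes minibatch training perform better in practice.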

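Exercise 2 asks for an annealing schedule in place of the fixed learning rate $\epsilon$. A simple 1/t decay is sketched below; epsilon0 and decay_rate are illustrative constants, not values taken from the notebook.

```python
# Sketch of a 1/t learning-rate annealing schedule. The constants are
# illustrative; exponential or step decay would work similarly.
epsilon0 = 0.01      # initial learning rate
decay_rate = 1e-3    # controls how quickly the rate shrinks

def annealed_learning_rate(t, epsilon0=epsilon0, decay_rate=decay_rate):
    """Learning rate after t parameter updates (1/t decay)."""
    return epsilon0 / (1.0 + decay_rate * t)

# Inside the training loop, the fixed epsilon would be replaced by, e.g.:
#     epsilon_t = annealed_learning_rate(step)
#     W1 -= epsilon_t * dW1   # and likewise for the other parameters
for step in (0, 1_000, 10_000, 100_000):
    print(step, annealed_learning_rate(step))
```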

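Exercise 3 notes that changing the hidden-layer activation also means changing the derivative used during backpropagation. The sketch below pairs a few common activations with their derivatives; which alternatives the notebook itself mentions is not assumed here.

```python
import numpy as np

# Each activation is paired with its derivative, since the backpropagation
# step for the hidden layer multiplies the incoming error by that derivative.

def tanh(z):
    return np.tanh(z)

def tanh_prime(a):            # written in terms of the activation a = tanh(z)
    return 1.0 - a ** 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(a):         # in terms of a = sigmoid(z)
    return a * (1.0 - a)

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):            # ReLU's derivative is simplest in terms of z itself
    return (z > 0).astype(float)

# With a sigmoid hidden layer, the hidden-layer error term becomes, e.g.:
#     delta2 = delta3.dot(W2.T) * sigmoid_prime(a1)
# replacing the tanh version:
#     delta2 = delta3.dot(W2.T) * (1 - a1 ** 2)
```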