Content changes in exercises
dennybritz committed Sep 2, 2015
1 parent 1748414 commit 56351f1
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions nn-from-scratch.ipynb
@@ -584,7 +584,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that while a hidden layer of low dimensionality seems to nicely capture the general trend of our data while higher dimensionalities are becoming more prone to overfitting. They are \"memorizing\" the data as opposed to finding a general trend. We could counteract this with stronger regularization, but picking the a correct size for hidden layer is the mist \"economical\" solution."
"We can see that while a hidden layer of low dimensionality seems to nicely capture the general trend of our data while higher dimensionalities are more prone to overfitting. They are \"memorizing\" the data as opposed to fitting the general shape. We could counteract this with stronger regularization, but picking the a correct size for hidden layer is a much more \"economical\" solution."
]
},
{
@@ -597,12 +597,21 @@
"\n",
"Here are some things you can try out to become more familiar with the code:\n",
"\n",
"1. Instead of using batch gradient descent use minibatch gradient descent ([more info](http://cs231n.github.io/optimization-1/#gd)) to train the network. Minibatch gradient descent typically performs well in practice. \n",
"2. Implement an annealing schedule for the gradient descent learning rate ([more info](http://cs231n.github.io/neural-networks-3/#anneal)). \n",
"1. Instead of using batch gradient descent use minibatch gradient descent ([more info](http://cs231n.github.io/optimization-1/#gd)) to train the network. Minibatch gradient descent typically performs much better in practice. \n",
"2. We used a fixed learning rate $\\epsilon$ for gradient descent. Implement an annealing schedule for the gradient descent learning rate ([more info](http://cs231n.github.io/neural-networks-3/#anneal)). \n",
"3. We used a $\\tanh$ activation function for our hidden layer. Experiment with other activation functions (some are mentioned above). Note that changing the activation function also means changing the backpropagation derivative.\n",
"4. Extend the network above to three classes instead of two. You will also need to generate an appropriate dataset for this.\n",
"5. Extend the network to four layers. Experiment with the layer size. Adding another hidden layer means you will need to adjust both the forward propagation as well as the backpropagation code.\n"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ }
],
"metadata": {
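Exercise 1 in the updated notebook text suggests switching from batch to minibatch gradient descent. Below is a minimal, self-contained sketch of what such a training loop could look like for a one-hidden-layer tanh network; the toy dataset, layer sizes, batch size, and learning rate are invented for illustration, regularization is omitted for brevity, and none of the notebook's own variable or function names are assumed here.

```python
import numpy as np

# Minimal sketch of minibatch gradient descent for a one-hidden-layer
# network (tanh hidden layer, softmax output). All constants and the
# toy dataset are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # toy 2-D inputs
y = (X[:, 0] * X[:, 1] > 0).astype(int)          # toy binary labels

nn_input_dim, nn_hdim, nn_output_dim = 2, 3, 2
epsilon = 0.01                                   # learning rate
batch_size = 32
num_epochs = 200

W1 = rng.normal(size=(nn_input_dim, nn_hdim)) / np.sqrt(nn_input_dim)
b1 = np.zeros((1, nn_hdim))
W2 = rng.normal(size=(nn_hdim, nn_output_dim)) / np.sqrt(nn_hdim)
b2 = np.zeros((1, nn_output_dim))

for epoch in range(num_epochs):
    perm = rng.permutation(X.shape[0])           # reshuffle every epoch
    for start in range(0, X.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]

        # Forward pass on the minibatch only
        a1 = np.tanh(Xb.dot(W1) + b1)
        scores = a1.dot(W2) + b2
        exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

        # Backpropagation on the minibatch
        delta3 = probs.copy()
        delta3[np.arange(len(Xb)), yb] -= 1       # softmax + cross-entropy gradient
        delta3 /= len(Xb)                         # average over the minibatch
        dW2 = a1.T.dot(delta3)
        db2 = delta3.sum(axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - a1 ** 2) # tanh'(z) = 1 - tanh(z)^2
        dW1 = Xb.T.dot(delta2)
        db1 = delta2.sum(axis=0, keepdims=True)

        # Parameter update uses only this minibatch's gradients
        W1 -= epsilon * dW1
        b1 -= epsilon * db1
        W2 -= epsilon * dW2
        b2 -= epsilon * db2
```

Because each update uses only a small random subset of the data, the parameters are updated many times per pass over the dataset, which is usually what makes minibatch training perform better in practice.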

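Exercise 2 asks for an annealing schedule in place of the fixed learning rate $\epsilon$. A simple 1/t decay is sketched below; epsilon0 and decay_rate are illustrative constants, not values taken from the notebook.

```python
# Sketch of a 1/t learning-rate annealing schedule. The constants are
# illustrative; exponential or step decay would work similarly.
epsilon0 = 0.01      # initial learning rate
decay_rate = 1e-3    # controls how quickly the rate shrinks

def annealed_learning_rate(t, epsilon0=epsilon0, decay_rate=decay_rate):
    """Learning rate after t parameter updates (1/t decay)."""
    return epsilon0 / (1.0 + decay_rate * t)

# Inside the training loop, the fixed epsilon would be replaced by, e.g.:
#     epsilon_t = annealed_learning_rate(step)
#     W1 -= epsilon_t * dW1   # and likewise for the other parameters
for step in (0, 1_000, 10_000, 100_000):
    print(step, annealed_learning_rate(step))
```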

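Exercise 3 notes that changing the hidden-layer activation also means changing the derivative used during backpropagation. The sketch below pairs a few common activations with their derivatives; which alternatives the notebook itself mentions is not assumed here.

```python
import numpy as np

# Each activation is paired with its derivative, since the backpropagation
# step for the hidden layer multiplies the incoming error by that derivative.

def tanh(z):
    return np.tanh(z)

def tanh_prime(a):            # written in terms of the activation a = tanh(z)
    return 1.0 - a ** 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(a):         # in terms of a = sigmoid(z)
    return a * (1.0 - a)

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):            # ReLU's derivative is simplest in terms of z itself
    return (z > 0).astype(float)

# With a sigmoid hidden layer, the hidden-layer error term becomes, e.g.:
#     delta2 = delta3.dot(W2.T) * sigmoid_prime(a1)
# replacing the tanh version:
#     delta2 = delta3.dot(W2.T) * (1 - a1 ** 2)
```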