
Commit

Add small explanation changes for hyperparameters and a change to backpropagation variable order for clarity
lcrucks committed Sep 5, 2017
1 parent cf86fcb commit 1f978d8
Showing 1 changed file with 44 additions and 16 deletions.
60 changes: 44 additions & 16 deletions first-neural-network/Your_first_neural_network.ipynb
@@ -13,7 +13,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
@@ -36,7 +38,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"data_path = 'Bike-Sharing-Dataset/hour.csv'\n",
@@ -47,7 +51,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"rides.head()"
@@ -67,7 +73,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"rides[:24*10].plot(x='dteday', y='cnt')"
@@ -84,7 +92,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']\n",
@@ -111,7 +121,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']\n",
@@ -135,7 +147,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Save data for approximately the last 21 days \n",
@@ -160,7 +174,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Hold out the last 60 days or so of the remaining data as a validation set\n",
@@ -260,11 +276,13 @@
" # TODO: Output error - Replace this value with your calculations.\n",
" error = None # Output layer error is the difference between desired target and actual output.\n",
" \n",
" # TODO: Calculate the backpropagated error term (delta) for the output \n",
" output_error_term = None\n",
" \n",
" # TODO: Calculate the hidden layer's contribution to the error\n",
" hidden_error = None\n",
" \n",
" # TODO: Backpropagated error terms - Replace these values with your calculations.\n",
" output_error_term = None\n",
" # TODO: Calculate the backpropagated error term (delta) for the hidden layer\n",
" hidden_error_term = None\n",
"\n",
" # Weight step (input to hidden)\n",
@@ -320,7 +338,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import unittest\n",
@@ -394,19 +414,23 @@
"You'll also be using a method know as Stochastic Gradient Descent (SGD) to train the network. The idea is that for each training pass, you grab a random sample of the data instead of using the whole data set. You use many more training passes than with normal gradient descent, but each pass is much faster. This ends up training the network more efficiently. You'll learn more about SGD later.\n",
"\n",
"### Choose the number of iterations\n",
"This is the number of batches of samples from the training data we'll use to train the network. The more iterations you use, the better the model will fit the data. However, if you use too many iterations, then the model with not generalize well to other data, this is called overfitting. You want to find a number here where the network has a low training loss, and the validation loss is at a minimum. As you start overfitting, you'll see the training loss continue to decrease while the validation loss starts to increase.\n",
"This is the number of batches of samples from the training data we'll use to train the network. The more iterations you use, the better the model will fit the data. However, this process can have sharply diminishing returns and can waste computational resources if you use too many iterations. You want to find a number here where the network has a low training loss, and the validation loss is at a minimum. The ideal number of iterations would be a level that stops shortly after the validation loss is no longer decreasing.\n",
"\n",
"### Choose the learning rate\n",
"This scales the size of weight updates. If this is too big, the weights tend to explode and the network fails to fit the data. Normally a good choice to start at is 0.1; however, if you effectively divide the learning rate by n_records, try starting out with a learning rate of 1. In either case, if the network has problems fitting the data, try reducing the learning rate. Note that the lower the learning rate, the smaller the steps are in the weight updates and the longer it takes for the neural network to converge.\n",
"\n",
"### Choose the number of hidden nodes\n",
"The more hidden nodes you have, the more accurate predictions the model will make. Try a few different numbers and see how it affects the performance. You can look at the losses dictionary for a metric of the network performance. If the number of hidden units is too low, then the model won't have enough space to learn and if it is too high there are too many options for the direction that the learning can take. The trick here is to find the right balance in number of hidden units you choose."
"In a model where all the weights are optimized, the more hidden nodes you have, the more accurate the predictions of the model will be. (A fully optimized model could have weights of zero, after all.) However, the more hidden nodes you have, the harder it will be to optimize the weights of the model, and the more likely it will be that suboptimal weights will lead to overfitting. With overfitting, the model will memorize the training data instead of learning the true pattern, and won't generalize well to unseen data. \n",
"\n",
"Try a few different numbers and see how it affects the performance. You can look at the losses dictionary for a metric of the network performance. If the number of hidden units is too low, then the model won't have enough space to learn and if it is too high there are too many options for the direction that the learning can take. The trick here is to find the right balance in number of hidden units you choose. You'll generally find that the best number of hidden nodes to use ends up being between the number of input and output nodes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import sys\n",
@@ -443,7 +467,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"plt.plot(losses['train'], label='Training loss')\n",
@@ -464,7 +490,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=(8,4))\n",
