Commit 05c7cb0 (1 parent: b7869e8)
Showing 1 changed file with 106 additions and 97 deletions.
@@ -1,32 +1,20 @@
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
        "colab_type": "text",
        "id": "view-in-github"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap07/7_3_Initialization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "L6chybAVFJW2"
      },
      "source": [
        "# **Notebook 7.3: Initialization**\n",
        "\n",
@@ -35,10 +23,16 @@
        "Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
        "\n",
        "Contact me at [email protected] if you find any mistakes or have any suggestions."
      ],
      "metadata": {
        "id": "L6chybAVFJW2"
      }
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# check"
      ]
    },
    {
      "cell_type": "code",
@@ -54,15 +48,20 @@
    },
    {
      "cell_type": "markdown",
      "source": [
        "First let's define a neural network. We'll just choose the weights and biases randomly for now."
      ],
      "metadata": {
        "id": "nnUoI0m6GyjC"
      }
      },
      "source": [
        "First let's define a neural network. We'll just choose the weights and biases randomly for now."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WVM4Tc_jGI0Q"
      },
      "outputs": [],
      "source": [
        "def init_params(K, D, sigma_sq_omega):\n",
        " # Set seed so we always get the same random numbers\n",
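The diff collapses the body of `init_params` between this hunk and the next. Here is a minimal sketch of what the elided section presumably does, assuming an architecture with one input, K hidden layers of width D, and one output, and weights drawn from a zero-mean normal with variance sigma_sq_omega; the layer sizes are an assumption, not shown in the diff:

```python
import numpy as np

def init_params(K, D, sigma_sq_omega):
  # Set seed so we always get the same random numbers
  np.random.seed(0)
  # Assumed architecture: 1 input, K hidden layers of width D, 1 output
  D_i = 1
  D_o = 1
  sizes = [D_i] + [D] * K + [D_o]
  all_weights = [None] * (K + 1)
  all_biases = [None] * (K + 1)
  for layer in range(K + 1):
    # Weights ~ Normal(0, sigma_sq_omega); biases start at zero
    all_weights[layer] = np.random.normal(size=(sizes[layer + 1], sizes[layer])) \
                         * np.sqrt(sigma_sq_omega)
    all_biases[layer] = np.zeros((sizes[layer + 1], 1))
  return all_weights, all_biases
```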
@@ -89,29 +88,29 @@
        " all_biases[layer] = np.zeros((D,1))\n",
        "\n",
        " return all_weights, all_biases"
      ],
      "metadata": {
        "id": "WVM4Tc_jGI0Q"
      },
      "execution_count": null,
      "outputs": []
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jZh-7bPXIDq4"
      },
      "outputs": [],
      "source": [
        "# Define the Rectified Linear Unit (ReLU) function\n",
        "def ReLU(preactivation):\n",
        " activation = preactivation.clip(0.0)\n",
        " return activation"
      ],
      "metadata": {
        "id": "jZh-7bPXIDq4"
      },
      "execution_count": null,
      "outputs": []
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LgquJUJvJPaN"
      },
      "outputs": [],
      "source": [
        "def compute_network_output(net_input, all_weights, all_biases):\n",
        "\n",
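The body of `compute_network_output` is likewise elided between this hunk and the next. A sketch of the forward pass, consistent with the visible tail (`net_output = all_f[K]` and the three return values) but a reconstruction rather than the notebook's exact code:

```python
def compute_network_output(net_input, all_weights, all_biases):
  K = len(all_weights) - 1
  all_f = [None] * (K + 1)  # pre-activations at each layer
  all_h = [None] * (K + 1)  # activations at each layer (all_h[0] is the input)
  all_h[0] = net_input
  for layer in range(K):
    all_f[layer] = all_biases[layer] + np.matmul(all_weights[layer], all_h[layer])
    all_h[layer + 1] = ReLU(all_f[layer])
  # Final linear layer has no ReLU
  all_f[K] = all_biases[K] + np.matmul(all_weights[K], all_h[K])
  net_output = all_f[K]
  return net_output, all_f, all_h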
@@ -140,24 +139,24 @@
        " net_output = all_f[K]\n",
        "\n",
        " return net_output, all_f, all_h"
      ],
      "metadata": {
        "id": "LgquJUJvJPaN"
      },
      "execution_count": null,
      "outputs": []
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now let's investigate how the size of the outputs varies as we change the initialization variance:\n"
      ],
      "metadata": {
        "id": "bIUrcXnOqChl"
      }
      },
      "source": [
        "Now let's investigate how the size of the outputs varies as we change the initialization variance:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "A55z3rKBqO7M"
      },
      "outputs": [],
      "source": [
        "# Number of layers\n",
        "K = 5\n",
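Most of this experiment cell is collapsed in the diff. A sketch of the elided setup, assuming D = 8 hidden units and a variance of 1.0 (both assumptions; only `K = 5` and the print loop are visible), with 1000 inputs to match the "1000 training examples" mentioned in the next cell's comment:

```python
K = 5
D = 8                  # assumed hidden width
sigma_sq_omega = 1.0   # assumed starting variance
n_data = 1000          # matches the "1000 training examples" comment below
all_weights, all_biases = init_params(K, D, sigma_sq_omega)
net_input = np.random.normal(size=(1, n_data))
net_output, all_f, all_h = compute_network_output(net_input, all_weights, all_biases)
for layer in range(1, K + 1):
  print("Layer %d, std of hidden units = %3.3f" % (layer, np.std(all_h[layer])))
```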
@@ -178,15 +177,15 @@
        "\n",
        "for layer in range(1,K+1):\n",
        " print(\"Layer %d, std of hidden units = %3.3f\"%(layer, np.std(all_h[layer])))"
      ],
      "metadata": {
        "id": "A55z3rKBqO7M"
      },
      "execution_count": null,
      "outputs": []
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VL_SO4tar3DC"
      },
      "outputs": [],
      "source": [
        "# You can see that the values of the hidden units are increasing on average (the variance is across all hidden units at the layer\n",
        "# and the 1000 training examples)\n",
@@ -196,48 +195,48 @@
        "\n",
        "# TO DO\n",
        "# Now experiment with sigma_sq_omega to try to stop the variance of the forward computation exploding"
      ],
      "metadata": {
        "id": "VL_SO4tar3DC"
      },
      "execution_count": null,
      "outputs": []
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutput\n"
      ],
      "metadata": {
        "id": "SxVTKp3IcoBF"
      }
      },
      "source": [
        "Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutput\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6XqWSYWJdhQR"
      },
      "outputs": [],
      "source": [
        "def least_squares_loss(net_output, y):\n",
        " return np.sum((net_output-y) * (net_output-y))\n",
        "\n",
        "def d_loss_d_output(net_output, y):\n",
        " return 2*(net_output - y)"
      ],
      "metadata": {
        "id": "6XqWSYWJdhQR"
      },
      "execution_count": null,
      "outputs": []
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Here's the code for the backward pass"
      ],
      "metadata": {
        "id": "98WmyqFYWA-0"
      }
      },
      "source": [
        "Here's the code for the backward pass"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LJng7WpRPLMz"
      },
      "outputs": [],
      "source": [
        "# We'll need the indicator function\n",
        "def indicator_function(x):\n",
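For the forward-pass TO DO above: one standard remedy (He initialization, not named in the diff) sets the weight variance to 2/D. This compensates for ReLU zeroing roughly half of each layer's pre-activations, so the standard deviation of the hidden units stays roughly constant from layer to layer:

```python
# He initialization: variance 2/D keeps the forward variance roughly constant
sigma_sq_omega = 2.0 / D
all_weights, all_biases = init_params(K, D, sigma_sq_omega)
```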
@@ -276,24 +275,24 @@
        " all_dl_df[layer-1] = indicator_function(all_f[layer-1]) * all_dl_dh[layer]\n",
        "\n",
        " return all_dl_dweights, all_dl_dbiases, all_dl_dh, all_dl_df"
      ],
      "metadata": {
        "id": "LJng7WpRPLMz"
      },
      "execution_count": null,
      "outputs": []
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now let's look at what happens to the magnitude of the gradients on the way back."
      ],
      "metadata": {
        "id": "phFnbthqwhFi"
      }
      },
      "source": [
        "Now let's look at what happens to the magnitude of the gradients on the way back."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9A9MHc4sQvbp"
      },
      "outputs": [],
      "source": [
        "# Number of layers\n",
        "K = 5\n",
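The backward-pass function is truncated between the previous two hunks; even its name is not visible in the diff. A sketch consistent with the visible tail, using the hypothetical name `backward_pass` and the standard backpropagation recursion:

```python
def backward_pass(all_weights, all_biases, all_f, all_h, y):
  K = len(all_weights) - 1
  all_dl_dweights = [None] * (K + 1)
  all_dl_dbiases = [None] * (K + 1)
  all_dl_df = [None] * (K + 1)  # derivatives w.r.t. pre-activations f
  all_dl_dh = [None] * (K + 1)  # derivatives w.r.t. activations h
  # Start from the derivative of the loss w.r.t. the network output
  all_dl_df[K] = np.array(d_loss_d_output(all_f[K], y))
  for layer in range(K, -1, -1):
    all_dl_dbiases[layer] = np.array(all_dl_df[layer])
    all_dl_dweights[layer] = np.matmul(all_dl_df[layer], all_h[layer].transpose())
    all_dl_dh[layer] = np.matmul(all_weights[layer].transpose(), all_dl_df[layer])
    if layer > 0:
      # ReLU derivative is the indicator of a positive pre-activation
      all_dl_df[layer-1] = indicator_function(all_f[layer-1]) * all_dl_dh[layer]
  return all_dl_dweights, all_dl_dbiases, all_dl_dh, all_dl_df
```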
@@ -327,15 +326,15 @@
        "\n",
        "for layer in range(1,K):\n",
        " print(\"Layer %d, std of dl_dh = %3.3f\"%(layer, np.std(aggregate_dl_df[layer].ravel())))\n"
      ],
      "metadata": {
        "id": "9A9MHc4sQvbp"
      },
      "execution_count": null,
      "outputs": []
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gtokc0VX0839"
      },
      "outputs": [],
      "source": [
        "# You can see that the gradients of the hidden units are increasing on average (the standard deviation is across all hidden units at the layer\n",
        "# and the 100 training examples)\n",
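The gradient-measurement cell is also mostly collapsed. A sketch under the same assumptions as before (D = 8; 100 examples, matching the comment in the next cell; the hypothetical `backward_pass` name from the sketch above):

```python
K = 5
D = 8
sigma_sq_omega = 1.0
n_data = 100
all_weights, all_biases = init_params(K, D, sigma_sq_omega)
aggregate_dl_df = [None] * (K + 1)
for layer in range(1, K):
  aggregate_dl_df[layer] = np.zeros((D, n_data))
for c_data in range(n_data):
  x = np.random.normal(size=(1, 1))
  y = np.zeros((1, 1))
  net_output, all_f, all_h = compute_network_output(x, all_weights, all_biases)
  all_dl_dweights, all_dl_dbiases, all_dl_dh, all_dl_df = \
      backward_pass(all_weights, all_biases, all_f, all_h, y)
  # Collect the pre-activation gradients for each example
  for layer in range(1, K):
    aggregate_dl_df[layer][:, c_data] = np.squeeze(all_dl_df[layer])
for layer in range(1, K):
  print("Layer %d, std of dl_dh = %3.3f" % (layer, np.std(aggregate_dl_df[layer].ravel())))
```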
@@ -345,12 +344,22 @@
        "\n",
        "# TO DO\n",
        "# Now experiment with sigma_sq_omega to try to stop the variance of the gradients exploding\n"
      ],
      "metadata": {
        "id": "gtokc0VX0839"
      },
      "execution_count": null,
      "outputs": []
      ]
    }
  ]
}
  ],
  "metadata": {
    "colab": {
      "include_colab_link": true,
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
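For the final TO DO: the same 2/D scaling stabilizes the backward pass as well, since each step of the gradient recursion multiplies by the same weight matrices the forward pass uses:

```python
# Re-running the gradient experiment with He scaling (assumed D as above)
sigma_sq_omega = 2.0 / D
all_weights, all_biases = init_params(K, D, sigma_sq_omega)
```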