Commit

check

zaretskinikita committed Sep 13, 2024
1 parent b7869e8 commit 05c7cb0
Showing 1 changed file with 106 additions and 97 deletions.
203 changes: 106 additions & 97 deletions Notebooks/Chap07/7_3_Initialization.ipynb
@@ -1,32 +1,20 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap07/7_3_Initialization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L6chybAVFJW2"
},
"source": [
"# **Notebook 7.3: Initialization**\n",
"\n",
@@ -35,10 +23,16 @@
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at [email protected] if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "L6chybAVFJW2"
}
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# check"
]
},
{
"cell_type": "code",
@@ -54,15 +48,20 @@
},
{
"cell_type": "markdown",
"source": [
"First let's define a neural network. We'll just choose the weights and biases randomly for now"
],
"metadata": {
"id": "nnUoI0m6GyjC"
}
},
"source": [
"First let's define a neural network. We'll just choose the weights and biases randomly for now"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WVM4Tc_jGI0Q"
},
"outputs": [],
"source": [
"def init_params(K, D, sigma_sq_omega):\n",
" # Set seed so we always get the same random numbers\n",
@@ -89,29 +88,29 @@
" all_biases[layer] = np.zeros((D,1))\n",
"\n",
" return all_weights, all_biases"
],
"metadata": {
"id": "WVM4Tc_jGI0Q"
},
"execution_count": null,
"outputs": []
]
},
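The collapsed hunk above hides most of the init_params body. As a reference, here is a minimal sketch of a He-style initializer consistent with the visible lines; this is an assumed reconstruction, not the hidden code verbatim, and D_i and D_o are assumed input/output dimensions:

    import numpy as np

    def init_params_sketch(K, D, sigma_sq_omega, D_i=1, D_o=1):
        # Set seed so we always get the same random numbers
        np.random.seed(0)
        all_weights = [None] * (K + 1)
        all_biases = [None] * (K + 1)
        # First layer maps the D_i inputs to D hidden units
        all_weights[0] = np.random.normal(size=(D, D_i)) * np.sqrt(sigma_sq_omega)
        all_biases[0] = np.zeros((D, 1))
        # Intermediate layers map D hidden units to D hidden units
        for layer in range(1, K):
            all_weights[layer] = np.random.normal(size=(D, D)) * np.sqrt(sigma_sq_omega)
            all_biases[layer] = np.zeros((D, 1))
        # Last layer maps D hidden units to the D_o outputs
        all_weights[K] = np.random.normal(size=(D_o, D)) * np.sqrt(sigma_sq_omega)
        all_biases[K] = np.zeros((D_o, 1))
        return all_weights, all_biases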
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jZh-7bPXIDq4"
},
"outputs": [],
"source": [
"# Define the Rectified Linear Unit (ReLU) function\n",
"def ReLU(preactivation):\n",
" activation = preactivation.clip(0.0)\n",
" return activation"
],
"metadata": {
"id": "jZh-7bPXIDq4"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LgquJUJvJPaN"
},
"outputs": [],
"source": [
"def compute_network_output(net_input, all_weights, all_biases):\n",
"\n",
@@ -140,24 +139,24 @@
" net_output = all_f[K]\n",
"\n",
" return net_output, all_f, all_h"
],
"metadata": {
"id": "LgquJUJvJPaN"
},
"execution_count": null,
"outputs": []
]
},
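Most of compute_network_output is likewise collapsed. A sketch of the standard forward recursion it implements, f_k = beta_k + Omega_k h_k followed by h_{k+1} = ReLU(f_k), consistent with the visible return values (assumed, not the hidden code verbatim; uses the ReLU defined above):

    def compute_network_output_sketch(net_input, all_weights, all_biases):
        K = len(all_weights) - 1
        all_f = [None] * (K + 1)  # pre-activations at each layer
        all_h = [None] * (K + 1)  # activations at each layer
        all_h[0] = net_input
        for layer in range(K):
            all_f[layer] = all_biases[layer] + np.matmul(all_weights[layer], all_h[layer])
            all_h[layer + 1] = ReLU(all_f[layer])
        # Final linear layer: no ReLU on the output
        all_f[K] = all_biases[K] + np.matmul(all_weights[K], all_h[K])
        net_output = all_f[K]
        return net_output, all_f, all_h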
{
"cell_type": "markdown",
"source": [
"Now let's investigate how the size of the outputs vary as we change the initialization variance:\n"
],
"metadata": {
"id": "bIUrcXnOqChl"
}
},
"source": [
"Now let's investigate how the size of the outputs vary as we change the initialization variance:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "A55z3rKBqO7M"
},
"outputs": [],
"source": [
"# Number of layers\n",
"K = 5\n",
@@ -178,15 +177,15 @@
"\n",
"for layer in range(1,K+1):\n",
" print(\"Layer %d, std of hidden units = %3.3f\"%(layer, np.std(all_h[layer])))"
],
"metadata": {
"id": "A55z3rKBqO7M"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VL_SO4tar3DC"
},
"outputs": [],
"source": [
"# You can see that the values of the hidden units are increasing on average (the variance is across all hidden units at the layer\n",
"# and the 1000 training examples\n",
@@ -196,48 +195,48 @@
"\n",
"# TO DO\n",
"# Now experiment with sigma_sq_omega to try to stop the variance of the forward computation exploding"
],
"metadata": {
"id": "VL_SO4tar3DC"
},
"execution_count": null,
"outputs": []
]
},
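A hint for the TO DO above: with ReLU activations roughly half the pre-activations are clipped to zero, so the hidden-unit variance is preserved from layer to layer when sigma_sq_omega = 2/D (He initialization). A quick check, assuming the init_params and compute_network_output defined earlier; the sizes here are illustrative, not the hidden cell's exact values:

    K = 5; D = 8; n_data = 1000
    sigma_sq_omega = 2.0 / D  # He initialization for ReLU networks
    all_weights, all_biases = init_params(K, D, sigma_sq_omega)
    net_input = np.random.normal(size=(1, n_data))
    net_output, all_f, all_h = compute_network_output(net_input, all_weights, all_biases)
    for layer in range(1, K + 1):
        # The standard deviations should now stay roughly constant across layers
        print("Layer %d, std of hidden units = %3.3f" % (layer, np.std(all_h[layer])))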
{
"cell_type": "markdown",
"source": [
"Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutput\n"
],
"metadata": {
"id": "SxVTKp3IcoBF"
}
},
"source": [
"Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutput\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6XqWSYWJdhQR"
},
"outputs": [],
"source": [
"def least_squares_loss(net_output, y):\n",
" return np.sum((net_output-y) * (net_output-y))\n",
"\n",
"def d_loss_d_output(net_output, y):\n",
" return 2*(net_output -y);"
],
"metadata": {
"id": "6XqWSYWJdhQR"
},
"execution_count": null,
"outputs": []
]
},
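The derivative 2*(net_output - y) follows from differentiating each squared term of the sum. A quick finite-difference sanity check of d_loss_d_output, with illustrative values:

    net_output = np.array([[1.0], [2.0]]); y = np.array([[0.5], [1.5]])
    eps = 1e-6
    # Perturb the first output component and compare to the analytic gradient
    numeric = (least_squares_loss(net_output + np.array([[eps], [0.0]]), y)
               - least_squares_loss(net_output, y)) / eps
    print(numeric, d_loss_d_output(net_output, y)[0, 0])  # both approximately 1.0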
{
"cell_type": "markdown",
"source": [
"Here's the code for the backward pass"
],
"metadata": {
"id": "98WmyqFYWA-0"
}
},
"source": [
"Here's the code for the backward pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LJng7WpRPLMz"
},
"outputs": [],
"source": [
"# We'll need the indicator function\n",
"def indicator_function(x):\n",
@@ -276,24 +275,24 @@
" all_dl_df[layer-1] = indicator_function(all_f[layer-1]) * all_dl_dh[layer]\n",
"\n",
" return all_dl_dweights, all_dl_dbiases, all_dl_dh, all_dl_df"
],
"metadata": {
"id": "LJng7WpRPLMz"
},
"execution_count": null,
"outputs": []
]
},
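The collapsed section implements the standard backward recursion: the output gradient dl_df[K] comes from d_loss_d_output, and each layer then pulls the gradient back through the weights and the ReLU, i.e. dl_dh[k] = Omega_k^T dl_df[k] and dl_df[k-1] = I[f_{k-1} > 0] * dl_dh[k], which is exactly the visible line using indicator_function. A sketch of one step of that loop (assumed, not the hidden code verbatim):

    for layer in range(K, 0, -1):
        # Gradients for this layer's parameters
        all_dl_dbiases[layer] = all_dl_df[layer]
        all_dl_dweights[layer] = np.matmul(all_dl_df[layer], all_h[layer].T)
        # Pull the gradient back through the weights, then through the ReLU
        all_dl_dh[layer] = np.matmul(all_weights[layer].T, all_dl_df[layer])
        all_dl_df[layer - 1] = indicator_function(all_f[layer - 1]) * all_dl_dh[layer]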
{
"cell_type": "markdown",
"source": [
"Now let's look at what happens to the magnitude of the gradients on the way back."
],
"metadata": {
"id": "phFnbthqwhFi"
}
},
"source": [
"Now let's look at what happens to the magnitude of the gradients on the way back."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9A9MHc4sQvbp"
},
"outputs": [],
"source": [
"# Number of layers\n",
"K = 5\n",
@@ -327,15 +326,15 @@
"\n",
"for layer in range(1,K):\n",
" print(\"Layer %d, std of dl_dh = %3.3f\"%(layer, np.std(aggregate_dl_df[layer].ravel())))\n"
],
"metadata": {
"id": "9A9MHc4sQvbp"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gtokc0VX0839"
},
"outputs": [],
"source": [
"# You can see that the gradients of the hidden units are increasing on average (the standard deviation is across all hidden units at the layer\n",
"# and the 100 training examples\n",
@@ -345,12 +344,22 @@
"\n",
"# TO DO\n",
"# Now experiment with sigma_sq_omega to try to stop the variance of the gradients exploding\n"
],
"metadata": {
"id": "gtokc0VX0839"
},
"execution_count": null,
"outputs": []
]
}
]
}
],
"metadata": {
"colab": {
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
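As with the forward pass, the answer to the final TO DO is the same He scaling: setting sigma_sq_omega = 2/D also keeps the standard deviation of the backpropagated gradients roughly constant from layer to layer, since the backward pass multiplies by the same weight matrices (transposed) and the ReLU derivative zeroes about half the entries. An illustrative setting, assuming the variables from the gradient experiment above:

    sigma_sq_omega = 2.0 / D  # same He scaling stabilizes the backward pass
    all_weights, all_biases = init_params(K, D, sigma_sq_omega)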
