Commit

Created using Colaboratory
udlbook committed Jul 28, 2023
1 parent f65e5c7 commit 7e99548
Notebooks/Chap07/7_2_Backpropagation.ipynb (new file: 345 additions, 0 deletions)
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyN2nPVR0imZntgj4Oasyvmo",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap07/7_2_Backpropagation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 7.2: Backpropagation**\n",
"\n",
"This notebook runs the backpropagation algorithm on a deep neural network as described in section 7.4 of the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at [email protected] if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "L6chybAVFJW2"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LdIDglk1FFcG"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"source": [
"First let's define a neural network. We'll just choose the weights and biases randomly for now"
],
"metadata": {
"id": "nnUoI0m6GyjC"
}
},
{
"cell_type": "code",
"source": [
"# Set seed so we always get the same random numbers\n",
"np.random.seed(0)\n",
"\n",
"# Number of layers\n",
"K = 5\n",
"# Number of neurons per layer\n",
"D = 6\n",
"# Input layer\n",
"D_i = 1\n",
"# Output layer\n",
"D_o = 1\n",
"\n",
"# Make empty lists\n",
"all_weights = [None] * (K+1)\n",
"all_biases = [None] * (K+1)\n",
"\n",
"# Create input and output layers\n",
"all_weights[0] = np.random.normal(size=(D, D_i))\n",
"all_weights[-1] = np.random.normal(size=(D_o, D))\n",
"all_biases[0] = np.random.normal(size =(D,1))\n",
"all_biases[-1]= np.random.normal(size =(D_o,1))\n",
"\n",
"# Create intermediate layers\n",
"for layer in range(1,K):\n",
" all_weights[layer] = np.random.normal(size=(D,D))\n",
" all_biases[layer] = np.random.normal(size=(D,1))"
],
"metadata": {
"id": "WVM4Tc_jGI0Q"
},
"execution_count": null,
"outputs": []
},
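{
"cell_type": "markdown",
"source": [
"For the activation function we'll use the rectified linear unit (ReLU), which clips negative values to zero:\n",
"\n",
"$$\mathrm{ReLU}[z] = \max(0, z).$$"
],
"metadata": {}
},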
{
"cell_type": "code",
"source": [
"# Define the Rectified Linear Unit (ReLU) function\n",
"def ReLU(preactivation):\n",
" activation = preactivation.clip(0.0)\n",
" return activation"
],
"metadata": {
"id": "jZh-7bPXIDq4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's run our random network. The weight matrices $\\boldsymbol\\Omega_{1\\ldots K}$ are the entries of the list \"all_weights\" and the biases $\\boldsymbol\\beta_{1\\ldots k}$ are the entries of the list \"all_biases\"\n",
"\n",
"We know that we will need the activations $\\mathbf{f}_{0\\ldots K}$ and the activations $\\mathbf{h}_{1\\ldots K}$ for the forward pass of backpropagation, so we'll store and return these as well.\n"
],
"metadata": {
"id": "5irtyxnLJSGX"
}
},
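{
"cell_type": "markdown",
"source": [
"As a reminder of equation 7.16 in this notebook's conventions (where $\mathbf{h}_0$ is the input and $\mathbf{f}_K$ is the output), the forward pass computes\n",
"\n",
"$$\mathbf{f}_k = \boldsymbol\beta_k + \boldsymbol\Omega_k\mathbf{h}_k, \qquad \mathbf{h}_{k+1} = \mathrm{ReLU}[\mathbf{f}_k],$$\n",
"\n",
"so all_f[layer] is computed from all_h[layer], and all_h[layer+1] from all_f[layer]."
],
"metadata": {}
},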
{
"cell_type": "code",
"source": [
"def compute_network_output(net_input, all_weights, all_biases):\n",
"\n",
" # Retrieve number of layers\n",
" K = len(all_weights) -1\n",
"\n",
" # We'll store the pre-activations at each layer in a list \"all_f\"\n",
" # and the activations in a second list[all_h].\n",
" all_f = [None] * (K+1)\n",
" all_h = [None] * (K+1)\n",
"\n",
" #For convenience, we'll set\n",
" # all_h[0] to be the input, and all_f[K] will be the output\n",
" all_h[0] = net_input\n",
"\n",
" # Run through the layers, calculating all_f[0...K-1] and all_h[1...K]\n",
" for layer in range(K):\n",
" # Update preactivations and activations at this layer according to eqn 7.16\n",
" # Remmember to use np.matmul for matrrix multiplications\n",
" # TODO -- Replace the lines below\n",
" all_f[layer] = all_h[layer]\n",
" all_h[layer+1] = all_f[layer]\n",
"\n",
" # Compute the output from the last hidden layer\n",
" # TO DO -- Replace the line below\n",
" all_f[K] = np.zeros_like(all_biases[-1])\n",
"\n",
" # Retrieve the output\n",
" net_output = all_f[K]\n",
"\n",
" return net_output, all_f, all_h"
],
"metadata": {
"id": "LgquJUJvJPaN"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define in input\n",
"net_input = np.ones((D_i,1)) * 1.2\n",
"# Compute network output\n",
"net_output, all_f, all_h = compute_network_output(net_input,all_weights, all_biases)\n",
"print(\"True output = %3.3f, Your answer = %3.3f\"%(1.907, net_output[0,0]))"
],
"metadata": {
"id": "IN6w5m2ZOhnB"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutput"
],
"metadata": {
"id": "SxVTKp3IcoBF"
}
},
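{
"cell_type": "markdown",
"source": [
"In this notebook's notation, the two functions below implement\n",
"\n",
"$$\ell = (\mathbf{f}-\mathbf{y})^T(\mathbf{f}-\mathbf{y}), \qquad \frac{\partial \ell}{\partial \mathbf{f}} = 2(\mathbf{f}-\mathbf{y}),$$\n",
"\n",
"where $\mathbf{f}$ is the network output and $\mathbf{y}$ is the target."
],
"metadata": {}
},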
{
"cell_type": "code",
"source": [
"def least_squares_loss(net_output, y):\n",
" return np.sum((net_output-y) * (net_output-y))\n",
"\n",
"def d_loss_d_output(net_output, y):\n",
" return 2*(net_output -y);"
],
"metadata": {
"id": "6XqWSYWJdhQR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"y = np.ones((D_o,1)) * 20.0\n",
"loss = least_squares_loss(net_output, y)\n",
"print(\"y = %3.3f Loss = %3.3f\"%(y, loss))"
],
"metadata": {
"id": "njF2DUQmfttR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's compute the derivatives of the network. We already computed the forward pass. Let's compute the backward pass."
],
"metadata": {
"id": "98WmyqFYWA-0"
}
},
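{
"cell_type": "markdown",
"source": [
"As a reminder of the equations cited in the comments below (written in this notebook's conventions), the backward pass uses\n",
"\n",
"$$\frac{\partial \ell}{\partial \boldsymbol\beta_k} = \frac{\partial \ell}{\partial \mathbf{f}_k}, \qquad \frac{\partial \ell}{\partial \boldsymbol\Omega_k} = \frac{\partial \ell}{\partial \mathbf{f}_k}\mathbf{h}_k^T, \qquad \frac{\partial \ell}{\partial \mathbf{h}_k} = \boldsymbol\Omega_k^T\frac{\partial \ell}{\partial \mathbf{f}_k}$$\n",
"\n",
"(equations 7.21, 7.22, and 7.20 respectively), and the derivative of the ReLU links successive layers via\n",
"\n",
"$$\frac{\partial \ell}{\partial \mathbf{f}_{k-1}} = \mathbb{I}\left[\mathbf{f}_{k-1} \geq 0\right] \odot \frac{\partial \ell}{\partial \mathbf{h}_k},$$\n",
"\n",
"where $\mathbb{I}[\cdot]$ is the elementwise indicator function defined below and $\odot$ denotes elementwise multiplication."
],
"metadata": {}
},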
{
"cell_type": "code",
"source": [
"# We'll need the indicator function\n",
"def indicator_function(x):\n",
" x_in = np.array(x)\n",
" x_in[x_in>=0] = 1\n",
" x_in[x_in<0] = 0\n",
" return x_in\n",
"\n",
"# Main backward pass routine\n",
"def backward_pass(all_weights, all_biases, all_f, all_h, y):\n",
" # We'll store the derivatives dl_dweights and dl_dbiases in lists as well\n",
" all_dl_dweights = [None] * (K+1)\n",
" all_dl_dbiases = [None] * (K+1)\n",
" # And we'll store the derivatives of the loss with respect to the activation and preactivations in lists\n",
" all_dl_df = [None] * (K+1)\n",
" all_dl_dh = [None] * (K+1)\n",
" # Again for convenience we'll stick with the convention that all_h[0] is the net input and all_f[k] in the net output\n",
"\n",
" # Compute derivatives of net output with respect to loss\n",
" all_dl_df[K] = np.array(d_loss_d_output(all_f[K],y))\n",
"\n",
" # Now work backwards through the network\n",
" for layer in range(K,-1,-1):\n",
" # TODO Calculate the derivatives of biases at layer this from all_dl_df[layer]. (eq 7.21)\n",
" # NOTE! To take a copy of matrix X, use Z=np.array(X)\n",
" # REPLACE THIS LINE\n",
" all_dl_dbiases[layer] = np.zeros_like(all_biases[layer])\n",
"\n",
" # TODO Calculate the derivatives of weight at layer from all_dl_df[K] and all_h[K] (eq 7.22)\n",
" # Don't forget to use np.matmul\n",
" # REPLACE THIS LINE\n",
" all_dl_dweights[layer] = np.zeros_like(all_weights[layer])\n",
"\n",
" # TODO: calculate the derivatives of activations from weight and derivatives of next preactivations (eq 7.20)\n",
" # REPLACE THIS LINE\n",
" all_dl_dh[layer] = np.zeros_like(all_h[layer])\n",
"\n",
"\n",
" if layer > 0:\n",
" # TODO Calculate the derivatives of the pre-activation f with respect to activation h (deriv of ReLu function)\n",
" # REPLACE THIS LINE\n",
" all_dl_df[layer-1] = np.zeros_like(all_f[layer-1])\n",
"\n",
" return all_dl_dweights, all_dl_dbiases"
],
"metadata": {
"id": "LJng7WpRPLMz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"all_dl_dweights, all_dl_dbiases = backward_pass(all_weights, all_biases, all_f, all_h, y)"
],
"metadata": {
"id": "9A9MHc4sQvbp"
},
"execution_count": null,
"outputs": []
},
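{
"cell_type": "markdown",
"source": [
"To check these results, the next cell approximates each derivative with a forward finite difference,\n",
"\n",
"$$\frac{\partial \ell}{\partial \theta} \approx \frac{\ell(\theta+\delta)-\ell(\theta)}{\delta},$$\n",
"\n",
"perturbing one bias or weight element $\theta$ at a time by a small $\delta$. If the backward pass is implemented correctly, the values printed from backpropagation and from finite differences should agree closely."
],
"metadata": {}
},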
{
"cell_type": "code",
"source": [
"np.set_printoptions(precision=3)\n",
"# Make space for derivatives computed by finite differences\n",
"all_dl_dweights_fd = [None] * (K+1)\n",
"all_dl_dbiases_fd = [None] * (K+1)\n",
"\n",
"# Let's test if we have the derivatives right using finite differences\n",
"delta_fd = 0.000001\n",
"\n",
"# Test the dervatives of the bias vectors\n",
"for layer in range(K):\n",
" dl_dbias = np.zeros_like(all_dl_dbiases[layer])\n",
" # For every element in the bias\n",
" for row in range(all_biases[layer].shape[0]):\n",
" # Take copy of biases We'll change one element each time\n",
" all_biases_copy = [np.array(x) for x in all_biases]\n",
" all_biases_copy[layer][row] += delta_fd\n",
" network_output_1, *_ = compute_network_output(net_input, all_weights, all_biases_copy)\n",
" network_output_2, *_ = compute_network_output(net_input, all_weights, all_biases)\n",
" dl_dbias[row] = (least_squares_loss(network_output_1, y) - least_squares_loss(network_output_2,y))/delta_fd\n",
" all_dl_dbiases_fd[layer] = np.array(dl_dbias)\n",
" print(\"Bias %d, derivatives from backprop:\"%(layer))\n",
" print(all_dl_dbiases[layer])\n",
" print(\"Bias %d, derivatives from finite differences\"%(layer))\n",
" print(all_dl_dbiases_fd[layer])\n",
"\n",
"\n",
"# Test the derivatives of the weights matrices\n",
"for layer in range(K):\n",
" dl_dweight = np.zeros_like(all_dl_dweights[layer])\n",
" # For every element in the bias\n",
" for row in range(all_weights[layer].shape[0]):\n",
" for col in range(all_weights[layer].shape[1]):\n",
" # Take copy of biases We'll change one element each time\n",
" all_weights_copy = [np.array(x) for x in all_weights]\n",
" all_weights_copy[layer][row][col] += delta_fd\n",
" network_output_1, *_ = compute_network_output(net_input, all_weights_copy, all_biases)\n",
" network_output_2, *_ = compute_network_output(net_input, all_weights, all_biases)\n",
" dl_dweight[row][col] = (least_squares_loss(network_output_1, y) - least_squares_loss(network_output_2,y))/delta_fd\n",
" all_dl_dweights_fd[layer] = np.array(dl_dweight)\n",
" print(\"Weight %d, derivatives from backprop:\"%(layer))\n",
" print(all_dl_dweights[layer])\n",
" print(\"Weight %d, derivatives from finite differences\"%(layer))\n",
" print(all_dl_dweights_fd[layer])"
],
"metadata": {
"id": "PK-UtE3hreAK"
},
"execution_count": null,
"outputs": []
}
]
}
