Undergraduate course on ML/AI at the University of Central Florida.
- Artificial intelligence, machine learning, and deep learning
- Supervised learning, unsupervised learning, and reinforcement learning
- Labeled/unlabeled examples, training, inference, classification, regression
- Loss, empirical risk minimization, squared error loss, mean square error loss
Let's examine what could go wrong when applying gradient descent with a poorly chosen learning rate. We could fail to find any solution because the iterates diverge, or we could get stuck in a bad local minimum. The following notebook allows us to apply gradient descent for finding minima of univariate functions. (Univariate means that the functions depend on only one variable.)
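Below is a minimal sketch (my own toy example, not the notebook's code) of gradient descent on a univariate function, showing how the learning rate determines whether the iterates converge or diverge.

```python
# Toy example: minimize f(x) = x^2 with gradient descent.
def f(x):
    return x**2

def df(x):
    return 2 * x  # derivative of f

def gradient_descent(x0, learning_rate, num_steps=20):
    x = x0
    for _ in range(num_steps):
        x = x - learning_rate * df(x)  # gradient step
    return x

print(gradient_descent(3.0, learning_rate=0.1))  # converges towards the minimum at 0
print(gradient_descent(3.0, learning_rate=1.1))  # diverges: |x| grows without bound
```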
The loss function for a deep neural network depends on millions of parameters. Such functions are called multivariate because they depend on multiple variables. It is no longer possible to easily visualize multivariate functions.
The following notebooks present two methods for visualizing bivariate functions, that is, those that depend on exactly two variables. Such functions define surfaces in 3D. Think of the surface of a mountain range.
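As an illustration, here is a generic matplotlib sketch (my own choice of function and plotting code, not necessarily the methods used in the notebooks) showing a bivariate function as a 3D surface and as a contour plot.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(w1, w2):
    return w1**2 + 2 * w2**2  # a simple bowl-shaped example

w1, w2 = np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-2, 2, 100))
z = f(w1, w2)

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(1, 2, 1, projection="3d")  # surface plot
ax1.plot_surface(w1, w2, z, cmap="viridis")
ax2 = fig.add_subplot(1, 2, 2)                   # contour plot (level sets)
ax2.contour(w1, w2, z, levels=20)
plt.show()
```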
To get started, let's consider the simple case of linear regression: n=1, that is, there is only one feature and the model has only one weight (and a bias term).
In the first implementation, we consider the weight and bias separately and implement stochastic gradient descent. It is easy to see the correspondence between the code and the mathematical expression for the gradient.
In the second implementation, we combine the weight and bias into one vector. We also consider three versions of gradient descent: batch, stochastic, and mini-batch gradient descent. We use a vectorized implementation, that is, all data in a batch is processed in parallel. It is more difficult to see the correspondence between the code and the mathematical expression for the gradient.
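The following is a minimal sketch of the vectorized view (my own notation and hyperparameters, not the notebook's): the bias is absorbed into the weight vector by appending a column of ones to the data matrix, and each update uses a mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))                  # one feature
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=200)   # noisy targets

Xb = np.hstack([X, np.ones((len(X), 1))])              # append a bias column of ones
w = np.zeros(2)                                        # [weight, bias]

learning_rate, batch_size = 0.1, 32
for epoch in range(100):
    perm = rng.permutation(len(y))
    for i in range(0, len(y), batch_size):
        idx = perm[i:i + batch_size]
        error = Xb[idx] @ w - y[idx]                   # predictions minus targets
        grad = 2 * Xb[idx].T @ error / len(idx)        # gradient of the mini-batch MSE
        w -= learning_rate * grad

print(w)  # should be close to [3.0, 2.0]
```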
(TO DO: improve everything below!)
Let's see how we can solve the simplest case of linear regression in Keras.
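Here is a minimal sketch of what this looks like (the data and optimizer settings are my own choices): a single Dense layer with one unit is exactly a linear model with one weight and one bias.

```python
import numpy as np
from tensorflow import keras

X = np.random.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0

model = keras.Sequential([
    keras.layers.Dense(1, input_shape=(1,))  # one weight and one bias
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss="mse")
model.fit(X, y, epochs=50, batch_size=32, verbose=0)

print(model.layers[0].get_weights())  # weight close to 3.0, bias close to 2.0
```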
There is a closed-form solution, known as the normal equation, for choosing the best weights and bias for linear regression. The optimal solution achieves the smallest squared error loss.
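In NumPy, the normal equation w = (XᵀX)⁻¹Xᵀy can be evaluated directly. A small sketch (the data is made up, and the bias is absorbed as an extra column of ones):

```python
import numpy as np

X = np.random.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0

Xb = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)    # solve the normal equation (X^T X) w = X^T y
print(w)                                    # [3.0, 2.0] up to rounding, since there is no noise
```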
To understand the mathematics underlying the normal equation, read the following materials. I will not cover the derivation of the normal equation.
- Chapter 4 Numerical Computation, Section 4.3 Gradient-Based Optimization
- Chapter 5 Machine Learning Basics, Subsection 5.1.4 Example: Linear Regression
- Additional materials: proof of convexity of MSE and computation of gradient of MSE
We will use Keras to build (almost) all deep learning models. Roughly speaking, TensorFlow is a back end for deep learning, whereas Keras is a front end. Keras can use TensorFlow or other back ends.
Keras is now part of TensorFlow 2.0, so it is available automatically when you import tensorflow. Previously (TensorFlow 1.x) you had to import Keras separately. I may need to do some minor tweaks to the notebooks so that everything is perfectly adapted to TensorFlow 2.0.
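In other words, something like the following is all that is needed (a minimal check, nothing course-specific):

```python
import tensorflow as tf
from tensorflow import keras   # Keras ships inside TensorFlow 2.x

print(tf.__version__)          # should start with "2."
print(keras.__version__)
```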
Let's now see how we can solve more interesting problems with Keras. We consider the problems of classifying images from the MNIST digits, MNIST fashion items, and CIFAR10 datasets.
The classification problems are all multi-class, single-label classification problems. Multi-class means that there are several classes, for instance, T-shirt, pullover, or bag in the MNIST fashion items dataset. Single-label means that the classes are mutually exclusive; for instance, in the MNIST digits dataset an image is either the digit 0, or the digit 1, and so on.
In the multi-class, single-label classification problem, the activation in the last layer is softmax and the loss function is categorical cross entropy.
The examples below use the so-called relu activation function for the hidden layer.
- Notebook for loading and exploring the MNIST digits data set
- Notebook for classifying MNIST digits with dense layers and analyzing model performance
- Notebook for classifying MNIST fashion items with dense layers and analyzing model performance
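For orientation, here is a minimal sketch of such a model (layer sizes, optimizer, and number of epochs are my own choices, not necessarily those used in the notebooks above): a relu hidden layer, a softmax output layer with one neuron per class, and the categorical cross entropy loss.

```python
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28 * 28).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28 * 28).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)  # one-hot labels
y_test = keras.utils.to_categorical(y_test, 10)

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(28 * 28,)),  # hidden layer
    keras.layers.Dense(10, activation="softmax"),                        # one output per class
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```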
The goal of machine learning is to obtain models that perform well on new, unseen data. For instance, it can happen that a model performs perfectly on the training data but fails on new data. This is called overfitting. The following notes explain briefly how to deal with this important issue.
Logistic regression is used for binary classification problems. Binary means that there are only two classes. For instance, an image has to be classified as either a cat or a dog. There is only one output neuron whose activation indicates the class (say, 1=dog, 0=cat). It is best to use the binary cross entropy loss instead of the squared error loss.
Sigmoid activation functions are used in multi-class, multi-label classification problems. The number of output neurons is equal to the number of classes, and each neuron uses the sigmoid activation function. The binary cross entropy loss is used for each output neuron.
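The two output-layer setups can be contrasted in a short sketch (layer sizes and the input dimension are placeholders):

```python
from tensorflow import keras

# Binary classification (e.g. cat vs dog): a single sigmoid output neuron.
binary_model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(100,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
binary_model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

# Multi-class, multi-label classification: one sigmoid output neuron per class;
# binary cross entropy is applied to each output independently.
num_classes = 5
multilabel_model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(100,)),
    keras.layers.Dense(num_classes, activation="sigmoid"),
])
multilabel_model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
```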
These notes explain how to compute the gradients for neural networks consisting of multiple dense layers. I will not go over the mathematical derivation of the backpropagation algorithm. Fortunately, the gradients are computed automatically in Keras.
- Code for creating sequential neural networks with dense layers and training them with backprop and mini-batch SGD; currently, the code is limited to (1) mean squared error loss and (2) sigmoid activations.
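To give a flavor of what the gradients look like, here is a minimal NumPy sketch (my own toy version, not the course code) of backprop for two dense layers with sigmoid activations and the mean squared error loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(4, 8)), np.zeros(8)   # layer 1: 4 -> 8
W2, b2 = 0.1 * rng.normal(size=(8, 1)), np.zeros(1)   # layer 2: 8 -> 1

X = rng.normal(size=(32, 4))     # a mini-batch of 32 examples (dummy data)
y = rng.uniform(size=(32, 1))    # dummy targets in [0, 1]
learning_rate = 0.5

for step in range(100):
    # forward pass
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    loss = np.mean((a2 - y) ** 2)

    # backward pass (chain rule, layer by layer)
    delta2 = 2 * (a2 - y) / len(X) * a2 * (1 - a2)   # dL/d(pre-activation of layer 2)
    dW2, db2 = a1.T @ delta2, delta2.sum(axis=0)
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)         # dL/d(pre-activation of layer 1)
    dW1, db1 = X.T @ delta1, delta1.sum(axis=0)

    # mini-batch SGD update
    W2 -= learning_rate * dW2; b2 -= learning_rate * db2
    W1 -= learning_rate * dW1; b1 -= learning_rate * db1

print(loss)  # should have decreased over the iterations
```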
TO DO: add a note on preventing overfitting with data augmentation (also, add L2/L1 regularization and dropout earlier!)
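In the meantime, here is a rough sketch of what those three remedies look like in Keras (the specific settings are placeholders, not recommendations):

```python
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation: random transformations applied to training images on the fly.
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28, 1)),
    keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),  # L2 weight penalty
    keras.layers.Dropout(0.5),                                           # dropout
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=..., validation_data=...)
```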
Classification of MNIST digits and fashion items
Classification of cats and dogs
Based on Chapter 5, Deep Learning for Computer Vision, of the book Deep Learning with Python by F. Chollet
- Training a convnet from scratch, using data augmentation and dropout
- Using the VGG16 conv base for fast feature extraction (data augmentation not possible), using dropout
- Using the VGG16 conv base for feature extraction, using data augmentation, not using dropout (see the sketch after this list)
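The third variant can be sketched as follows (image size, dense head, and learning rate are my own choices; see Chollet's Chapter 5 for the full versions):

```python
from tensorflow import keras

conv_base = keras.applications.VGG16(weights="imagenet",
                                     include_top=False,
                                     input_shape=(150, 150, 3))
conv_base.trainable = False        # freeze the pretrained conv base

model = keras.Sequential([
    conv_base,                     # VGG16 conv base used as a feature extractor
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # cats vs dogs
])
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=2e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_generator, epochs=..., validation_data=validation_generator)
```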
Based on the Google ML Practicum: Image Classification
- Colab notebook for training a convolutional neural network from scratch
- Colab notebook for training a CNN from scratch with data augmentation and dropout
Visualizing what convnets learn
Based on Chapter 5, Deep Learning for Computer Vision, of the book Deep Learning with Python by F. Chollet
- Visualizing convnet filters; the convnet filter visualizations at the bottom of the notebook look pretty cool! (A small gradient-ascent sketch follows this list.)
- Visualizing heatmaps of class activations; modified version that changes softmax to linear activation in the last layer
- keras-vis: a package for producing cool-looking visualizations. I had problems using it on Colab.
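The filter visualizations are produced by gradient ascent in input space. A stripped-down sketch of the idea (layer name, filter index, and step size are arbitrary choices, not the notebook's settings):

```python
import tensorflow as tf
from tensorflow import keras

model = keras.applications.VGG16(weights="imagenet", include_top=False)
layer = model.get_layer("block3_conv1")
feature_extractor = keras.Model(inputs=model.input, outputs=layer.output)

filter_index = 0
img = tf.Variable(tf.random.uniform((1, 150, 150, 3)) * 0.25 + 0.45)  # start from gray-ish noise

for _ in range(30):
    with tf.GradientTape() as tape:
        activation = feature_extractor(img)
        loss = tf.reduce_mean(activation[:, :, :, filter_index])  # mean response of one filter
    grads = tape.gradient(loss, img)
    img.assign_add(10.0 * tf.math.l2_normalize(grads))  # gradient ascent step

# img now contains a pattern that strongly activates the chosen filter
```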
Some cool looking stuff
Based on Section 8.2 DeepDream and Section 8.3 Neural style transfer of the book Deep learning with Python by F. Chollet. I am not going to explain in detail how deep dream and neural style transfer work. I just wanted to include these notebooks to show you two cool examples of what can be done with deep neural networks.
The goal is to introduce more advanced architectures and concepts. This is based on the Keras documentation: CIFAR-10 ResNet.
The relevant research papers are:
Notebooks
I have made several changes to the code from the Keras documentation. In the above notebook, I had to change the number of epochs and the learning rate schedule because the model is trained on only 40k examples and validated on 10k, whereas the model in the Keras documentation is trained on 50k and not validated at all. I wanted to have a situation that is similar to the one in HW 2 so we can better compare the performance of the ResNet and the (normal) CNN.
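A learning rate schedule of that kind is attached to training as a callback. A minimal sketch (the breakpoints and rates below are placeholders, not the ones used in the notebook):

```python
from tensorflow import keras

def lr_schedule(epoch):
    lr = 1e-3                 # base learning rate
    if epoch > 80:
        lr *= 1e-2
    elif epoch > 60:
        lr *= 1e-1
    return lr

callbacks = [keras.callbacks.LearningRateScheduler(lr_schedule)]
# model.fit(x_train, y_train, epochs=..., callbacks=callbacks, validation_data=...)
```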
- Word embeddings
- Using 1D convnets (TO DO)
- Word embeddings (TO DO: change notebook!!!)
- Newsgroup classification with convolutional model using pretrained GloVe embeddings (TO DO)
- IMDB sentiment classification with LSTM model (TO DO)
- ...
- Arguments return_sequences and return_state for LSTM cells in Keras (see the sketch after this list)
- Character-based sequence-to-sequence model for translating French to English
- TO DO: sequence-to-sequence model with attention
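For reference, a small sketch of the two LSTM arguments mentioned above (the shapes are illustrative):

```python
from tensorflow import keras

inputs = keras.Input(shape=(20, 8))   # 20 timesteps, 8 features per timestep

# return_sequences=False (the default): only the final hidden state, shape (None, 32)
last_state = keras.layers.LSTM(32)(inputs)

# return_sequences=True: the full sequence of hidden states, shape (None, 20, 32);
# needed when stacking recurrent layers
all_states = keras.layers.LSTM(32, return_sequences=True)(inputs)

# return_state=True: also returns the final hidden state h and cell state c, which is
# how the encoder hands its state to the decoder in a sequence-to-sequence model
outputs, state_h, state_c = keras.layers.LSTM(32, return_state=True)(inputs)
```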