- Course Overview
- Main TextBook
- Slides and Papers
- Lecture 1: Introduction to Reinforcement Learning
- Lecture 2: Exploration and Exploitation
- Lecture 3: Finite Markov Decision Processes
- Lecture 4: Dynamic Programming
- Lecture 5: Monte Carlo Methods
- Lecture 6: Temporal-Difference Learning
- Lecture 7: n-step Bootstrapping
- Lecture 8: Planning and Learning with Tabular Methods
- Lecture 9: On-policy Prediction with Approximation
- Lecture 10: On-policy Control with Approximation
- Lecture 11: Off-policy Methods with Approximation
- Lecture 12: Eligibility Traces
- Lecture 13: Policy Gradient Methods
- Lecture 14: Deep Reinforcement Learning
- Lecture 15: Applications
- Lecture 16: Useful Toolkits and Libraries
- Additional Resources
- Class Time and Location
- Projects
- Grading
- Prerequisites
- Topics
- Account
- Academic Honor Code
- Questions
- Miscellaneous
In this course, you will learn the foundations of Reinforcement Learning. To realize the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare.
Main TextBook:
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Recommended Slides & Papers:
Required Reading:
- Slide: An Introduction to Reinforcement Learning by Hossein Hajiabolhassan
- Slide: Introduction by Hado van Hasselt
Suggested Reading:
- Blog: An Introduction to Reinforcement Learning by Thomas Simonini
- Blog: Reinforcement Learning Introduction: Foundations and Applications by Nikolay Manchev
Additional Resources:
- Blog: Reinforcement Learning Tutorial
- Blog: Reinforcement Learning: What is, Algorithms, Types & Examples by Daniel Johnson
- Blog: The Unsupervised Reinforcement Learning Benchmark by Misha Laskin and Denis Yarats
Required Reading:
- Slide: An Introduction to Reinforcement Learning by Hossein Hajiabolhassan
- Slide: Exploration and Exploitation by Hado van Hasselt
- Paper: A Tutorial on Thompson Sampling by Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen (a minimal code sketch appears at the end of this lecture's resources)
- Lecture: Introduction to Thompson Sampling by Erik Waingarten (Instructor: Shipra Agrawal)
Suggested Reading:
- Blog: Bandit Algorithms by Tor Lattimore and Csaba Szepesvari
- Slide: Exploration and Exploitation by David Silver
- Lecture: Stochastic Multi-Armed Bandits, Regret Minimization by Walter Cai, Emisa Nategh, Jennifer Rogers (Lecturer: Kevin Jamieson)
- Blog: Beta Distribution — Intuition, Examples, and Derivation by Aerin Kim
- Blog: Visualizing Beta Distribution and Bayesian Updating by Shaw Lu
- Blog: Conjugate Prior Explained: With Examples & Proofs by Aerin Kim
Additional Resources:
- Tool: The Calculator for Beta Distribution by Dr. Bognar
- Tool: Probability Distribution Explorer: a tool for exploring commonly used probability distributions, including the stories behind them (e.g., the outcome of a coin flip is Bernoulli distributed), their probability mass/density functions, their moments, etc.
- Blog: Learn Thompson Sampling by Building an Ad Auction! by Will Kurt
- Blog: Do You Know Credible Interval by Shaw Lu
- Toolkit: Multi-armed Bandit Demo by Mark Reid
- Code (Python): Reinforcement Learning: The K-armed bandit problem by Nikolay Manchev
- Code (Python): Multi-Armed Bandit Python Example using UCB by HackDeploy
- Code (Python): Multi-Armed Bandits: Epsilon-Greedy Algorithm with Python Code by Artemis Nika
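To make the Thompson-sampling material above concrete, here is a minimal Beta-Bernoulli sketch; the arm success rates in `true_means` are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = [0.3, 0.5, 0.7]      # hidden Bernoulli success rates (illustrative values)
successes = np.zeros(3)           # per-arm success counts
failures = np.zeros(3)            # per-arm failure counts

for t in range(10_000):
    # Sample one plausible mean per arm from its Beta posterior.
    samples = rng.beta(successes + 1, failures + 1)
    arm = int(np.argmax(samples))             # play the arm that looks best this round
    reward = rng.random() < true_means[arm]   # Bernoulli reward
    successes[arm] += reward
    failures[arm] += 1 - reward

print("posterior means:", (successes + 1) / (successes + failures + 2))
```

Each round samples one plausible mean per arm and plays the argmax, so exploration fades naturally as the posteriors sharpen.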
Required Reading:
- Slide: Dynamic Programming by Hossein Hajiabolhassan
- Slide: MDPs & Dynamic Programming by Diana Borsa
Suggested Reading:
- Blog: Understanding Markov Chains with the Black Friday Puzzle by Will Kurt
- Blog: The Intuition Behind Markov Chains by Kyle Chan
Additional Resources:
- Slide: An Introduction to Markov Decision Processes by Bob Givan and Ron Parr
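The Markov-chain posts above reduce to a small experiment: repeatedly pushing a state distribution through a transition matrix reveals the chain's long-run behavior. A minimal sketch, with a made-up 3-state matrix:

```python
import numpy as np

# A made-up 3-state transition matrix: P[i, j] = Pr(next state = j | current state = i).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

dist = np.array([1.0, 0.0, 0.0])   # start in state 0 with probability 1

for _ in range(100):
    dist = dist @ P                # one step of the chain: push the distribution through P

print("long-run state distribution:", dist.round(3))
```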
Required Reading:
- Slide: Dynamic Programming by Hossein Hajiabolhassan
- Slide: MDPs & Dynamic Programming by Diana Borsa
- Blog: GridWorld: Dynamic Programming Demo by Andrej Karpathy
- Blog: Why Does the Optimal Policy Exist? by Alireza Modirshanechi
- Blog: Optimizing Jack's Car Rental by Alexander Kozlov
- Note: How to Gamble If You Must by Kyle Siegrist
- Blog: Hyperbolic Discounting — The Irrational Behavior That Might be Rational After All by Chris Said
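Karpathy's GridWorld demo above runs value iteration; the same Bellman optimality backup fits in a few lines. A minimal sketch on a made-up chain MDP (the states, rewards, and discount factor are arbitrary choices for illustration):

```python
import numpy as np

# A tiny deterministic chain MDP: states 0..4, actions move left/right,
# reward -1 per step, state 4 is terminal.
n_states, gamma = 5, 0.9
V = np.zeros(n_states)

def step(s, a):                     # a in {-1, +1}
    if s == n_states - 1:           # terminal state: no reward, no movement
        return s, 0.0
    return max(0, min(n_states - 1, s + a)), -1.0

for _ in range(100):                # sweep until (effectively) converged
    for s in range(n_states):
        # Bellman optimality backup: V(s) <- max_a [ r + gamma * V(s') ]
        V[s] = max(r + gamma * V[s2] for s2, r in (step(s, a) for a in (-1, 1)))

print(V.round(2))   # values increase toward the terminal state
```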
Suggested Reading:
To get more familiar with dynamic programming, I recommend reading the following blogs:
- Blog: Overlapping Subproblems Property in Dynamic Programming
- Blog: Optimal Substructure Property in Dynamic Programming
- Blog: Longest Increasing Subsequence
- Blog: Longest Common Subsequence
Additional Resources:
- Algorithms: Visualizations of Graph Algorithms: important algorithms in this area are presented and explained, each with both an interactive applet and pseudocode.
- Blog: Bellman–Ford Algorithm
Required Reading:
- Slide: Model-Free Prediction by Hado van Hasselt
- Blog: Introduction to Monte Carlo Methods by Asael Alonzo Matamoros
- Blog: Introduction to Monte Carlo simulation by Kinder Chen
- Blog: Off Policy Monte Carlo Prediction with Importance sampling by Shangeth Rajaa
Suggested Reading:
- Paper: Monte Carlo Methods by Jonathan Pengelly
- Blog: What is Rejection Sampling? by Kapil Sachdeva
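A minimal sketch tying together the Monte Carlo and importance-sampling posts above; the target quantity (the second moment of a standard normal) and the wider proposal distribution are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Plain Monte Carlo: estimate E[X^2] for X ~ N(0, 1) by averaging samples.
x = rng.normal(0.0, 1.0, n)
print("plain MC estimate:", (x ** 2).mean())          # true value is 1.0

# Importance sampling: draw from a wider proposal N(0, 2) instead and
# reweight each sample by the density ratio p(x) / q(x).
def normal_pdf(x, sigma):
    return np.exp(-x ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

y = rng.normal(0.0, 2.0, n)
w = normal_pdf(y, 1.0) / normal_pdf(y, 2.0)
print("importance-sampled estimate:", (w * y ** 2).mean())
```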
Required Reading:
- Blog: Reinforcement Learning Tutorial Part 1: Q-Learning by Juha Kiili
Suggested Reading:
- Blog: Deep Double Q-Learning — Why You Should Use It by Ameet Deshpande
- Blog: 5 Steps to Master the Reinforcement Learning with a Q-Learning Python Example by Rune
- Blog: Reinforcement Learning — Generalisation of Continuing Tasks by Jeremy Zhang
Additional Resources:
- Blog: Dopamine and Temporal Difference Learning: A Fruitful Relationship Between Neuroscience and AI by Will Dabney and Zeb Kurth-Nelson
- Blog: Temporal-Difference (TD) Learning (Using Gym) by Christian Herta
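The Q-learning rule described in the tutorial above is a one-line update; here is a minimal tabular sketch on a made-up chain environment (all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up chain environment: states 0..4, actions 0 (left) / 1 (right),
# reward -1 per step, state 4 is terminal.
n_states, n_actions = 5, 2
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy behavior policy
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        r = -1.0
        # Q-learning update: bootstrap from the greedy value of the next state.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.round(2))   # moving right (column 1) should score higher in every state
```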
Required Reading:
- Slide: Multi-step Bootstrapping by Doina Precup
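For reference, the central quantity of this lecture is the n-step return, which bootstraps from the value estimate n steps into the future (Sutton & Barto, Chapter 7):

```latex
G_{t:t+n} \doteq R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V_{t+n-1}(S_{t+n})
```

Setting n = 1 recovers TD(0), while letting n grow toward the episode length recovers the Monte Carlo return.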
Required Reading:
- Blog: Integrating Real and Simulated Data in Dyna-Q Algorithm by Ranko Mosic
Suggested Reading:
- Blog: Monte Carlo Tree Search – Beginners Guide by Kamil Czarnogórski
- Blog: Monte Carlo Tree Search: An Introduction by Benjamin Wang
- Blog: Introduction to Monte Carlo Tree Search: The Game-Changing Algorithm behind DeepMind's AlphaGo by Ankit Choudhary
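The selection step in the MCTS guides above is usually the UCT rule, which treats each tree node as a multi-armed bandit:

```latex
a^{*} = \operatorname*{arg\,max}_{a} \left[ Q(s, a) + c \sqrt{\frac{\ln N(s)}{N(s, a)}} \right]
```

Here Q(s, a) is the average return of simulations through (s, a), N counts visits, and the constant c trades off exploration against exploitation.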
Required Reading:
- Slide: Function Approximation in Reinforcement Learning by Hado van Hasselt
- Blog: Tile-Coding: An Efficient Sparse-Coding Method for Real-Valued Data by Hamid Maei
- Blog: State Aggregation with Monte Carlo
Suggested Reading:
- Blog: Radial Basis Function Neural Network Simplified by Luthfi Ramadhan
- Blog: RBF Neural Networks
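A minimal sketch of the state-aggregation idea from the required reading: gradient Monte Carlo on a made-up random walk, with one learned weight shared by each group of states:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up random walk: 20 non-terminal states grouped into 5 bins,
# with a single learned weight per bin (state aggregation).
n_states, n_groups = 20, 5
w = np.zeros(n_groups)
alpha = 0.05

def group(s):
    return s * n_groups // n_states

for episode in range(2_000):
    s, visited = n_states // 2, []
    while 0 <= s < n_states:              # walk until stepping off either end
        visited.append(s)
        s += rng.choice((-1, 1))
    G = 1.0 if s >= n_states else -1.0    # undiscounted return: +1 right exit, -1 left exit
    for s_t in visited:
        # Gradient Monte Carlo update: nudge the bin's weight toward the return.
        w[group(s_t)] += alpha * (G - w[group(s_t)])

print(w.round(2))   # ramps from about -1 (left bins) to about +1 (right bins)
```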
Required Reading:
- Slide: Function Approximation in Reinforcement Learning by Hado van Hasselt
Suggested Reading:
- Blog: Tile Coding Software by Richard S. Sutton
Required Reading:
- Slide: Multi-step & Off Policy by Hado van Hasselt
Required Reading:
- Slide: Eligibility Traces by Doina Precup
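The core recursion of this lecture is the accumulating eligibility trace, which spreads each TD error over recently visited features (Sutton & Barto, Chapter 12):

```latex
\mathbf{z}_t = \gamma \lambda \mathbf{z}_{t-1} + \nabla \hat{v}(S_t, \mathbf{w}_t),
\qquad
\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \, \delta_t \, \mathbf{z}_t
```

With λ = 0 this collapses to one-step semi-gradient TD, and λ = 1 approaches a Monte Carlo update.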
Required Reading:
- Slide: Policy Gradient by David Silver
- Slide: Policy-Gradient & Actor-Critic methods by Hado van Hasselt
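Both decks build on the policy gradient theorem; in its REINFORCE form (Sutton & Barto, Chapter 13), the parameters move along the gradient of expected return:

```latex
\nabla J(\boldsymbol{\theta}) \propto \mathbb{E}_{\pi}\!\left[ G_t \, \nabla_{\boldsymbol{\theta}} \ln \pi(A_t \mid S_t, \boldsymbol{\theta}) \right],
\qquad
\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t + \alpha \, G_t \, \nabla_{\boldsymbol{\theta}} \ln \pi(A_t \mid S_t, \boldsymbol{\theta}_t)
```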
Required Reading:
- Slide: Deep Reinforcement Learning 1 by Matteo Hessel
- Slide: Deep Reinforcement Learning 2 by Matteo Hessel
Suggested Reading:
- Blog: Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy
- Blog: Reinforcement Learning with Neural Network by Kumar Chandrakant
Additional Resources:
- Toolkit: Welcome to Spinning Up in Deep RL! This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).
- Blog: A Free Course in Deep Reinforcement Learning from Beginner to Expert by Thomas Simonini
Required Reading:
- Slide: Classic Games by David Silver
Additional Resources:
- Blog: Applications by David Silver
- Blog: Emergent Tool Use from Multi-Agent Interaction by OpenAI
- Blog: Solving Rubik’s Cube with a Robot Hand by OpenAI
Required Reading:
- Toolkit: Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball (a minimal usage sketch follows this list).
- Blog: Algorithms
- Blog: Classic Control
- Blog: Robotics
- Blog: MuJoCo
- Blog: Atari
- Blog: Wrappers
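A random-agent loop shows Gym's interaction cycle. This sketch assumes the classic pre-0.26 API; newer Gym and Gymnasium releases return `(obs, info)` from `reset()` and split `done` into `terminated` and `truncated`:

```python
import gym

# A minimal episode with a random policy (classic Gym API).
env = gym.make("CartPole-v1")
obs = env.reset()

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random policy
    obs, reward, done, info = env.step(action)  # one environment step
    total_reward += reward

env.close()
print("episode return:", total_reward)
```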
Suggested Reading:
- Blog: Tutorial: writing a custom OpenAI Gym environment by Vadim Liventsev
- Python Module: Deque in Python
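The `deque` module above is the usual building block for experience replay in DQN-style agents; a minimal sketch (the stored transition format is an arbitrary convention):

```python
import random
from collections import deque

# A minimal replay buffer: maxlen makes deque drop its oldest transitions.
buffer = deque(maxlen=10_000)

def store(transition):
    buffer.append(transition)        # transition = (s, a, r, s2, done)

def sample(batch_size):
    return random.sample(buffer, batch_size)

store((0, 1, 1.0, 1, False))
store((1, 0, 0.0, 0, True))
print(sample(2))
```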
Additional Resources:
- Package: Highway-env’s Documentation provides a collection of environments for decision-making in Autonomous Driving.
Papers:
- Slide: Distributed RL by Richard Liaw
- PDF: Acme: A Research Framework for Distributed Reinforcement Learning
- Blog: Acme: A New Framework for Distributed Reinforcement Learning
- GitHub: Must-read Papers on GNN by Natural Language Processing Lab at Tsinghua University
Online Demos:
- Blog: ConvNetJS Deep Q Learning Demo by Andrej Karpathy
Codes:
- Codes: Reinforcement Learning: An Introduction by Shangtong Zhang
Courses:
- Blog: Reinforcement Learning Lecture Series 2021 (DeepMind) by Hado van Hasselt, Diana Borsa & Matteo Hessel
- Blog: A Course taught by David Silver: Introduction to Reinforcement Learning
- Saturday and Monday
- Tuesday
Projects are programming assignments that cover the topics of this course. Each project is written in a Jupyter Notebook. Projects will require the use of Python 3.7, as well as additional Python libraries.
Google Colab is a free cloud service that offers free GPU access!
- How to Use Google Colab by Souvik Mandal
- Primer for Learning Google Colab
- Deep Learning Development with Google Colab, TensorFlow, Keras & PyTorch
- Technical Notes On Using Data Science & Artificial Intelligence: To Fight For Something That Matters by Chris Albon
Students can include mathematical notation within Markdown cells using LaTeX in their Jupyter Notebooks (see the sample cell after the resources below).
- A Brief Introduction to LaTeX (PDF)
- Math in LaTeX (PDF)
- Sample Document (PDF)
- TikZ: A collection of LaTeX files of PGF/TikZ figures (including various neural networks) by Petar Veličković.
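For example, a Markdown cell containing the Bellman expectation equation from the textbook renders directly:

```latex
$$ v_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_{\pi}(s') \right] $$
```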
- Projects and Midterm – 50%
- Endterm – 50%
- First Midterm Examination:
- Second Midterm Examination:
- Final Examination:
General mathematical sophistication and a solid understanding of algorithms, linear algebra, and probability theory, at the advanced undergraduate or beginning graduate level, or equivalent.
- Video: Professor Gilbert Strang's Video Lectures on linear algebra.
- Learn Probability and Statistics Through Interactive Visualizations: Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).
- Statistics and Probability: This website provides training and tools to help you solve statistics problems quickly, easily, and accurately, without having to ask anyone for help.
- Jupyter Notebooks: Introduction to Statistics by Bargava
- Video: Professor John Tsitsiklis's Video Lectures on Applied Probability.
- Video: Professor Krishna Jagannathan's Video Lectures on Probability Theory.
Have a look at some assignments of Stanford students (Reinforcement Learning) to get some general inspiration.
It is necessary to have a GitHub account to share your projects. GitHub offers both free accounts and plans with private repositories. GitHub is like the hammer in your toolbox, so you need to have it!
Honesty and integrity are vital elements of academic work. All your submitted assignments must be entirely your own (or your own group's).
We will follow the standard approach of the Department of Mathematical Sciences:
- You can get help, but you MUST acknowledge the help on the work you hand in.
- Failure to acknowledge your sources is a violation of the Honor Code.
- You can talk to others about the algorithm(s) to be used to solve a homework problem, as long as you then mention their name(s) on the work you submit.
- You should not use or look at others' code when writing your own: you can talk to people, but you must write your own solution/code.
I will hold office hours for this course on Saturdays (09:00-10:00 AM). If this is not convenient, email me at [email protected] or talk to me after class.