Lecture 1: Foundations of Reinforcement Learning

1. Introduction to Reinforcement Learning
   Multi-armed Bandits
   Contextual Bandits
2. Markov Decision Processes (MDPs)
   Time-varying MDPs
   Partially Observable MDPs
3. Existence of the Optimal Stationary Policy