Reinforcement Learning with Python takes you from the core concepts of reinforcement learning through to deep reinforcement learning, and shows each of them in action. The book explains everything from scratch and implements practical applications along the way.
The book starts with an introduction to reinforcement learning, OpenAI, and TensorFlow. You will then explore reinforcement learning algorithms and concepts such as Markov decision processes (MDPs), Monte Carlo methods, and dynamic programming, including policy and value iteration. You will get to grips with temporal difference learning algorithms, including Q-learning and SARSA. This example-rich guide then introduces neural networks and deep learning, covering various deep learning algorithms. You will explore in depth deep reinforcement learning, which combines deep learning and reinforcement learning. You will also learn how deep reinforcement learning algorithms can be used with TensorFlow to build intelligent applications.
- 1.1. What is Reinforcement Learning?
- 1.2. Reinforcement Learning Cycle
- 1.3. How RL Differs from Other ML Paradigms
- 1.4. Elements of Reinforcement Learning
- 1.5. Agent Environment Interface
- 1.6. Types of RL Environments
- 1.7. Reinforcement Learning Platforms
- 1.8. Applications of Reinforcement Learning
- 2.1. Setting Up Your Machine
- 2.2. Installing Anaconda
- 2.3. Installing Docker
- 2.4. Installing OpenAI Gym and Universe
- 2.5. Common Error Fixes
- 2.6. OpenAI Gym
- 2.7. Basic Simulations
- 2.8. Training a Robot to Walk
- 2.9. Building a Video Game Bot
- 2.10. TensorFlow Fundamentals
- 2.11. TensorBoard
- 3.1. Markov Chain and Markov Process
- 3.2. Markov Decision Process
- 3.3. Rewards and Returns
- 3.4. Episodic and Continuous Tasks
- 3.5. Policy Function
- 3.6. State Value Function
- 3.7. State-Action Value Function (Q Function)
- 3.8. Bellman Equation and Optimality
- 3.9. Deriving Bellman Equation for Value and Q functions
- 3.10. Solving the Bellman Equation
- 3.11. Dynamic Programming
- 3.12. Solving Frozen Lake Problem using Value Iteration
- 3.13. Solving Frozen Lake Problem using Policy Iteration
- 4.1. Monte Carlo Methods
- 4.2. Estimating Value of Pi Using Monte Carlo
- 4.3. Monte Carlo Prediction
- 4.4. First-Visit Monte Carlo
- 4.5. Every-Visit Monte Carlo
- 4.6. Blackjack with Monte Carlo
- 4.7. Monte Carlo Control
- 4.8. Monte Carlo Exploration Starts
- 4.9. On-Policy Monte Carlo Control
- 4.10. Off-Policy Monte Carlo Control
- 5.1. Temporal Difference Learning
- 5.2. TD Prediction
- 5.3. TD Control
- 5.4. Q-Learning
- 5.5. Solving the Taxi Problem Using Q-Learning
- 5.6. SARSA
- 5.7. Solving the Taxi Problem Using SARSA
- 5.8. Difference Between Q-Learning and SARSA
- 6.1. Multi-armed Bandit Problem
- 6.2. Epsilon-Greedy Algorithm
- 6.3. Softmax Exploration Algorithm
- 6.4. Upper Confidence Bound Algorithm
- 6.5. Thompson Sampling Algorithm
- 6.6. Applications of MAB
- 6.7. Identifying Right Advertisement Banner Using MAB
- 6.8. Contextual Bandits
- 7.1. Artificial Neurons
- 7.2. Artificial Neural Network
- 7.3. Activation Functions
- 7.4. Deep Dive into ANN
- 7.5. Gradient Descent
- 7.6. Neural Networks in TensorFlow
- 7.7. Recurrent Neural Network
- 7.8. Backpropagation Through Time
- 7.9. Long Short-Term Memory RNN
- 7.10. Generating Song Lyrics Using LSTM RNN
- 7.11. Convolutional Neural Networks
- 7.12. CNN Architecture
- 7.13. Classifying Clothes Using CNN
- 8.1. What is a Deep Q Network?
- 8.2. Architecture of DQN
- 8.3. Convolutional Network
- 8.4. Experience Replay
- 8.5. Target Network
- 8.6. Clipping Rewards
- 8.7. DQN Algorithm
- 8.8. Building an Agent to Play Atari Games
- 8.9. Double DQN
- 8.10. Dueling Architecture
- 9.1. Deep Recurrent Q Network
- 9.2. Partially Observable MDP
- 9.3. Architecture of DRQN
- 9.4. Basic Doom Game
- 9.5. Building an Agent to Play Doom Using DRQN
- 9.6. Deep Attention Recurrent Q Network
- 10.1. Asynchronous Advantage Actor Critic Algorithm
- 10.2. The Three A's
- 10.3. Architecture of A3C
- 10.4. Working of A3C
- 10.5. Driving up the Mountain with A3C
- 10.6. Visualization in TensorBoard
- 11.1. Policy Gradient
- 11.2. Lunar Lander Using Policy Gradient
- 11.3. Deep Deterministic Policy Gradient
- 11.4. Swinging up the Pendulum Using DDPG
- 11.5. Trust Region Policy Optimization
- 11.6. Proximal Policy Optimization