Incremental Learning

Q-LEARN
Q-Learning
Markov decision process (MDP) problem; each experience is a tuple (state, action, new state, reward).
Lots of exploration in the beginning, then exploitation.
Returns the optimal policy, provided every state-action pair keeps being explored.
Refer to youtube here

RL IN DL
A review paper about RL in DL
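
To make the Q-Learning notes above concrete, here is a minimal tabular sketch. The notes give no environment, so everything in it is an assumption for illustration: a hypothetical 5-state corridor where the agent starts at state 0 and receives reward 1 on reaching state 4, the function names (`step`, `train`, `greedy_policy`), and the hyperparameters. Each update consumes one (state, action, new state, reward) tuple, and epsilon decays so that early episodes mostly explore and later ones mostly exploit.

```python
import random

N_STATES = 5            # hypothetical toy task: states 0..4 on a line
GOAL = N_STATES - 1     # reaching state 4 ends the episode
ACTIONS = (0, 1)        # 0 = left, 1 = right


def step(state, action):
    """Return (new_state, reward) for the assumed corridor environment."""
    new_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if new_state == GOAL else 0.0
    return new_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, eps_start=1.0,
          eps_min=0.05, eps_decay=0.99, max_steps=200, seed=0):
    """Tabular Q-learning with a decaying epsilon-greedy policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    eps = eps_start
    for _episode in range(episodes):
        state = 0
        for _t in range(max_steps):
            if rng.random() < eps:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-known action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            new_state, reward = step(state, action)
            # Q-learning update from the tuple (state, action, new_state, reward).
            target = reward + gamma * max(q[new_state])
            q[state][action] += alpha * (target - q[state][action])
            state = new_state
            if state == GOAL:
                break
        # Shift gradually from exploration to exploitation.
        eps = max(eps_min, eps * eps_decay)
    return q


def greedy_policy(q):
    """Read the learned policy off the table: best action in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table)[:GOAL])   # actions for the non-terminal states
```

In this toy setting the learned greedy policy moves right in every non-terminal state, which is the optimal policy here. The table-based form only scales to small state spaces; the RL in DL material replaces the table with a neural network.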