Seminar 1

1. Implementation of Multi-armed Bandits algorithms
   - $\epsilon$-greedy
   - Upper Confidence Bound
   - Thompson Sampling
2. Implementation of the Tiger Problem
   - Stationary Policy
   - History-dependent Policy
3. Proof of the Existence of the Optimal Stationary Policy in MDPs