- Slides
- Video lecture by D. Silver - video
- Our lecture, seminar (PyTorch), seminar (Theano)
- Alternative lecture by J. Schulman part 1 - video
- Alternative lecture by J. Schulman part 2 - video
- Actually proving the policy gradient for discounted rewards - article
- On variance of policy gradient and optimal baselines: article, another article
- Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - article
- Generalizing log-derivative trick - url
- Combining policy gradient and q-learning - arxiv
- Bayesian perspective on why reparameterization & logderivative tricks matter (Vetrov's take) - pdf
- Adversarial review of policy gradient - blog

As usual, start with reinforce_<framework_name>.ipynb and then proceed to homework_<framework_name>.ipynb.