Actually proving the policy gradient for discounted rewards - article
On variance of policy gradient and optimal baselines: article, another article
Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - article
Generalizing log-derivative trick - url
Combining policy gradient and q-learning - arxiv
Bayesian perspective on why reparameterization & logderivative tricks matter (Vetrov's take) - pdf
Adversarial review of policy gradient - blog
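
For orientation, Generalized Advantage Estimation from the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the notebooks: the function name `compute_gae`, its arguments, and the default `gamma` and `lam` values are assumptions made for this example. It expects rewards and critic values from one trajectory segment with no episode boundary inside it.

```python
import numpy as np


def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE advantages for one trajectory segment.

    rewards[t] is the reward received after acting in state s_t,
    values[t] is the critic estimate V(s_t), and last_value is V(s_T)
    for the state after the final step (use 0.0 if the episode ended).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)

    # V(s_{t+1}) for every step, bootstrapping the final step from last_value.
    next_values = np.append(values[1:], last_value)

    # One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    deltas = rewards + gamma * next_values - values

    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}.
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` this reduces to the one-step TD residual, and with `lam=1` it becomes the discounted return minus the value baseline, so `lam` trades bias against variance.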

As usual, pick reinforce_<framework_name>.ipynb for starters and then proceed with homework_<framework_name>.ipynb.