week6_policy_based

Materials

More materials

  • Actually proving the policy gradient for discounted rewards - article

  • On variance of policy gradient and optimal baselines: article, another article

  • Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - article

  • Generalizing log-derivative trick - url

  • Combining policy gradient and q-learning - arxiv

  • Bayesian perspective on why reparameterization & log-derivative tricks matter (Vetrov's take) - pdf

  • Adversarial review of policy gradient - blog

Homework

As usual, pick reinforce_<framework_name>.ipynb for starters and then proceed with homework_<framework_name>.ipynb.