
Visualizations of Reinforcement Learning concepts including Value Iteration and Q-Learning


learcane/Value-Iteration

 
 


State Value Iteration For FrozenLake8x8-v0

State Value Iteration assumes that the MDP is fully known. It repeatedly sweeps over all states, updating the value of each state from its possible actions, future states, future rewards, and state transition probabilities. The discount factor gamma controls how strongly long-term rewards are weighted when state values are updated.
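The sweep described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function name `value_iteration`, the stopping threshold `theta`, and the three-state toy MDP are assumptions made for the example. The transition table uses the layout `P[s][a] = [(prob, next_state, reward, done), ...]`, which is assumed here to match what Gym's FrozenLake environments expose.

```python
import numpy as np


def value_iteration(P, n_states, n_actions, gamma=0.95, theta=1e-8, max_sweeps=10_000):
    """Sweep over all states until the largest value change drops below theta.

    P[s][a] is a list of (prob, next_state, reward, done) tuples
    (an assumed layout, chosen for this sketch).
    """
    V = np.zeros(n_states)
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_states):
            action_values = []
            for a in range(n_actions):
                # Expected return of taking action a in state s.
                # A transition that ends the episode contributes no future value.
                q = sum(
                    prob * (reward + gamma * (0.0 if done else V[next_s]))
                    for prob, next_s, reward, done in P[s][a]
                )
                action_values.append(q)
            best = max(action_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    return V


# Hypothetical toy MDP: 0 -> 1 -> 2, where state 2 is terminal.
# Action 0 moves back (or stays), action 1 moves forward.
toy_P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

V_half = value_iteration(toy_P, n_states=3, n_actions=2, gamma=0.5)
V_one = value_iteration(toy_P, n_states=3, n_actions=2, gamma=1.0)
```

In this toy chain, a smaller gamma shrinks the value of the state that is two steps from the reward, while gamma = 1 leaves it undiscounted. The terminal state keeps a value of zero in both cases.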

The images below show the state values after convergence. All terminal states have zero value; in the plots, I set the value of the final state [8,8] to the maximum of all state values so that the graph looks nicer.
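One way to apply that display adjustment is sketched below. It assumes the state values are stored as a flat array of 64 entries in row-major order, so the final state [8,8] sits at zero-based index `(7, 7)`. The helper name `values_for_display` is hypothetical and does not come from the notebook.

```python
import numpy as np


def values_for_display(V, shape=(8, 8), goal=(7, 7)):
    """Return a grid copy of V with the goal cell raised to the grid maximum.

    The input array is left unchanged, so the goal state keeps its
    true value of zero outside the plot.
    """
    grid = np.asarray(V, dtype=float).reshape(shape).copy()
    grid[goal] = grid.max()
    return grid


# Hypothetical values: everything zero except the cell next to the goal.
V_example = np.zeros(64)
V_example[62] = 0.9
grid_example = values_for_display(V_example)
```

The returned grid can then be passed to any heatmap or surface plotting routine.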

See notebook for full code.

Gamma 1

Gamma 0.999

Gamma 0.95
