GitHub - learcane/Value-Iteration: Visualizations of Reinforcement Learning concepts including Value Iteration and Q-Learning

State Value Iteration For FrozenLake8x8-v0

State value iteration assumes that the MDP is fully known. It repeatedly sweeps over all states, updating the value of each state from its possible actions, successor states, rewards, and state transition probabilities. The discount factor gamma controls how heavily long-term rewards count when state values are updated.
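As a rough illustration of the sweep described above, here is a minimal sketch in plain Python. It is not the notebook's code. It assumes the transition model is given as `P[s][a]`, a list of `(prob, next_state, reward, done)` tuples, which mirrors the layout of the `P` attribute exposed by Gym's toy-text environments. The function name `value_iteration`, the stopping tolerance `tol`, and the three-state chain `toy_P` are hypothetical and used only for illustration.

```python
def value_iteration(P, n_states, n_actions, gamma=0.95, tol=1e-8, max_sweeps=10_000):
    """Sweep over all states until the largest value change drops below tol.

    P[s][a] is assumed to be a list of (prob, next_state, reward, done) tuples.
    """
    V = [0.0] * n_states
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_states):
            # Expected return of each action, then keep the best one.
            # Nothing is bootstrapped from a transition that ends the episode.
            best = max(
                sum(p * (r + (0.0 if done else gamma * V[s2]))
                    for p, s2, r, done in P[s][a])
                for a in range(n_actions)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update, reused later in the same sweep
        if delta < tol:
            break
    return V


# Hypothetical deterministic chain: 0 -> 1 -> 2, where state 2 is terminal.
# Action 0 stays in place, action 1 moves right; entering state 2 pays reward 1.
toy_P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 1, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

toy_values = value_iteration(toy_P, n_states=3, n_actions=2, gamma=0.9)
print(toy_values)
```

In this toy chain the state next to the goal is worth the full reward, the state one step further back is worth that reward discounted once by gamma, and the terminal state keeps a value of zero. Lowering gamma shrinks the values of states far from the goal.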

Here are some images showing the state values after convergence. For display only, I set the value of the final state [8,8] to the maximum of all state values so that the graph looks nicer. In the computed values, all terminal states have zero value.
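The display adjustment described above could look roughly like the following sketch. It assumes the state values are stored as a flat list in row-major order with the goal as the last entry (bottom-right cell). The helper name `to_display_grid` is hypothetical, and the plotting itself is left out.

```python
def to_display_grid(values, n_rows=8, n_cols=8):
    """Return a row-major grid copy of `values` with the goal cell set to the max.

    Assumes the goal is the last state (bottom-right cell); `values` is not modified.
    """
    if len(values) != n_rows * n_cols:
        raise ValueError("values must contain n_rows * n_cols entries")
    shown = list(values)      # copy, so the computed values stay untouched
    shown[-1] = max(values)   # display-only change to the goal cell
    return [shown[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Tiny 2x2 illustration: the terminal goal (last entry) has value 0 before display.
example_values = [0.1, 0.2, 0.3, 0.0]
example_grid = to_display_grid(example_values, n_rows=2, n_cols=2)
print(example_grid)
```

Working on a copy keeps the converged values intact, so the zero value of the terminal states is changed only in what gets drawn.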

See the notebook (FrozenIce.ipynb) for the full code.

Gamma 1

Gamma 0.999

Gamma 0.95
