This repo trains a policy in the Acrobot-v1 environment using a minimal implementation of the REINFORCE algorithm.
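REINFORCE is the Monte-Carlo policy gradient method: it performs gradient ascent on the expected return, estimated from complete sampled episodes. A minimal implementation typically uses the estimator

$$\nabla_\theta J(\theta) \approx \sum_{t=0}^{T-1} G_t \,\nabla_\theta \log \pi_\theta(a_t \mid s_t), \qquad G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_{k+1},$$

where $\pi_\theta$ is the policy, $G_t$ the discounted return from step $t$, and $\gamma$ the discount factor. (The exact discount and any baseline used here are not specified in this README.)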
State space: 6 continuous values (the cosines and sines of the two joint angles, plus the two joint angular velocities). Shape = (6,).
Action space: 1 of 3 discrete actions (0, 1, 2), corresponding to torques of -1, 0, and +1 applied to the joint between the two links. Discrete(3), i.e. a single scalar per step.
Objective: swing the free end of the two-link pendulum above the target line (one link's length above the pivot) in as few steps as possible. The reward is -1 per step until the target height is reached, so less negative scores are better.
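A quick check of these spaces (a sketch assuming the Gymnasium API; an older repo may instead use classic `gym`, whose `reset` returns only the observation):

```python
import gymnasium as gym

env = gym.make("Acrobot-v1")
print(env.observation_space)  # Box of shape (6,): cos/sin of both joint angles + 2 angular velocities
print(env.action_space)       # Discrete(3): torque of -1, 0, or +1

obs, info = env.reset(seed=0)
print(obs.shape)  # (6,)
env.close()
```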
The policy model maps states to a probability distribution over the 3 actions. I used a feed-forward neural network (FFNN) with one hidden layer of 32 units and ReLU activation, followed by a softmax output layer of 3 units.
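A sketch of that architecture (the README does not name a framework, so PyTorch is assumed here; layer sizes follow the description above):

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Maps a 6-dim Acrobot state to a probability distribution over the 3 actions."""
    def __init__(self, state_dim: int = 6, hidden: int = 32, n_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
            nn.Softmax(dim=-1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # action probabilities, shape (..., 3)

def select_action(policy: Policy, state) -> tuple[int, torch.Tensor]:
    """Sample an action and return it with its log-probability (needed for REINFORCE)."""
    probs = policy(torch.as_tensor(state, dtype=torch.float32))
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()
    return action.item(), dist.log_prob(action)
```

Sampling from a `Categorical` keeps exploration stochastic and yields the log-probability that the REINFORCE loss `-(log_prob * G_t)` requires.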
Best average score over 100 consecutive episodes: -92.95
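For reference, a "best average over 100 consecutive episodes" metric can be tracked with a sliding window like the sketch below (a hypothetical helper, not taken from this repo):

```python
from collections import deque
import numpy as np

class ScoreTracker:
    """Tracks the best average return over the last `window` episodes."""
    def __init__(self, window: int = 100):
        self.returns = deque(maxlen=window)  # keeps only the most recent returns
        self.best_avg = -float("inf")

    def record(self, episode_return: float) -> float:
        self.returns.append(episode_return)
        if len(self.returns) == self.returns.maxlen:
            self.best_avg = max(self.best_avg, float(np.mean(self.returns)))
        return self.best_avg
```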