In this project, an Advantage Actor-Critic (A2C) network is trained to control robotic arms that try to touch and follow moving target balls.
The environment for this project involves controlling a double-jointed arm to reach target locations.
The state is continuous: the state vector has 33 dimensions, corresponding to the position, rotation, velocity, and angular velocity of the arm.
Each action is a vector of 4 numbers, corresponding to the torques applied to the two joints. Every entry in the action vector must be a number between -1 and 1.
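As a minimal sketch of the action format described above, raw policy outputs can be clipped into the valid range before they are passed to the environment. This assumes NumPy and the 20-agent version of the environment; the function name `clip_actions` and the constants are illustrative and not taken from this repository.

```python
import numpy as np

NUM_AGENTS = 20   # assumption: the 20-agent version of the environment
ACTION_SIZE = 4   # four torque values per arm

def clip_actions(raw_actions):
    """Clip raw policy outputs so every entry lies in [-1, 1]."""
    actions = np.asarray(raw_actions, dtype=np.float32)
    if actions.shape != (NUM_AGENTS, ACTION_SIZE):
        raise ValueError(
            f"expected shape {(NUM_AGENTS, ACTION_SIZE)}, got {actions.shape}"
        )
    return np.clip(actions, -1.0, 1.0)

# Stand-in for unbounded policy outputs (for example, samples from a Gaussian)
rng = np.random.default_rng(0)
raw = rng.normal(scale=2.0, size=(NUM_AGENTS, ACTION_SIZE))
actions = clip_actions(raw)
```

Clipping is only one way to respect the bounds; squashing the outputs with `tanh` inside the network is a common alternative.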
A reward of +0.1 is provided for each step that the agent's hand is in the goal location.
The goal of the agent is to maintain its hand at the target location for as many time steps as possible.
The environment is considered solved when the agents achieve an average score of +30 over 100 consecutive episodes, averaged over all agents.
* The version with 20 identical copies of the agent sharing the same experience is used in this experiment.
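The scoring rule above can be sketched as follows, using only the standard library. The helper names `episode_score` and `is_solved` are hypothetical, and the per-agent returns in the example are made up purely to exercise the check.

```python
from collections import deque
from statistics import mean

TARGET_SCORE = 30.0  # required average score
WINDOW = 100         # number of consecutive episodes

def episode_score(agent_scores):
    """Average the per-agent returns of a single episode."""
    return mean(agent_scores)

def is_solved(recent_scores):
    """True once the last WINDOW episode scores average at least TARGET_SCORE."""
    return len(recent_scores) == WINDOW and mean(recent_scores) >= TARGET_SCORE

# Keep only the most recent WINDOW episode scores
recent_scores = deque(maxlen=WINDOW)

# Made-up data: 20 agents, each with a return of 31.0 in every episode
for _ in range(WINDOW):
    recent_scores.append(episode_score([31.0] * 20))

solved = is_solved(recent_scores)
```

Because each step in the goal location yields +0.1, an episode score of +30 corresponds to roughly 300 such steps per agent.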
* Please prepare a Python 3 virtual environment if necessary.
git clone https://github.com/qiaochen/A2C.git
cd A2C/install_requirements
pip install .
For this project, I use the environment from Udacity. The links to the builds for different operating systems are copied here for convenience:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here

I conducted my experiments on Ubuntu 16.04, so I picked the first option. Then extract the archive and place the Reacher_Linux folder in the project root. The project folder structure now looks like this (program-generated .png and model files are excluded):
Project Root
|-install_requirements (Folder)
|-README.md
|-Report.md
|-agent.py
|-models.py
|-train.py
|-test.py
|-utils.py
|-Reacher_Linux (Folder)
|  |-Reacher.x86_64
|  |-Reacher.x86
|  |-Reacher_Data (Folder)
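Before training, the layout can be sanity-checked with a short script. This is only a sketch: the function `missing_paths` is hypothetical and not part of the repository, and it assumes the Reacher binaries sit inside the Reacher_Linux folder.

```python
from pathlib import Path

# Paths taken from the folder structure above, relative to the project root.
# Assumption: Reacher.x86_64 lives inside the Reacher_Linux folder.
EXPECTED = [
    "agent.py",
    "models.py",
    "train.py",
    "test.py",
    "utils.py",
    "Reacher_Linux/Reacher.x86_64",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

Run it from the project root; any path it reports should be fixed before starting training.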
python train.py
After training, the following files will be generated and placed in the project root folder:
- best_model.checkpoint (the trained model)
- training_100avgscore_plot.png (a plot of avg. scores during training)
- training_score_plot.png (a plot of per-episode scores during training)
- unity-environment.log (log file created by Unity)
python test.py
The testing performance is summarized in a plot generated in the project root:
- test_score_plot.png