Merge pull request tensorflow#5870 from ofirnachum/master
Add training and eval code for efficient-hrl
Showing 51 changed files with 7,760 additions and 171 deletions.

Code for performing Hierarchical RL based on the following publications:

"Data-Efficient Hierarchical Reinforcement Learning" by
Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1805.08296).

"Near-Optimal Representation Learning for Hierarchical Reinforcement Learning"
by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1810.01257).

This library includes three of the environments used in these papers:
Ant Maze, Ant Push, and Ant Fall.

Requirements:
* TensorFlow (see http://www.tensorflow.org for how to install/upgrade)
* Gin Config (see https://github.com/google/gin-config)
* Tensorflow Agents (see https://github.com/tensorflow/agents)
* OpenAI Gym (see http://gym.openai.com/docs, be sure to install MuJoCo as well)
* NumPy (see http://www.numpy.org/)
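
As a rough sketch only (the package names below are assumptions; follow each
project's own installation instructions, and note that Tensorflow Agents and
MuJoCo require separate setup), the pip-installable dependencies can be
obtained with:

```
pip install tensorflow numpy gym gin-config
```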

Quick Start:

Run a training job based on the original HIRO paper on Ant Maze:

```
python scripts/local_train.py test1 hiro_orig ant_maze base_uvf suite
```

Run a continuous evaluation job for that experiment:

```
python scripts/local_eval.py test1 hiro_orig ant_maze base_uvf suite
```

To run the same experiment with online representation learning (the
"Near-Optimal" paper), change `hiro_orig` to `hiro_repr`. You can also run with
`hiro_xy` to run the same experiment with HIRO on only the xy coordinates of
the agent.
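
For example, the training command above with online representation learning
becomes:

```
python scripts/local_train.py test1 hiro_repr ant_maze base_uvf suite
```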

To run on other environments, change `ant_maze` to something else; e.g.,
`ant_push_multi`, `ant_fall_multi`, etc. See `context/configs/*` for other
options.
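
For instance, to train on Ant Push with the original HIRO configuration:

```
python scripts/local_train.py test1 hiro_orig ant_push_multi base_uvf suite
```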

Basic Code Guide:

The code for training resides in train.py. The code trains a lower-level policy
(a UVF agent in the code) and a higher-level policy (a MetaAgent in the code) | ||
concurrently. The higher-level policy communicates goals to the lower-level | ||
policy. In the code, this is called a context. Not only does the lower-level | ||
policy act with respect to a context (a higher-level specified goal), but the | ||
higher-level policy also acts with respect to an environment-specified context | ||
(corresponding to the navigation target location associated with the task). | ||
Therefore, in `context/configs/*` you will find specifications for both task
setup and goal configurations. Most remaining hyperparameters used for
training/evaluation may be found in `configs/*`.
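
As a purely conceptual sketch of this structure (the class names, methods, and
logic below are hypothetical simplifications and do not correspond to the
library's actual API; see train.py for the real training loop):

```
# Toy illustration of the two-level structure described above. All names and
# logic here are hypothetical simplifications, not this library's actual API.
import numpy as np

class MetaAgent:
    """Higher-level policy: proposes a goal (context) for the lower level."""
    def select_goal(self, state, env_context):
        # Toy choice: aim partway toward the environment-specified target.
        return state + 0.5 * (env_context - state)

class UVFAgent:
    """Lower-level goal-conditioned policy: acts to reach the current goal."""
    def select_action(self, state, goal):
        # Toy choice: step directly toward the goal.
        return np.clip(goal - state, -1.0, 1.0)

# One step of the hierarchy: the MetaAgent sets a goal, the UVFAgent acts on it.
state = np.zeros(2)
target = np.array([8.0, 3.0])          # environment-specified context
goal = MetaAgent().select_goal(state, target)
action = UVFAgent().select_action(state, goal)
```
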
NOTE: Not all the code corresponding to the "Near-Optimal" paper is included. | ||
Namely, changes to low-level policy training proposed in the paper (discounting | ||
and auxiliary rewards) are not implemented here. Performance should not change | ||
significantly.

Maintained by Ofir Nachum (ofirnachum).