Merge pull request tensorflow#5870 from ofirnachum/master

Add training and eval code for efficient-hrl

nealwu authored Dec 6, 2018
2 parents 2c18130 + 052361d commit c9f03bf
Showing 51 changed files with 7,760 additions and 171 deletions.
50 changes: 42 additions & 8 deletions research/efficient-hrl/README.md
Code for performing Hierarchical RL based on the following publications:

"Data-Efficient Hierarchical Reinforcement Learning" by
Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1805.08296).


"Near-Optimal Representation Learning for Hierarchical Reinforcement Learning"
by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1810.01257).


Requirements:
* TensorFlow (see http://www.tensorflow.org for how to install/upgrade)
* Gin Config (see https://github.com/google/gin-config)
* TensorFlow Agents (see https://github.com/tensorflow/agents)
* OpenAI Gym (see http://gym.openai.com/docs, be sure to install MuJoCo as well)
* NumPy (see http://www.numpy.org/)
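
A possible way to install most of the Python dependencies (a sketch only;
consult the pages above, and note that TensorFlow Agents and MuJoCo have
their own installation instructions):

```
pip install tensorflow gin-config gym numpy
```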


Quick Start:

Run a training job based on the original HIRO paper on Ant Maze:

```
python scripts/local_train.py test1 hiro_orig ant_maze base_uvf suite
```

Run a continuous evaluation job for that experiment:

```
python scripts/local_eval.py test1 hiro_orig ant_maze base_uvf suite
```
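
Assuming the training and eval jobs write TensorFlow summaries to a log
directory (the exact path depends on your setup; the path below is
hypothetical), you can monitor progress with TensorBoard:

```
tensorboard --logdir=/path/to/experiment/logs
```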

To run the same experiment with online representation learning (the
"Near-Optimal" paper), change `hiro_orig` to `hiro_repr`. You can also use
`hiro_xy` to run the same experiment with HIRO on only the xy coordinates of
the agent.
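
For example, a `hiro_repr` training job on Ant Maze would look like the
following (the experiment name `test2` is arbitrary):

```
python scripts/local_train.py test2 hiro_repr ant_maze base_uvf suite
```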

To run on other environments, change `ant_maze` to something else, e.g.
`ant_push_multi` or `ant_fall_multi`. See `context/configs/*` for other options.
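
For instance, to train on the `ant_push_multi` configuration (again with an
arbitrary experiment name):

```
python scripts/local_train.py test3 hiro_orig ant_push_multi base_uvf suite
```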


Basic Code Guide:

The code for training resides in `train.py`. The code trains a lower-level
policy (a UVF agent in the code) and a higher-level policy (a MetaAgent in the
code) concurrently. The higher-level policy communicates goals to the
lower-level policy; in the code, such a goal is called a context. Not only does
the lower-level policy act with respect to a context (a goal specified by the
higher level), but the higher-level policy also acts with respect to an
environment-specified context (corresponding to the navigation target location
associated with the task). Therefore, `context/configs/*` contains both task
setup specifications and goal configurations. Most remaining hyperparameters
used for training/evaluation may be found in `configs/*`.
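
The interaction between the two levels can be pictured roughly as follows.
This is a minimal, hypothetical sketch of the loop just described, not the
actual `train.py` code; the names (`MetaAgent`, `UVFAgent`, `goal_period`,
`env_step`) and the random policies are purely illustrative:

```
# Hypothetical sketch of the two-level loop; NOT the actual train.py code.
import numpy as np

class MetaAgent:
    """Higher-level policy: emits a goal (a context) for the lower level."""

    def __init__(self, goal_dim):
        self.goal_dim = goal_dim

    def act(self, state, env_context):
        # In the real code this is a learned policy conditioned on the state
        # and the environment-specified context (e.g., a navigation target).
        # Here we just sample a random offset from the current position.
        return state[:self.goal_dim] + np.random.uniform(-1, 1, self.goal_dim)

class UVFAgent:
    """Lower-level policy: acts to reach the goal set by the MetaAgent."""

    def act(self, state, goal):
        # A learned goal-conditioned policy in the real code; random here.
        return np.random.uniform(-1, 1, size=2)

def rollout(env_step, state, num_steps=500, goal_period=10):
    """The meta agent picks a new goal every goal_period steps; the UVF
    agent acts at every step with respect to the current goal."""
    meta, uvf = MetaAgent(goal_dim=2), UVFAgent()
    env_context = np.zeros(2)  # e.g., the task's target location
    goal = None
    for t in range(num_steps):
        if t % goal_period == 0:  # higher level acts on a coarser timescale
            goal = meta.act(state, env_context)
        state = env_step(state, uvf.act(state, goal))
    return state

# Toy usage: a 4-dim state whose first two dims move with the 2-dim action.
final_state = rollout(
    lambda s, a: s + 0.1 * np.concatenate([a, np.zeros(len(s) - len(a))]),
    np.zeros(4))
```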

NOTE: Not all the code corresponding to the "Near-Optimal" paper is included.
Namely, changes to low-level policy training proposed in the paper (discounting
and auxiliary rewards) are not implemented here. Performance should not change
significantly.


Maintained by Ofir Nachum (ofirnachum).