Merge branch 'master' into il-multienv
saleml committed Oct 12, 2018
2 parents 693d14f + 63e83aa commit 7f64297
Showing 2 changed files with 60 additions and 56 deletions.
114 changes: 59 additions & 55 deletions README.md
@@ -2,7 +2,7 @@

[![CircleCI](https://circleci.com/gh/mila-udem/babyai.svg?style=svg&circle-token=ed2191e1bb0206a2f3f2e22f45f1369f7b8115a9)](https://circleci.com/gh/mila-udem/babyai)

A platform for simulating language learning with a human in the loop. This is an ongoing research project based at [Mila](https://mila.quebec/en/).

## Installation

@@ -32,92 +32,96 @@

If you are using conda, you can create a `babyai` environment with all the dependencies by running:

```
conda env create -f environment.yaml
```

Having done that, install this repository in the conda environment using the command above.
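
For reference, the editable install from the root of the cloned repository looks like the following sketch (assuming the `pip3 install --editable .` command given in the Installation section above):

```
cd babyai
pip3 install --editable .
```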

## Structure of the Codebase

In `babyai`:
- `levels` contains the code for all levels (see the sketch just after this list for loading one)
- `bot.py` is a heuristic stack-based bot that can solve all levels
- `imitation.py` is an imitation learning implementation
- `rl` contains an implementation of the Proximal Policy Optimization (PPO) RL algorithm
- `model.py` contains the neural network code
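
As a minimal sketch of how these pieces fit together: importing `babyai` registers the `BabyAI-*` levels with OpenAI Gym, after which any level can be instantiated like a regular Gym environment:

```python
import gym
import babyai  # importing babyai registers the BabyAI-* levels with Gym

env = gym.make('BabyAI-GoToLocal-v0')
obs = env.reset()
print(obs['mission'])  # the natural language instruction, e.g. "go to the red ball"

# take one random step; Gym returns the usual (obs, reward, done, info) tuple
obs, reward, done, info = env.step(env.action_space.sample())
```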

In `scripts`:
- use `train_il.py` to train an agent with imitation learning, using demonstrations from the bot, from another agent, or even provided by a human
- use `train_rl.py` to train an agent with reinforcement learning
- use `make_agent_demos.py` to generate demonstrations with the bot or with another agent
- use `make_human_demos.py` to make and save human demonstrations
- use `train_intelligent_expert.py` to train an agent with an interactive imitation learning algorithm that incrementally grows the training set by adding demonstrations for the missions that the agent currently fails
- use `evaluate.py` to evaluate a trained agent
- use `enjoy.py` to visualize an agent's behavior
- use `gui.py` or `test_mission_gen.py` to see example missions from BabyAI levels

## Usage

To run the interactive GUI application that illustrates the platform:

```
scripts/gui.py
```

The level being run can be selected with the `--env-name` option, e.g.:

```
scripts/gui.py --env-name BabyAI-UnlockPickup-v0
```

To see the available levels, please read [this](#the-levels).
### Training

To train an RL agent, run e.g.

```
scripts/train_rl.py --env BabyAI-GoToLocal-v0
```

Folders `logs/` and `models/` will be created in the current directory. The default name
for the model is chosen based on the level name, the current time, and the other settings (e.g.
`BabyAI-GoToLocal-v0_ppo_expert_filmcnn_gru_mem_seed1_18-10-12-12-45-02`). You can also choose the model
name by setting `--model`. After 5 hours of training you should be getting a success rate of 97-99%.
A machine-readable log can be found in `logs/<MODEL>/log.csv`, and a human-readable one in `logs/<MODEL>/log.log`.
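
The CSV log can be inspected with standard tools. A sketch (the column names are whatever `train_rl.py` writes, so check the header first rather than relying on the names used here):

```python
import pandas as pd

# substitute the model name that was printed when training started
model = 'BabyAI-GoToLocal-v0_ppo_expert_filmcnn_gru_mem_seed1_18-10-12-12-45-02'
log = pd.read_csv('logs/{}/log.csv'.format(model))
print(log.columns.tolist())  # see which columns are actually available
print(log.tail())            # the most recent training entries
```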

To train an agent with imitation learning, first make sure that you have your demonstrations in
`demos/<DEMOS>` (a sketch for generating demonstrations with the bot appears below). Then run e.g.

```
scripts/train_il.py --env BabyAI-GoToLocal-v0 --demos <DEMOS>
```

In the example above we run scripts from the root of the repository, but if you have installed BabyAI as
described above, you can also run all scripts with commands like `<PATH-TO-BABYAI-REPO>/scripts/train_il.py`.
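
If you do not have demonstrations yet, you can generate them with the bot first. A sketch (the `--episodes` flag name is an assumption; check `scripts/make_agent_demos.py --help` for the exact arguments):

```
scripts/make_agent_demos.py --env BabyAI-GoToLocal-v0 --demos <DEMOS> --episodes 1000
```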

### Evaluation

In the same directory where you trained your model, run e.g.

```
scripts/evaluate.py --env BabyAI-GoToLocal-v0 --model <MODEL>
```

to evaluate the performance of your model named `<MODEL>` on 1000 episodes. If you want to see
your agent performing, run

```
scripts/enjoy.py --env BabyAI-GoToLocal-v0 --model <MODEL>
```

### The Levels

Documentation for the ICLR19 levels can be found in
[docs/iclr19_levels.md](docs/iclr19_levels.md).
There are also older levels documented in
[docs/bonus_levels.md](docs/bonus_levels.md).
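
To list the available levels programmatically, one option is to query Gym's registry (a sketch, assuming the standard `gym.envs.registry` API):

```python
import gym
import babyai  # importing babyai registers the levels

babyai_levels = sorted(spec.id for spec in gym.envs.registry.all()
                       if spec.id.startswith('BabyAI-'))
print('\n'.join(babyai_levels))
```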

### Troubleshooting

If you run into error messages relating to OpenAI Gym or PyQt, it may be that the version of those libraries that you have installed is incompatible. You can try upgrading specific libraries with pip3, e.g. `pip3 install --upgrade gym`. If the problem persists, please [open an issue](https://github.com/mila-udem/babyai/issues) on this repository and paste a *complete* error message, along with some information about your platform (are you running Windows, Mac, Linux? Are you running this on a Mila machine?).

## Instructions for Committers

To contribute to this project, you should first create your own fork, and remember to periodically [sync changes from this repository](https://stackoverflow.com/questions/7244321/how-do-i-update-a-github-forked-repository). You can then create [pull requests](https://yangsu.github.io/pull-request-tutorial/) for modifications you have made. Your changes will be tested and reviewed before they are merged into this repository. If you are not familiar with forks and pull requests, we recommend doing a Google or YouTube search to find many useful tutorials on the topic.

## About this Project

BabyAI is an open-ended grounded language acquisition effort at [Mila](https://mila.quebec/en/). The current BabyAI platform was designed to study the data efficiency of existing methods under the assumption that a human provides all teaching signals
(i.e. demonstrations, rewards, etc.). For more information, see the paper (COMING SOON).
2 changes: 1 addition & 1 deletion scripts/gui.py
@@ -371,7 +371,7 @@ def main(argv):
parser.add_option(
"--env-name",
help="gym environment to load",
default='BabyAI-BossLevel-v0'
)
(options, args) = parser.parse_args()

