From 45fa9ca1539b41959b0d4849f11b2a78241fa3fb Mon Sep 17 00:00:00 2001
From: Dzmitry Bahdanau <dimabgv@gmail.com>
Date: Fri, 12 Oct 2018 12:17:34 -0400
Subject: [PATCH 1/2] edit README (#206)

---
 README.md      | 80 ++++++++++++++------------------------------------
 scripts/gui.py |  2 +-
 2 files changed, 23 insertions(+), 59 deletions(-)

diff --git a/README.md b/README.md
index 138ac6d5..3d6e5780 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 [![CircleCI](https://circleci.com/gh/mila-udem/babyai.svg?style=svg&circle-token=ed2191e1bb0206a2f3f2e22f45f1369f7b8115a9)](https://circleci.com/gh/mila-udem/babyai)
 
-Prototype of a game where a reinforcement learning agent is trained through natural language instructions. This is a research project based at [Mila](https://mila.quebec/en/).
+A platform for simulating language learning with a human in the loop. This is an ongoing research project based at [Mila](https://mila.quebec/en/).
 
 ## Installation
 
@@ -32,49 +32,40 @@ If you are using conda, you can create a `babyai` environment with all the depen
 conda env create -f environment.yaml
 ```
 
-Having done that, you can either add `babyai` and `gym-minigrid` in your `$PYTHONPATH` or install them in the development mode as suggested above.
+Having done that, install this repository in the conda environment using the command above.
 
 ## Structure of the Codebase
 
 In `babyai`:
-- The `levels` directory contains all the code relevant to the generation of levels and curriculums. Essentially, this implements the test task for the Baby AI Game. This is an importable module which people can use on its own to perform experiments.
-- The `agents` directory contains a default implementation of one or more agents to be evaluated on the baby AI game. This should also be importable as an independent module. Each agent will need to support methods to be provided teaching inputs using pointing and naming, as well as demonstrations.
-- The `multienv` directory contains an implementation of the algorithms described in [Matiisen et al., 2017](https://arxiv.org/abs/1707.00183) for automatic execution of curriculums.
-- The `utils` directory contains a bunch of useful functions that can be used when training Reinforcement Learning or Imitation Learning agents.
-- `model.py` is a script containing the network architectures used when training any type of agent.
+- `levels` contains the code for all levels
+- `bot.py` is a heuristic stack-based bot that can solve all levels
+- `imitation.py` is an imitation learning implementation
+- `rl` contains an implementation of the Proximal Policy Optimization (PPO) RL algorithm
+- `model.py` contains the neural network code
 
 In `scripts`:
-- `make_human_demos.py` is a helper script to easily make and save human demonstrations that can be helpful for Imitation Learning.
-- `train_il.py` is a script used to train an Imitation Learning agent on demonstrations, whether generated by *humans*, or by a Reinforcement Learning agent.
-- `train_rl.py` is a script used to train a Reinforcement Learning agent, using the aforementioned `model.py`
-- `make_agent_demos.py` takes as input a pre-trained Reinforcement Learning agent (or another type of agent), and generates demonstrations on new instances of the level. These can be used to train an Imitation Learning Agent for example.
-- `evaluate.py`, `evaluate_all_demos.py`, and `evaluate_all_models.py` are used to obtain basic statistics on the reward an agent obtains, and the number of steps necessary to complete missions within a level.
-- `enjoy.py` helps visualize demonstrations or the behavior of a pre-trained RL agent.
-
-The `gui.py` script implements a template of a user interface for interactive human teaching. The version found in the master branch allows you to control the agent manually with the arrow keys, but it is not currently connected to any model or teaching code. Currently, most experiments are done offline, without a user interface.
+- use `train_il.py` to train an agent with imitation learning, using demonstrations from the bot, from another agent, or even from a human
+- use `train_rl.py` to train an agent with reinforcement learning
+- use `make_agent_demos.py` to generate demonstrations with the bot or with another agent
+- use `make_human_demos.py` to make and save human demonstrations 
+- use `evaluate.py` to evaluate a trained agent
+- use `enjoy.py` to visualze an agent's behavior
+- use `gui.py` or `test_mission_gen.py` to see example missions from BabyAI levels
 
 ## Usage
 
 To run the interactive GUI application:
 
 ```
-./gui.py
+scripts/gui.py
 ```
 
 The level being run can be selected with the `--env-name` option, eg:
 
 ```
-./gui.py --env-name BabyAI-UnlockPickup-v0
+scripts/gui.py --env-name BabyAI-UnlockPickup-v0
 ```
 
-To see the available levels, please read [this](#the-levels).
-
-### Usage at Mila
-
-If you connect to the lab machines by ssh-ing, make sure to use `ssh -X` in order to see the game window. This will work even for a chain of ssh connections, as long as you use `ssh -X` at all intermediate steps. If you use screen, set `$DISPLAY` variable manually inside each of your screen terminals. You can find the right value for `$DISPLAY` by detaching from you screen first (`Ctrl+A+D`) and then running `echo $DISPLAY`.
-
-The code does not work in conda, install everything with `pip install --user`.
-
 ### The Levels
 
 Documentation for the ICLR19 levels can be found in
@@ -84,40 +75,13 @@ There are also older levels documented in
 
 ### Troubleshooting
 
-If you run into error messages relating to OpenAI gym or PyQT, it may be that the version of those libraries that you have installed is incompatible. You can try upgrading specific libraries with pip3, eg: `pip3 install --upgrade gym`. If the problem persists, please [open an issue](https://github.com/maximecb/baby-ai-game/issues) on this repository and paste a *complete* error message, along with some information about your platform (are you running Windows, Mac, Linux? Are you running this on a Mila machine?).
-
-## About this Project
-
-The Baby AI Game is a game in which an agent existing in a simulated world
-will be trained to complete task through reinforcement learning as well
-as interactions from one or more human teachers. These interactions will take
-the form of natural language, and possibly other feedback, such as human
-teachers manually giving rewards to the agent, or pointing towards
-specific objects in the game using the mouse.
-
-Two of the main goals of the project are to explore ways in which deep learning can take
-inspiration from human learning (ie: how human babies learn), and to research AI learning
-with humans in the loop. In particular, language learning,
-as well as teaching agents to complete actions spanning many (eg: hundreds)
-of time steps, or macro-actions composed of multiple micro-actions, are
-still open research problems.
-
-Some possible approaches to be explored in this project include meta-learning
-and curriculum learning, the use of intrinsic motivation (curiosity), and
-the use of pretraining to give agents a small core of built-in knowledge to
-allow them to learn from human agents. With respect to built-in knowledge,
-Yoshua Bengio believes that the ability for agents to understand pointing
-gestures in combination with language may be key.
-
-You can find here a presentation of the project: [Baby AI Summary](https://docs.google.com/document/d/1WXY0HLHizxuZl0GMGY0j3FEqLaK1oX-66v-4PyZIvdU)
-
-A work-in-progress review of related work can be found [here](https://www.overleaf.com/13480997qqsxybgstxhg#/52042269/)
+If you run into error messages relating to OpenAI Gym or PyQt, the versions of those libraries that you have installed may be incompatible. You can try upgrading specific libraries with pip3, e.g. `pip3 install --upgrade gym`. If the problem persists, please [open an issue](https://github.com/mila-udem/babyai/issues) on this repository and paste a *complete* error message, along with some information about your platform (are you running Windows, Mac, Linux? Are you running this on a Mila machine?).
 
 ## Instructions for Committers
 
-To contribute to this project, you should first create your own fork, and remember to periodically [sync changes from this repository](https://stackoverflow.com/questions/7244321/how-do-i-update-a-github-forked-repository). You can then create [pull requests](https://yangsu.github.io/pull-request-tutorial/) for modifications you have made. Your changes will be tested and reviewed before they are merged into this repository. If you are not familiar with forks and pull requests, I recommend doing a Google or YouTube search to find many useful tutorials on the issue. Knowing how to use git and GitHub effectively are valuable skills for any programmer.
+To contribute to this project, you should first create your own fork, and remember to periodically [sync changes from this repository](https://stackoverflow.com/questions/7244321/how-do-i-update-a-github-forked-repository). You can then create [pull requests](https://yangsu.github.io/pull-request-tutorial/) for modifications you have made. Your changes will be tested and reviewed before they are merged into this repository. If you are not familiar with forks and pull requests, we recommend doing a Google or YouTube search to find one of the many useful tutorials on the topic.
+
+## About this Project
 
-If you have found a bug, or would like to request a change or improvement
-to the grid world environment or user interface, please
-[open an issue](https://github.com/maximecb/baby-ai-game/issues)
-on this repository. For bug reports, please paste complete error messages and describe your system configuration (are you running on Mac, Linux?).
+BabyAI is an open-ended grounded language acquisition effort at [Mila](https://mila.quebec/en/). The current BabyAI platform was designed to study the data efficiency of existing methods under the assumption that a human provides all teaching signals
+(i.e. demonstrations, rewards, etc.). For more information, see the paper (COMING SOON).
diff --git a/scripts/gui.py b/scripts/gui.py
index e9a2261b..e88ed9ce 100755
--- a/scripts/gui.py
+++ b/scripts/gui.py
@@ -371,7 +371,7 @@ def main(argv):
     parser.add_option(
         "--env-name",
         help="gym environment to load",
-        default='MiniGrid-MultiRoom-N6-v0'
+        default='BabyAI-BossLevel-v0'
     )
     (options, args) = parser.parse_args()
 

From 63e83aae3c7b2fb59ba5c0af41d62dad805a5790 Mon Sep 17 00:00:00 2001
From: Dzmitry Bahdanau <dimabgv@gmail.com>
Date: Fri, 12 Oct 2018 13:16:41 -0400
Subject: [PATCH 2/2] edit README (#210)

---
 README.md | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3d6e5780..658307a2 100644
--- a/README.md
+++ b/README.md
@@ -48,13 +48,14 @@ In `scripts`:
 - use `train_rl.py` to train an agent with reinforcement learning
 - use `make_agent_demos.py` to generate demonstrations with the bot or with another agent
 - use `make_human_demos.py` to make and save human demonstrations 
+- use `train_intelligent_expert.py` to train an agent with an interactive imitation learning algorithm that incrementally grows the training set by adding demonstrations for the missions that the agent currently fails
 - use `evaluate.py` to evaluate a trained agent
-- use `enjoy.py` to visualze an agent's behavior
+- use `enjoy.py` to visualize an agent's behavior
 - use `gui.py` or `test_mission_gen.py` to see example missions from BabyAI levels
 
 ## Usage
 
-To run the interactive GUI application:
+To run the interactive GUI application that illustrates the platform:
 
 ```
 scripts/gui.py
@@ -66,6 +67,45 @@ The level being run can be selected with the `--env-name` option, eg:
 scripts/gui.py --env-name BabyAI-UnlockPickup-v0
 ```
 
+### Training
+
+To train an RL agent, run, e.g.:
+
+```
+scripts/train_rl.py --env BabyAI-GoToLocal-v0
+```
+
+Folders `logs/` and `models/` will be created in the current directory. The default name
+for the model is chosen based on the level name, the current time and the other settings (e.g.
+`BabyAI-GoToLocal-v0_ppo_expert_filmcnn_gru_mem_seed1_18-10-12-12-45-02`). You can also choose the model
+name by setting `--model`. After 5 hours of training, you should be getting a success rate of 97-99%.
+A machine-readable log can be found in `logs/<MODEL>/log.csv`, and a human-readable one in `logs/<MODEL>/log.log`.
+
+To train an agent with imitation learning, first make sure that your demonstrations are in
+`demos/<DEMOS>`. Then run, e.g.:
+
+```
+scripts/train_il.py --env BabyAI-GoToLocal-v0 --demos <DEMOS>
+```
+
+In the examples above, we run scripts from the root of the repository, but if you have installed BabyAI as
+described above, you can also run all scripts with commands like `<PATH-TO-BABYAI-REPO>/scripts/train_il.py`.
+
+### Evaluation
+
+In the same directory where you trained your model, run, e.g.:
+
+```
+scripts/evaluate.py --env BabyAI-GoToLocal-v0 --model <MODEL>
+```
+
+to evaluate the performance of your model named `<MODEL>` on 1000 episodes. If you want to see
+your agent in action, run
+
+```
+scripts/enjoy.py --env BabyAI-GoToLocal-v0 --model <MODEL>
+```
+
 ### The Levels
 
 Documentation for the ICLR19 levels can be found in