This repository includes the implementation of Environment Generation for Zero-Shot Compositional Reinforcement Learning by Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Manoj Tiwari, Honglak Lee, and Aleksandra Faust (NeurIPS 2021).

(Please see the compwob directory if you are interested in the code for Language Model Agents Suffer from Compositional Decision Making.)
- Download this repo:
svn export https://github.com/google-research/google-research/trunk/compositional_rl
- Download bootstrap files:
mkdir gwob/bootstrap/ && cd gwob/bootstrap && wget https://github.com/twbs/bootstrap/releases/download/v4.3.1/bootstrap-4.3.1-dist.zip
- Unzip and extract:
unzip bootstrap-4.3.1-dist.zip && cp bootstrap-4.3.1-dist/css/bootstrap.min.css . && cp bootstrap-4.3.1-dist/js/bootstrap.min.js . && rm -r bootstrap-4.3.1-dist* && cd ../../
- Clone the MiniWoB project (from the legacy branch):
git clone -b legacy https://github.com/Farama-Foundation/miniwob-plusplus.git gwob/miniwob-plusplus
- Check out the version that we used in our project:
cd gwob/miniwob-plusplus && git checkout 833a477a8fbfbd2497e95fee019f76df2b9bd75e
- Convert all Python files from Python 2 to Python 3:
pip install 2to3 && cd ../../ && 2to3 gwob/miniwob-plusplus/python/miniwob -w
- Integrate MiniWoB by making necessary changes:
python3 integrate_miniwob.py
- Install MiniWoB:
pip install gwob/miniwob-plusplus/python/
- Install ChromeDriver with a version that matches your Chrome browser and add it to your PATH:
export PATH=$PATH:/path/to/chromedriver
- Install gMiniWoB:
pip install gwob/
- Open
file:///path/to/compositional_rl/gwob/gminiwob/sample_random_website.html
in a browser and click "START". Each time the "START" button is clicked, a random gMiniWoB website is created using a subset of the primitives available in gMiniWoB.
- Run
python3 gwob/examples/web_environment_example.py --data_dep_path='/path/to/compositional_rl/gwob/'
to run a rule-based policy for a simulated shopping website. If you get any errors related to non-headless browsing, make sure to pass --run_headless_mode=True.
- The following is a simple tutorial for randomly designing an environment and using an LSTM-based DQN to generate logits and values.
import gin
import numpy as np
from CoDE import test_websites
from CoDE import utils
from CoDE import vocabulary_node
from CoDE import web_environment
from CoDE import web_primitives
from CoDE import q_networks
gin.parse_config_files_and_bindings(["/path/to/compositional_rl/gwob/configs/envdesign.gin"], None)
# Create an empty environment.
env = web_environment.GMiniWoBWebEnvironment(
    base_url="file:///path/to/compositional_rl/gwob/",
    global_vocabulary=vocabulary_node.LockedVocabulary())
# Create a q network.
q_net = q_networks.DQNWebLSTM(vocab_size=env.local_vocab.max_vocabulary_size, return_state_value=True)
# Sample a new design of the form {'number_of_pages': Integer, 'action': List[Integer], 'action_page': List[Integer]}.
# `action` denotes primitive indices and `action_page` denotes their corresponding page indices.
# Each item in `action_page` should be less than `number_of_pages`.
# For this tutorial, we will randomly sample a design.
number_of_pages = np.random.randint(4) + 1
design = {'number_of_pages': number_of_pages,
          'action': np.random.choice(np.arange(len(web_primitives.CONCEPTS)), 5),
          'action_page': np.random.choice(np.arange(number_of_pages), 5)}
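# Alternatively, a design can be specified by hand. The values below are an
# illustrative sketch only (arbitrary placeholder indices, not a recommended
# design):
# design = {'number_of_pages': 2,
#           'action': [0, 1, 2],       # indices into web_primitives.CONCEPTS
#           'action_page': [0, 0, 1]}  # each entry < number_of_pages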
# Design the actual environment.
env.design_environment(
    design, auto_num_pages=True)
# Reset the environment.
state = env.reset()
# Add batch dimension.
state = {key: np.expand_dims(tensor, axis=0) for key, tensor in state.items()}
# Get flattened logits and values.
logits, values = q_net(state)
# Get greedy action.
action = np.argmax(logits)
# Run the action.
new_state, reward, done, info = env.step(action)
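- Building on the tutorial above, the following is a minimal sketch of a full greedy rollout loop. It simply reuses the env and q_net objects created above; the step limit of 10 is an arbitrary choice for illustration, not a value used in the paper.
# Roll out one episode with the greedy policy induced by the Q-network.
state = env.reset()
done = False
step_count = 0
while not done and step_count < 10:  # Arbitrary step limit for this sketch.
  # Add the batch dimension expected by the network.
  batched_state = {key: np.expand_dims(tensor, axis=0) for key, tensor in state.items()}
  # Flattened logits over actions and the state value estimate.
  logits, values = q_net(batched_state)
  # Act greedily with respect to the flattened logits.
  action = np.argmax(logits)
  state, reward, done, info = env.step(action)
  step_count += 1
  print('Step:', step_count, 'Reward:', reward, 'Done:', done)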