Model-Based Policy Optimization

Code to reproduce the experiments in When to Trust Your Model: Model-Based Policy Optimization.

Installation

1. Install MuJoCo 1.50 at ~/.mujoco/mjpro150 and copy your license key to ~/.mujoco/mjkey.txt.
2. Clone mbpo:

git clone --recursive https://github.com/jannerm/mbpo.git

3. Create a conda environment and install mbpo:

cd mbpo
conda env create -f environment/gpu-env.yml
conda activate mbpo
pip install -e viskit
pip install -e .

Usage

Configuration files can be found in examples/config/.

mbpo run_local examples.development --config=examples.config.halfcheetah.0 \
	--checkpoint-frequency=1000 --gpus=1 --trial-gpus=1

Currently only running locally is supported.

New environments

To run on a different environment, you can modify the provided template. You will also need to provide the termination function for the environment in mbpo/static. If you name the file after the lowercase version of the environment name, it will be found automatically. See hopper.py for an example.
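As a rough illustration of what such a file might contain, here is a minimal sketch of a termination function for a hypothetical environment. The class name `StaticFns`, the `termination_fn(obs, act, next_obs)` signature, the batched array shapes, and the height threshold are all assumptions made for this example; check hopper.py for the interface the codebase actually expects.

```python
import numpy as np


class StaticFns:
    """Hypothetical container for an environment's termination rule."""

    @staticmethod
    def termination_fn(obs, act, next_obs):
        # Assumed layout: every input is a batch with shape (batch, dim).
        assert obs.ndim == act.ndim == next_obs.ndim == 2

        # Illustrative rule only: treat index 0 of the observation as a
        # height and end the episode when it falls to a made-up threshold
        # or below, or when the predicted next state contains NaN or inf.
        height = next_obs[:, 0]
        finite = np.isfinite(next_obs).all(axis=-1)
        done = ~(finite & (height > 0.7))

        # One boolean per transition, returned with shape (batch, 1).
        return done[:, None]
```

For an environment that never terminates early, the same function could simply return an all-False array of shape (batch, 1).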

Logging

This codebase contains viskit as a submodule. You can view saved runs with:

viskit ~/ray_mbpo --port 6008

assuming you used the default log_dir.

Hyperparameters

The rollout length schedule is defined by a length-4 list in a config file. The format is [start_epoch, end_epoch, start_length, end_length], so the following:

'rollout_schedule': [20, 100, 1, 5]

corresponds to a model rollout length linearly increasing from 1 to 5 over epochs 20 to 100.
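The schedule can be read as a clipped linear interpolation. The sketch below is one way to compute the rollout length for a given epoch; the function name `rollout_length` is hypothetical, and truncating the interpolated value to an integer is an assumption rather than something this README specifies.

```python
def rollout_length(epoch, schedule):
    """Sketch: map an epoch to a rollout length under a 4-element schedule."""
    start_epoch, end_epoch, start_length, end_length = schedule

    # Before the ramp starts, hold the initial length.
    if epoch <= start_epoch:
        return int(start_length)
    # After the ramp ends, hold the final length.
    if epoch >= end_epoch:
        return int(end_length)

    # Inside the ramp, interpolate linearly between the two lengths.
    frac = (epoch - start_epoch) / (end_epoch - start_epoch)
    # Assumption: fractional lengths are truncated toward zero.
    return int(start_length + frac * (end_length - start_length))


schedule = [20, 100, 1, 5]
for epoch in (0, 20, 60, 100, 150):
    print(epoch, rollout_length(epoch, schedule))
```

With the schedule above, this gives a length of 1 up to epoch 20, 3 at epoch 60, and 5 from epoch 100 onward.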

If you want to speed up training in terms of wall clock time (but possibly make the runs less sample-efficient), you can set a timeout for model training (max_model_t, in seconds) or train the model less frequently (every model_train_freq steps).
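A config fragment using these two settings might look like the sketch below. Only the key names `max_model_t` and `model_train_freq` come from the text above; the variable name and both values are made up for illustration, and where the keys sit inside a config file is not stated here, so compare against an existing file in examples/config/ before copying.

```python
# Hypothetical override values; only the key names come from the README.
speedup_overrides = {
    # Stop each model-training call after roughly 60 seconds (illustrative).
    'max_model_t': 60,
    # Retrain the model every 1000 steps (illustrative).
    'model_train_freq': 1000,
}
```

A shorter timeout or a larger training interval reduces wall clock time, at the possible cost of sample efficiency.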

Reference

If you find this code useful in an academic setting, please cite:

@article{janner2019mbpo,
  author = {Michael Janner and Justin Fu and Marvin Zhang and Sergey Levine},
  title = {When to Trust Your Model: Model-Based Policy Optimization},
  journal = {arXiv preprint arXiv:1906.08253},
  year = {2019}
}

Acknowledgments

The underlying soft actor-critic implementation in MBPO comes from Tuomas Haarnoja and Kristian Hartikainen's softlearning codebase. The modeling code is a slightly modified version of Kurtland Chua's PETS implementation.

ColinQiyangLi/mbpo
