docker build . -t master-dcac:0.1
docker run master-dcac:0.1
Abstract
Action delay prevalently exists in real-world systems and is one of the key reasons for degraded performance of reinforcement learning. In this paper, we introduce a formal definition of the delayed Markov Decision Process and prove that it can be converted into a standard MDP with augmented states using the Markov reward process. We then develop a delay-aware model-based reinforcement learning framework that directly incorporates the multi-step delay into the learned system models without additional learning effort. Experiments are conducted on the Gym and MuJoCo platforms. Results show that, compared with off-policy model-free reinforcement learning methods, the proposed delay-aware model-based algorithm is more efficient in training and transferable between systems with different durations of delay.
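As a concrete illustration of the augmented-state idea mentioned in the abstract, the sketch below (not part of this code base; the wrapper name and details are assumptions) shows how an action delay of k steps can be folded into a standard MDP by appending the queue of pending actions to the observation.

# Illustrative sketch only (hypothetical, not this repo's implementation): a gym-style
# wrapper that augments the observation with the buffer of pending (not-yet-applied)
# actions, i.e. the state augmentation that turns a delayed MDP into a standard MDP.
import numpy as np
import gym


class ActionDelayWrapper(gym.Wrapper):
    def __init__(self, env, delay_step):
        super().__init__(env)
        self.delay_step = delay_step  # number of steps each action is delayed

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Pending actions are initialized to zeros (a no-op) for the first delay_step steps.
        self.action_buffer = [np.zeros(self.env.action_space.shape)
                              for _ in range(self.delay_step)]
        return self._augment(obs)

    def step(self, action):
        # The newly chosen action is queued; the action actually executed now is the
        # one selected delay_step steps ago.
        self.action_buffer.append(np.asarray(action))
        delayed_action = self.action_buffer.pop(0)
        obs, reward, done, info = self.env.step(delayed_action)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # Augmented state: (observation, pending action queue).
        # (observation_space is left unchanged here to keep the sketch short.)
        return np.concatenate([obs] + [a.ravel() for a in self.action_buffer])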
This code-base is based on PETS.
Run pip install -r requirements.txt to install the Python dependencies.
The current environments are simulated with MuJoCo 1.31. If the default pip installation fails, please follow the installation procedures for MuJoCo and OpenAI Gym.
Below is an example to reproduce the results.
python mbexp.py -logdir ./log/DATS \
-env gym_pendulum \
-o exp_cfg.exp_cfg.ntrain_iters 200 \
-o exp_cfg.sim_cfg.delay_hor 10 \
-o ctrl_cfg.prop_cfg.delay_step 10 \
-ca opt-type CEM \
-ca model-type PE \
-ca prop-type E
This repo is based on the PETS repo, so we use the same hyper-parameter and argument system.
The benchmark environments are based on MBBL.
python scripts/mbexp.py
-env (required) The name of the environment. Select from
[reacher, pusher, halfcheetah, gym_ant, gym_cartpole, gym_fswimmer, ...].
Please look at ./dmbrl/config for more environments.
All other arguments are kept the same as in PETS; please refer to the original PETS repo for their documentation.
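For intuition on how the -o overrides are interpreted, the fragment below is a simplified, hypothetical illustration of mapping a dotted path such as exp_cfg.sim_cfg.delay_hor to a nested configuration object; it is not the actual PETS/DATS code.

# Simplified, hypothetical illustration of dotted-path overrides like
# "-o exp_cfg.sim_cfg.delay_hor 10"; NOT the actual PETS/DATS implementation.
def apply_override(cfg, dotted_path, value):
    """Set cfg['exp_cfg']['sim_cfg']['delay_hor'] = value for 'exp_cfg.sim_cfg.delay_hor'."""
    keys = dotted_path.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # walk/create intermediate levels
    node[keys[-1]] = value


cfg = {}
apply_override(cfg, "exp_cfg.exp_cfg.ntrain_iters", 200)
apply_override(cfg, "ctrl_cfg.prop_cfg.delay_step", 10)
print(cfg)  # {'exp_cfg': {'exp_cfg': {'ntrain_iters': 200}}, 'ctrl_cfg': {'prop_cfg': {'delay_step': 10}}}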
To set the action delay step, follow the example script below for DATS on the pendulum environment:
python mbexp.py -logdir ./log/DATS \
-env gym_pendulum \
-o exp_cfg.exp_cfg.ntrain_iters 200 \
-o exp_cfg.sim_cfg.delay_hor 10 \
-o ctrl_cfg.prop_cfg.delay_step 10 \
-ca opt-type CEM \
-ca model-type PE \
-ca prop-type E
Results will be saved in <logdir>/<date+time of experiment start>/logs.mat.
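To inspect the saved results, logs.mat can be loaded with SciPy; a minimal example follows. The exact field names stored in logs.mat (e.g. "returns") are assumptions and may differ, so check the available keys first.

# Minimal example of loading the saved results; the field names stored in
# logs.mat depend on the logger, so inspect the keys before relying on them.
import glob
import os
import scipy.io

# Find the most recent run directory under the log dir used above.
run_dirs = sorted(glob.glob("./log/DATS/*/"))
logs = scipy.io.loadmat(os.path.join(run_dirs[-1], "logs.mat"))
print([k for k in logs if not k.startswith("__")])  # list available fields

# If a per-iteration return array is logged (field name "returns" is an assumption):
if "returns" in logs:
    print(logs["returns"].squeeze())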
The log file generated during training can be found in <logdir>/*/*.log/logger.log.