Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
Source code to accompany our paper.
We use Python 3.5.4rc1. You may also need to install a number of dependencies:
pip3 install gym
pip3 install --upgrade tensorflow tensorflow_probability
pip3 install absl-py
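A quick way to confirm those dependencies import cleanly before going further:

```bash
# Sanity-check the Python packages installed above.
python3 -c "import gym, tensorflow, tensorflow_probability, absl; print(tensorflow.__version__)"
```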
You will also need to install MuJoCo and have a valid license. Follow the install instructions here.
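If MuJoCo is set up correctly, gym should be able to construct one of its environments. A minimal sanity check, assuming the mujoco-py bindings (this README does not pin a version):

```bash
# mujoco-py is the binding gym uses for MuJoCo environments.
pip3 install mujoco-py
# Building an environment end-to-end verifies the MuJoCo binary and license are found.
python3 -c "import gym; gym.make('HalfCheetah-v2')"
```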
Clone the repo of expert trajectories:
cd /data/dac/ # We will assume access to this directory.
git clone [email protected]:ikostrikov/gail-experts.git
Then use our import script to turn them into checkpoints (~1-2 hours):
python3 generate_expert_data.py \
--src_data_dir /data/dac/gail-experts/ \
--dst_data_dir /data/dac/gail-experts/
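To see what the import script is reading, you can inspect one of the cloned files first. This is a sketch that assumes the gail-experts trajectories are HDF5 files (e.g. trajs_halfcheetah.h5) and requires the h5py package; the filename and internal layout are assumptions, not guaranteed by this repo:

```bash
pip3 install h5py
python3 -c "
import h5py
# Walk the file and print every group/dataset with its shape.
with h5py.File('/data/dac/gail-experts/trajs_halfcheetah.h5', 'r') as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, 'shape', '')))
"
```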
Launch run_training_worker.sh to start the training worker. Then, in another terminal, launch run_evaluation_worker.sh. Training takes approximately 1 to 2 hours.
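The two scripts are meant to run concurrently. If you prefer a single terminal, one way to drive both (a sketch, not something shipped in the repo) is:

```bash
# Start the training worker in the background, then run the evaluation
# worker in the foreground; both will log to this terminal.
./run_training_worker.sh &
TRAIN_PID=$!
./run_evaluation_worker.sh
kill "$TRAIN_PID"   # stop the trainer once evaluation exits
```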
To change the environment, the number of expert trajectories, etc., edit the variables defined in the bash scripts above.
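The edit looks something like the following; the variable names here are illustrative guesses, so check the scripts for the actual ones:

```bash
# Hypothetical excerpt from run_training_worker.sh -- not the script's
# real variable names.
ENV="Walker2d-v2"          # which MuJoCo environment to imitate
NUM_EXPERT_TRAJECTORIES=4  # how many demonstrations to train on
```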
To see reward results live during training, launch TensorBoard:
tensorboard --logdir /tmp/lfd_state_action_traj_4_HalfCheetah-v2_20
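The directory name appears to encode the run configuration: the number of expert trajectories (4), the environment (HalfCheetah-v2), and a trailing run identifier (20). If you changed those settings in the bash scripts, point TensorBoard at the matching directory; in parameterized form (the variables here are hypothetical, substitute your own values):

```bash
# Hypothetical parameterization of the command above.
tensorboard --logdir "/tmp/lfd_state_action_traj_${NUM_TRAJS}_${ENV}_${RUN_ID}"
```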