Official release of the code used in the paper: Learning from Active Human Involvement through Proxy Value Propagation
# Clone the code to local machine
git clone https://github.com/metadriverse/pvp
cd pvp
# Create Conda environment
conda create -n pvp python=3.7
conda activate pvp
# Install dependencies
pip install -r requirements.txt
pip install -e .
# Install evdev package (Linux only)
pip install evdev
# MetaDrive and MiniGrid are now installed.
# To set up the CARLA dependencies, follow the steps below.
Set up CARLA dependencies
# Step 1: Download and unzip CARLA 0.9.10.1 to your home folder
cd ~/
wget https://carla-releases.s3.eu-west-3.amazonaws.com/Linux/CARLA_0.9.10.1.tar.gz
export CARLA_ROOT="CARLA_0.9.10.1"
mkdir ${CARLA_ROOT}
tar -xf CARLA_0.9.10.1.tar.gz -C ${CARLA_ROOT} # CARLA is stored at: ~/CARLA_0.9.10.1
# Step 2: Set up the environment variables
vim ~/.bashrc
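# Add the following lines, adjusting the path if CARLA is stored somewhere other than ~/CARLA_0.9.10.1
# (use ${HOME} rather than "~", because a tilde inside double quotes is not expanded)
export CARLA_ROOT="${HOME}/CARLA_0.9.10.1"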
export PYTHONPATH="${CARLA_ROOT}/PythonAPI/carla/":"${CARLA_ROOT}/PythonAPI/carla/dist/carla-0.9.10-py3.7-linux-x86_64.egg":${PYTHONPATH}
# Step 3: Activate your conda environment and test if CARLA is installed correctly.
conda activate pvp # If you are using conda environment "pvp"
python -c "import carla" # If no error is raised, the installation is successful.
# Step 4: Install dependencies
pip install DI-engine==0.2.2
pip install torchvision
pip install markupsafe==2.0.1
# NOTE: If you are using a new conda environment, you might need to reinstall the 'pvp' repo.
# Now let's jump to the CARLA section to run the experiment!
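If `import carla` fails in Step 3, a common cause is that one of the two paths added to `PYTHONPATH` does not exist. The following is a minimal sketch for checking them, assuming CARLA was extracted as described in Step 1. The helper name `check_carla_paths` is hypothetical and not part of the pvp repo.

```shell
# Hypothetical helper (not part of the pvp repo): check that the CARLA
# PythonAPI directory and the egg file referenced in PYTHONPATH exist.
check_carla_paths() {
    root="$1"
    api_dir="${root}/PythonAPI/carla"
    egg="${api_dir}/dist/carla-0.9.10-py3.7-linux-x86_64.egg"
    status=0
    if [ ! -d "${api_dir}" ]; then
        echo "missing directory: ${api_dir}"
        status=1
    fi
    if [ ! -f "${egg}" ]; then
        echo "missing egg: ${egg}"
        status=1
    fi
    if [ "${status}" -eq 0 ]; then
        echo "CARLA paths look fine under ${root}"
    fi
    return "${status}"
}
```

Call it as `check_carla_paths "${CARLA_ROOT}"`; a non-zero exit status means at least one of the expected paths is missing.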
MetaDrive provides options for three control devices: steering wheel, gamepad, and keyboard.
During experiments, the human subject can always press E to pause the experiment and Esc to exit it. The main experiment runs for 40K steps and takes about one hour. The toy environment, enabled with --toy_env, takes about 10 minutes.
MetaDrive - Keyboard
# Go to the repo root
cd ~/pvp
# Run toy experiment
python pvp/experiments/metadrive/train_pvp_metadrive.py \
--device keyboard \
--toy_env \
--exp_name pvp_metadrive_toy_keyboard
# Run full experiment
python pvp/experiments/metadrive/train_pvp_metadrive.py \
--device keyboard \
--exp_name pvp_metadrive_keyboard \
--wandb \
--wandb_project WANDB_PROJECT_NAME \
--wandb_team WANDB_ENTITY_NAME
Action | Control |
---|---|
Steering | A/D |
Throttle | W |
Human intervention | Space or WASD |
MetaDrive - Steering Wheel (Logitech G29)
Note: Do not connect the Xbox controller and the steering wheel at the same time!
# Go to the repo root
cd ~/pvp
# Run toy experiment
python pvp/experiments/metadrive/train_pvp_metadrive.py \
--device wheel \
--toy_env \
--exp_name pvp_metadrive_toy_wheel
# Run full experiment
python pvp/experiments/metadrive/train_pvp_metadrive.py \
--device wheel \
--exp_name pvp_metadrive_wheel \
--wandb \
--wandb_project WANDB_PROJECT_NAME \
--wandb_team WANDB_ENTITY_NAME
Action | Control |
---|---|
Steering | Steering wheel |
Throttle | Throttle pedal |
Human intervention | Left/Right gear shifter |
MetaDrive - Gamepad (Xbox Wireless Controller)
Note: Do not connect the Xbox controller and the steering wheel at the same time!
# Go to the repo root
cd ~/pvp
# Run toy experiment
python pvp/experiments/metadrive/train_pvp_metadrive.py \
--device gamepad \
--toy_env \
--exp_name pvp_metadrive_toy_gamepad
# Run full experiment
python pvp/experiments/metadrive/train_pvp_metadrive.py \
--device gamepad \
--exp_name pvp_metadrive_gamepad \
--wandb \
--wandb_project WANDB_PROJECT_NAME \
--wandb_team WANDB_ENTITY_NAME
Action | Control |
---|---|
Steering | Left-right of Left Stick |
Throttle | Up-down of Right Stick |
Human intervention | X/A/B & Left/Right Trigger |
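The three MetaDrive setups above differ only in the `--device` value and the experiment name. A minimal sketch of a helper that prints the matching command is given below. The function name `metadrive_cmd` is hypothetical, and the experiment names simply follow the pattern used in the commands above.

```shell
# Hypothetical helper (not part of the pvp repo): print the MetaDrive
# training command for a device. Pass "toy" as the second argument to
# add --toy_env; anything else prints the full-experiment command.
metadrive_cmd() {
    device="$1"
    mode="${2:-full}"
    case "${device}" in
        keyboard|wheel|gamepad) ;;
        *) echo "unsupported device: ${device}" >&2; return 1 ;;
    esac
    script="pvp/experiments/metadrive/train_pvp_metadrive.py"
    if [ "${mode}" = "toy" ]; then
        echo "python ${script} --device ${device} --toy_env --exp_name pvp_metadrive_toy_${device}"
    else
        echo "python ${script} --device ${device} --exp_name pvp_metadrive_${device}"
    fi
}
```

For example, `metadrive_cmd gamepad toy` prints the toy-experiment command for the gamepad. The Wandb flags are left out and can be appended by hand.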
We use CARLA 0.9.10.1 as the backend and use the environment created by DI-Drive as the gym interface. CARLA uses a server-client architecture. To run an experiment, launch the server first:
# Launch an independent terminal, then:
cd ~/CARLA_0.9.10.1 # Go to your CARLA root
./CarlaUE4.sh -carla-rpc-port=9000 -quality-level=Epic # Can set to Low to accelerate
# Now you should see a pop-up window and you can use WASD to control the camera.
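Since the training script is a client of this server, it can be useful to wait until the RPC port accepts connections before starting training. The sketch below polls a TCP port once per second; the helper name `wait_for_port` is hypothetical, and it assumes `bash` is available for its `/dev/tcp` feature. An open port only shows that the server is listening, not that the simulator has finished loading.

```shell
# Hypothetical helper (not part of the pvp repo): poll a TCP port until it
# accepts connections. Relies on bash's /dev/tcp pseudo-device.
wait_for_port() {
    host="$1"
    port="$2"
    retries="${3:-30}"
    i=0
    while [ "$i" -lt "$retries" ]; do
        if bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            echo "port ${port} on ${host} is accepting connections"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "gave up waiting for ${host}:${port} after ${retries} attempts"
    return 1
}
```

With the server launched as above, `wait_for_port 127.0.0.1 9000` returns once port 9000 is reachable, or fails after 30 attempts.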
CARLA - Steering Wheel (Logitech G29)
Note: Do not connect the Xbox controller and the steering wheel at the same time!
# Launch the CARLA server if you haven't done so yet
~/CARLA_0.9.10.1/CarlaUE4.sh -carla-rpc-port=9000 -quality-level=Epic # Can set to Low to accelerate
# Go to the repo root
cd ~/pvp
# Run experiment without Wandb:
python pvp/experiments/carla/train_pvp_carla.py --exp_name pvp_carla_test
# Run full experiment
python pvp/experiments/carla/train_pvp_carla.py \
--exp_name pvp_carla \
--wandb \
--wandb_project WANDB_PROJECT_NAME \
--wandb_team WANDB_ENTITY_NAME
Action | Control |
---|---|
Throttle | Throttle pedal |
Human intervention | Left/Right gear shifter |
Steering | Steering wheel |
MiniGrid - Keyboard
Mapping between the environment nickname (--env) and env_id:
- emptyroom: MiniGrid-Empty-6x6-v0
- tworoom: MiniGrid-MultiRoom-N2-S4-v0
- fourroom: MiniGrid-MultiRoom-N4-S5-v0
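The mapping above can be written as a small lookup, for instance when labelling logs by env_id. This is only an illustrative sketch: the function name `env_id_for` is hypothetical, and it does not claim to mirror how the training script resolves `--env` internally.

```shell
# Hypothetical helper (not part of the pvp repo): translate an --env
# nickname into the MiniGrid env_id from the mapping above.
env_id_for() {
    case "$1" in
        emptyroom) echo "MiniGrid-Empty-6x6-v0" ;;
        tworoom)   echo "MiniGrid-MultiRoom-N2-S4-v0" ;;
        fourroom)  echo "MiniGrid-MultiRoom-N4-S5-v0" ;;
        *)         echo "unknown nickname: $1" >&2; return 1 ;;
    esac
}
```

For example, `env_id_for tworoom` prints `MiniGrid-MultiRoom-N2-S4-v0`, and an unknown nickname returns a non-zero status.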
# Go to the repo root
cd ~/pvp
# Run experiment without Wandb:
python pvp/experiments/minigrid/train_pvp_minigrid.py --exp_name pvp_minigrid_test
# Run full experiment
# Choose --env from ["emptyroom", "tworoom", "fourroom"]
python pvp/experiments/minigrid/train_pvp_minigrid.py \
--env tworoom \
--exp_name pvp_minigrid \
--wandb \
--wandb_project WANDB_PROJECT_NAME \
--wandb_team WANDB_ENTITY_NAME
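To go through all three MiniGrid environments in sequence, the command above can be wrapped in a loop. The sketch below is a dry run by default and only prints the commands. The function name `run_all_minigrid`, the `DRY_RUN` variable, and the per-environment experiment names are assumptions made for illustration. Each run still needs a human at the keyboard.

```shell
# Hypothetical helper (not part of the pvp repo): print one training command
# per environment nickname, or execute them when DRY_RUN=0 is set.
run_all_minigrid() {
    for env in emptyroom tworoom fourroom; do
        cmd="python pvp/experiments/minigrid/train_pvp_minigrid.py --env ${env} --exp_name pvp_minigrid_${env}"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "${cmd}"
        else
            # Word splitting is intended here: cmd holds a plain command line.
            ${cmd}
        fi
    done
}
```

Run `run_all_minigrid` from the repo root to preview the commands, then set `DRY_RUN=0` to execute them one after another.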
Action | Control |
---|---|
Turn Left | Left |
Turn Right | Right |
Go Straight | Up |
Approve Agent Action | Space / Down |
Open Door / Toggle | T |
Pickup | P |
Drop | D |
Done Complete Task | D |
Some important notes:
- We, as the human demonstrators, always follow the same behavior. For example, I always move around the room counterclockwise until I reach the door.
- The agent takes a 7x7 grid (with different semantic information in different channels) as input, and a CNN serves as the feature extractor. Note that
  - the agent is “blind” to information outside its perceptive field,
  - the agent has no memory, because the input to the network does not contain history information.
Therefore, consistent behavior is required as the supervision signal when the human provides demonstrations.
@inproceedings{peng2023learning,
title={Learning from Active Human Involvement through Proxy Value Propagation},
author={Peng, Zhenghao and Mo, Wenjie and Duan, Chenda and Li, Quanyi and Zhou, Bolei},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}