This repository provides a PyTorch implementation of the paper Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation.
Zechu Li*, Tao Chen*, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
@inproceedings{li2023parallel,
title={Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation},
author={Li, Zechu and Chen, Tao and Hong, Zhang-Wei and Ajay, Anurag and Agrawal, Pulkit},
booktitle={International Conference on Machine Learning},
year={2023},
organization={PMLR}
}
- Clone the package:
  git clone git@github.com:Improbable-AI/pql.git
  cd pql
- Create the Conda environment and install dependencies:
  ./create_conda_env_pql.sh
  pip install -e .
Note: In the original paper, we used Isaac Gym Preview 3 and the task configs from commit ca7a4fb762f9581e39cc2aab644f18a83d6ab0ba of IsaacGymEnvs.
- Download and install Isaac Gym Preview 4 from https://developer.nvidia.com/isaac-gym
- Unzip the file:
  tar -xf IsaacGym_Preview_4_Package.tar.gz
- Install Isaac Gym:
  cd isaacgym/python
  pip install -e . --no-deps
- Install IsaacGymEnvs:
  git clone https://github.com/NVIDIA-Omniverse/IsaacGymEnvs.git
  cd IsaacGymEnvs
  pip install -e . --no-deps
- Export LD_LIBRARY_PATH:
  export LD_LIBRARY_PATH=$(conda info --base)/envs/pql/lib/:$LD_LIBRARY_PATH
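After the installation steps, it can help to confirm that the packages are visible from the `pql` environment. The following is a minimal sketch, not part of this repository: it only checks that each module can be located, without importing it. The module names are assumptions; `pql` in particular is guessed from the repository name.

```python
import importlib.util

# Top-level module names assumed for this setup. "pql" is a guess based on the
# repository name; adjust the tuple if your installation differs.
REQUIRED_MODULES = ("isaacgym", "isaacgymenvs", "torch", "pql")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located, without importing them."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat lookup errors as "not installed".
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All required modules were found.")
```

Run it inside the activated Conda environment; any module it reports as missing points to the installation step to repeat.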
Warning: Wall-clock efficiency depends heavily on the GPU type and decreases with smaller or fewer GPUs (see Section 4.4 of the paper).
Isaac Gym requires an NVIDIA GPU. To train with the default configuration, we recommend a GPU with at least 10 GB of VRAM. For smaller GPUs, you can decrease the number of parallel environments (cfg.num_envs), the batch size (cfg.algo.batch_size), the replay buffer capacity (cfg.algo.memory_size), etc. ⚡ PQL can run on 1, 2, or 3 GPUs (set the GPU IDs with cfg.algo.p_learner_gpu and cfg.algo.v_learner_gpu; the default GPU ID for the Isaac Gym environment is GPU:0).
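As an illustration of the settings above, the sketch below composes a training command with reduced values for a smaller GPU. The helper `build_train_command` is hypothetical and not part of this repository, and the numbers are placeholders rather than tuned recommendations. It assumes the `key=value` override style used by the training commands in this README, with `cfg.algo.batch_size` written as `algo.batch_size` on the command line.

```python
import shlex


def build_train_command(task, overrides, script="scripts/train_pql.py"):
    """Compose a training command with key=value configuration overrides."""
    args = ["python", script, f"task={task}"]
    args.extend(f"{key}={value}" for key, value in overrides.items())
    # shlex.join quotes any argument that needs it for the shell.
    return shlex.join(args)


# Hypothetical reduced settings for a smaller GPU; tune them for your hardware.
small_gpu_overrides = {
    "num_envs": 1024,
    "algo.batch_size": 2048,
    "algo.memory_size": 1000000,
}

print(build_train_command("AllegroHand", small_gpu_overrides))
```

Reducing these values lowers memory use, but it will likely also change training speed and final performance, so treat the numbers as a starting point only.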
We use Weights & Biases (W&B) for logging.
- Get a W&B account from https://wandb.ai/site
- Get your API key from https://wandb.ai/authorize
- Set up your account in the terminal:
  export WANDB_API_KEY=$API Key$
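Before launching a long run, you may want to verify that the key is visible to Python processes started from the same shell. This is a minimal sketch with a hypothetical helper name; it only checks that the `WANDB_API_KEY` environment variable is set and non-empty, not that the key is valid.

```python
import os


def wandb_key_present(env=None):
    """Return True if WANDB_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if wandb_key_present():
        print("WANDB_API_KEY is set.")
    else:
        print("WANDB_API_KEY is not set; export it before training.")
```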
Run ⚡ PQL on the Allegro Hand task. A full list of Isaac Gym tasks is available in the IsaacGymEnvs repository.
python scripts/train_pql.py task=AllegroHand
Run ⚡ PQL-D (with distributional RL)
python scripts/train_pql.py task=AllegroHand algo.distl=True algo.cri_class=DistributionalDoubleQ
Run ⚡ PQL on a single GPU. By default, training uses 2 GPUs, so specify the GPU IDs explicitly.
python scripts/train_pql.py task=AllegroHand algo.num_gpus=1 algo.p_learner_gpu=0 algo.v_learner_gpu=0
Run ⚡ PQL on 3 GPUs.
python scripts/train_pql.py task=AllegroHand algo.p_learner_gpu=1 algo.v_learner_gpu=2
Run DDPG baseline
python scripts/train_baselines.py algo=ddpg_algo task=AllegroHand
Run SAC baseline
python scripts/train_baselines.py algo=sac_algo task=AllegroHand
Run PPO baseline
python scripts/train_baselines.py algo=ppo_algo task=AllegroHand isaac_param=True
Checkpoints are automatically saved as W&B Artifacts.
To load and visualize the policy, run
python scripts/visualize.py task=AllegroHand headless=False num_envs=10 artifact=$team-name$/$project-name$/$run-id$/$version$
We thank the members of the Improbable AI lab for the helpful discussions and feedback on the paper. We are grateful to MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources.