Official implementation of Catch It.
We open-source the simulation training scripts and provide guidance for real-robot deployment. We name the environment Dexterous Catch with Mobile Manipulation (DCMM).
This codebase is released under the CC BY-NC 4.0 license, with inherited licenses from Legged Gym and RSL RL (ETH Zurich, Nikita Rudin, and NVIDIA CORPORATION & AFFILIATES).
- 2024-10-17: Release the simulation training scripts and references for the real-robot deployment. Have a try!
- Create a conda environment and install PyTorch:
conda create -n dcmm python=3.8
conda activate dcmm
pip install torch torchvision torchaudio
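Before moving on, you may want to confirm that PyTorch installed correctly and can see your GPU; this is only a sanity check, not part of the official setup:

import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # training below assumes a single GPU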
- Clone this repo and install our gym_dcmm:
git clone https://github.com/hang0610/catch_it.git
cd catch_it && pip install -e .
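If the editable install succeeded, the package should be importable from any directory; a quick check (nothing repo-specific beyond the package name gym_dcmm used in the install step above):

import gym_dcmm
print(gym_dcmm.__file__)  # should point into the cloned catch_it repo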
- Install additional packages in requirements.txt:
pip install -r requirements.txt
Under gym_dcmm/envs/, run:
python3 DcmmVecEnv.py --viewer
Keyboard control:
- ↑ (up): increase the y linear velocity (base frame) by 1 m/s;
- ↓ (down): decrease the y linear velocity (base frame) by 1 m/s;
- ← (left): increase the x linear velocity (base frame) by 1 m/s;
- → (right): decrease the x linear velocity (base frame) by 1 m/s;
- 4 (turn left): decrease the counter-clockwise angular velocity by 0.2 rad/s;
- 6 (turn right): increase the counter-clockwise angular velocity by 0.2 rad/s;
- +: increase the position & roll of the arm end effector by (0.1, 0.1, 0.1, 0.1) m;
- -: decrease the position & roll of the arm end effector by (0.1, 0.1, 0.1, 0.1) m;
- 7: increase the joint position of the hand by (0.2, 0.2, 0.2, 0.2) rad;
- 9: decrease the joint position of the hand by (0.2, 0.2, 0.2, 0.2) rad;
Note: DO NOT change the speed of the mobile base too dramatically, or it might tip over.
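If you want to drive the environment from a script instead of the keyboard, the sketch below shows the general idea. It is only a minimal sketch: the class name DcmmVecEnv, the viewer keyword, and the step() return signature are assumptions based on the file name and the --viewer flag, so check gym_dcmm/envs/DcmmVecEnv.py for the actual interface.

# Hedged sketch: class name, constructor arguments, and step() signature are assumptions.
from gym_dcmm.envs.DcmmVecEnv import DcmmVecEnv

env = DcmmVecEnv(viewer=True)            # assumed keyword, mirroring the --viewer flag
obs = env.reset()
for _ in range(200):
    action = env.action_space.sample()   # random actions instead of keyboard input
    step_out = env.step(action)          # may return 4 (gym) or 5 (gymnasium) values
    obs = step_out[0]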
We use 64 CPUs and a single NVIDIA RTX 3070 Ti GPU for model training. For training efficiency, we recommend using at least 12 CPUs so that more than 16 parallel environments can be created during training.
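Since the num_envs setting below is tied to your CPU count, it can help to check what your machine offers before editing the config; this is plain Python, nothing repo-specific:

import os
import torch

print("CPU cores:", os.cpu_count())                 # >= 12 recommended for 16+ parallel envs
print("GPU available:", torch.cuda.is_available())  # a single RTX 3070 Ti-class GPU suffices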
- configs/config.yaml:
# Disables viewer or camera visualization
viewer: False
imshow_cam: False
# RL Arguments
test: False # False, True
task: Tracking # Catching_TwoStage, Catching_OneStage, Tracking
num_envs: 32 # This should be no more than 2x your CPUs (1x is recommended)
object_eval: False
# used to set checkpoint path
checkpoint_tracking: ''
checkpoint_catching: ''
# checkpoint_tracking: 'assets/models/track.pth'
# checkpoint_catching: 'assets/models/catch_two_stage.pth'
- num_envs (int): the number of parallel environments;
- task (str): the task type (Tracking or Catching);
- test (bool): True enables testing mode, False enables training mode;
- checkpoint_tracking/catching (str): the pre-trained model to load for training/testing;
- viewer (bool): whether to launch the MuJoCo viewer;
- imshow_cam (bool): whether to visualize the camera scene;
- object_eval (bool): whether to use the unseen objects;
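If you prefer to inspect these arguments programmatically instead of opening the file, a minimal sketch with PyYAML works (assuming PyYAML is available in your environment; the keys match the excerpt above):

import yaml

with open("configs/config.yaml") as f:
    cfg = yaml.safe_load(f)
print(cfg["task"], cfg["num_envs"], cfg["test"])  # e.g. Tracking 32 False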
- configs/train/DcmmPPO.yaml:
- minibatch_size: the batch size for network input during PPO training;
- horizon_length: the number of steps collected in a single trajectory during exploration;
Note: in the training mode, the following must hold: num_envs * horizon_length = n * minibatch_size, where n is a positive integer.
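A quick way to check this constraint before launching training; the numbers here are hypothetical, so substitute your own config values:

num_envs, horizon_length, minibatch_size = 32, 64, 512    # hypothetical values
total = num_envs * horizon_length                         # 2048 transitions per rollout
assert total % minibatch_size == 0, "num_envs * horizon_length must be a multiple of minibatch_size"
print("n =", total // minibatch_size)                     # n = 4 here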
We provide our tracking model and our catching model trained in a two-stage manner: assets/models/track.pth and assets/models/catch_two_stage.pth. You can test them on the tracking and catching tasks. You can also choose to evaluate on the training objects or the unseen objects by setting object_eval.
Under the root catch_it:
python3 train_DCMM.py test=True task=Tracking num_envs=1 checkpoint_tracking=$(path_to_tracking_model) object_eval=True viewer=$(open_mujoco_viewer_or_not) imshow_cam=$(imshow_camera_or_not)
python3 train_DCMM.py test=True task=Catching_TwoStage num_envs=1 checkpoint_catching=$(path_to_catching_model) object_eval=True viewer=$(open_mujoco_viewer_or_not) imshow_cam=$(imshow_camera_or_not)
Under the root catch_it, train the base and arm to track the randomly thrown objects:
python3 train_DCMM.py test=False task=Tracking num_envs=$(number_of_CPUs)
- First, load the tracking model from stage 1 by filling its path into checkpoint_tracking in configs/config.yaml. We provide our tracking model, assets/models/track.pth, which can be used to train the catching task (stage 2) directly.
- Second, train the whole body (the base, arm, and hand) to catch the randomly thrown objects:
python3 train_DCMM.py test=False task=Catching_TwoStage num_envs=$(number_of_CPUs) checkpoint_tracking=$(path_to_tracking_model)
In the one-stage training baseline, we do not pre-train a tracking model but directly train a catching model from scratch. Similar to the setup for training the tracking model, run:
python3 train_DCMM.py test=False task=Catching_OneStage num_envs=$(number_of_CPUs)
You can visualize the training curves and metrics via wandb. In configs/config.yaml:
# wandb config
output_name: Dcmm
wandb_mode: "disabled" # "online" | "offline" | "disabled"
wandb_entity: 'Your_username'
# wandb_project: 'RL_Dcmm_Track_Random'
wandb_project: 'RL_Dcmm_Catch_Random'
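These fields typically end up in a wandb.init call; below is a hedged sketch of how such a config is usually consumed, not necessarily the exact call made in train_DCMM.py:

import wandb

wandb.init(
    project="RL_Dcmm_Catch_Random",  # wandb_project
    entity="Your_username",          # wandb_entity
    mode="disabled",                 # wandb_mode: "online" | "offline" | "disabled"
    name="Dcmm",                     # output_name (assumed to map to the run name)
)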
- Mobile Base: Ranger Mini V2
- Arm: XArm6
- Dexterous Hand: LEAP Hand
- Perception: Realsense D455
- Onboard Computer: Thunderobot MIX MiniPC
Our code is built upon Ubuntu 20.04 and ROS Noetic. Lower or higher versions may also work (not guaranteed).
- Ranger Mini V2: ranger_ros
- XArm6: xarm-ros
- LEAP Hand: LEAP Hand ROS1 SDK
- Realsense D455: realsense-ros and realsense-sdk
- Camera Calibration: easy_handeye
Yuanhang Zhang: [email protected]
You can also create an issue if you encounter any other bugs.
- If a MuJoCo rendering error such as mujoco.FatalError: gladLoadGL error occurs, try adding the following line before main() in train_DCMM.py and gym_dcmm/envs/DcmmVecEnv.py:
os.environ['MUJOCO_GL'] = 'egl'
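Concretely, the top of either script would look like the snippet below (the os import may already be present); the key point is that the environment variable is set before MuJoCo creates a GL context:

import os
os.environ['MUJOCO_GL'] = 'egl'  # force EGL rendering; must run before main() is called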
Please consider citing our paper if you find this repo useful:
@article{zhang2024catchitlearningcatch,
title={Catch It! Learning to Catch in Flight with Mobile Dexterous Hands},
author={Zhang, Yuanhang and Liang, Tianhai and Chen, Zhenyang and Ze, Yanjie and Xu, Huazhe},
year={2024},
journal={arXiv preprint arXiv:2409.10319}
}