Yecheng Jason Ma1*, William Liang1*, Hung-Ju Wang1, Sam Wang1,
Yuke Zhu2,3, Linxi "Jim" Fan2, Osbert Bastani1, Dinesh Jayaraman1
1University of Pennsylvania, 2NVIDIA, 3University of Texas, Austin
*Equal Contribution
Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and human-labor intensive. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our LLM-guided sim-to-real approach requires only the physics simulation for the target task and automatically constructs suitable reward functions and domain randomization distributions to support real-world transfer. We first demonstrate our approach can discover sim-to-real configurations that are competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. Then, we showcase that our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball, without iterative manual design.
This repository contains code for DrEureka's reward generation, RAPP, and domain randomization generation pipelines as well as the forward locomotion and globe walking environments. The two environments are modified from Rapid Locomotion and Dribblebot, respectively.
The following instructions will install everything under one Conda environment. We have tested on Ubuntu 20.04.
On AWS, the steps below were run on this AMI: ami-0cf2b4e024cdb6960
- Install the NVIDIA driver:
sudo apt install nvidia-headless-535-server nvidia-utils-535-server -y
# A reboot is probably needed at this point
sudo reboot
# After each reboot, set the GPU application clocks to their maximum
sudo nvidia-smi -ac "877,1530"
- Install gpustat:
sudo apt install gpustat
- Install CUDA:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda-repo-ubuntu2004-11-3-local_11.3.0-465.19.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-3-local_11.3.0-465.19.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-3-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
- Add the following to .bashrc:
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
export CUDA_NVCC_EXECUTABLE=/usr/local/cuda/bin/nvcc
export CUDA_INCLUDE_DIRS=/usr/local/cuda/include
export CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
- Install Anaconda:
bash Anaconda3-2024.02-1-Linux-x86_64.sh
# Make sure this repository is cloned, then reopen your terminal windows so the new shell environment is loaded
conda create -n dr_eureka python=3.8
conda activate dr_eureka
- Install PyTorch with CUDA:
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
- Install IsaacGym, the simulator for forward locomotion and globe walking:
  - Download and install IsaacGym from NVIDIA: https://developer.nvidia.com/isaac-gym.
  - Unzip the file:
tar -xf IsaacGym_Preview_4_Package.tar.gz
  - Install the Python package:
cd isaacgym/python
pip install -e .
- Install DrEureka:
cd dr_eureka
pip install -e .
- Install the forward locomotion and globe walking environments:
cd forward_locomotion
pip install -e .
# The install above seems to need pinned setuptools and pip versions; see
# https://stackoverflow.com/questions/77124879/pip-extras-require-must-be-a-dictionary-whose-values-are-strings-or-lists-of
pip install setuptools==65.5.0 pip==21
cd ../globe_walking
pip install -e .
We'll use forward locomotion (forward_locomotion) as an example. The following steps can also be done for globe walking (globe_walking).
First, run reward generation (Eureka):
# Ensure OPENAI_API_KEY is set in .bashrc
cd ../eureka
python eureka.py env=forward_locomotion
At the end, the final best reward will be saved in forward_locomotion/go1_gym/rewards/eureka_reward.py and used for subsequent training runs. The Eureka logs will be stored in eureka/outputs/[TIMESTAMP], and the run directory of the best-performing policy will be printed to the terminal.
Second, copy the run directory and run RAPP:
cd ../dr_eureka
python rapp.py env=forward_locomotion run_path=[YOUR_RUN_DIRECTORY]
This will update the prompt in dr_eureka/prompts/initial_users/forward_locomotion.txt with the computed RAPP bounds.
Third, run DR generation with the new reward and RAPP bounds:
python dr_eureka.py env=forward_locomotion
The trained policies are then ready for deployment; see the deployment steps below.
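The three stages above can be collected in one place for scripting. The following is a minimal sketch and not part of the repository: the helper names build_pipeline_commands and check_api_key are hypothetical, and the sketch assumes eureka/ and dr_eureka/ sit directly under the repository root. It only mirrors the shell commands listed above. The run directory still has to be copied from the Eureka output before the RAPP stage can run.

```python
import os
from pathlib import Path


def build_pipeline_commands(repo_root, env, run_path):
    """Return (working_directory, argv) pairs for the three stages.

    Hypothetical helper: it only restates the shell commands from the text.
    """
    root = Path(repo_root)
    return [
        # Stage 1: reward generation (Eureka)
        (root / "eureka", ["python", "eureka.py", f"env={env}"]),
        # Stage 2: RAPP, which needs the run directory printed by stage 1
        (root / "dr_eureka", ["python", "rapp.py", f"env={env}", f"run_path={run_path}"]),
        # Stage 3: DR generation
        (root / "dr_eureka", ["python", "dr_eureka.py", f"env={env}"]),
    ]


def check_api_key(environ=os.environ):
    """Return True if OPENAI_API_KEY is present and non-empty."""
    return bool(environ.get("OPENAI_API_KEY"))


if __name__ == "__main__":
    if not check_api_key():
        print("warning: OPENAI_API_KEY is not set")
    commands = build_pipeline_commands(".", "forward_locomotion", "[YOUR_RUN_DIRECTORY]")
    for cwd, argv in commands:
        # Print each command instead of running it, since stage 2 depends on
        # a run directory that only exists after stage 1 has finished.
        print(f"(cd {cwd} && {' '.join(argv)})")
```

Each pair can be handed to subprocess.run(argv, cwd=cwd, check=True) once the run directory is known.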
Our deployment infrastructure is based on Walk These Ways. We'll use forward locomotion as an example, though the deployment setup for both environments is essentially the same.
- Add the (relative) path to your checkpoint to forward_locomotion/go1_gym_deploy/scripts/deploy_policy.py. Note that you can have multiple policies at once and switch between them.
- Start up the Go1, and connect to it on your machine via Ethernet. Make sure you can ssh onto the NX (192.168.123.15).
- Put the robot into damping mode with the controller: L2+A, L2+B, L1+L2+START. The robot should be lying on the ground afterwards.
- Run the following to send the checkpoint and code to the Go1:
cd forward_locomotion/go1_gym_deploy/scripts
./send_to_unitree.sh
- Now, ssh onto the Go1 and run the following:
chmod +x installer/install_deployment_code.sh
cd ~/go1_gym/go1_gym_deploy/scripts
sudo ../installer/install_deployment_code.sh
- Make sure your Go1 is in a safe location and hung up (suspended off the ground). Open two terminal prompts on the Go1. In the first, run:
cd ~/go1_gym/go1_gym_deploy/autostart
./start_unitree_sdk.sh
- In the second, run:
cd ~/go1_gym/go1_gym_deploy/docker
sudo make autostart && sudo docker exec -it foxy_controller bash
- The previous command should drop you into a shell inside the Docker container. Within it, run:
cd /home/isaac/go1_gym && rm -r build && python3 setup.py install && cd go1_gym_deploy/scripts && python3 deploy_policy.py
- Now, you can press R2 on the controller, and the robot should extend its legs (calibrate).
- Pressing R2 again will start the policy.
- To switch policies, press L1 or R1 to cycle through the list in deploy_policy.py.
DrEureka manipulates pre-defined environments by inserting generated reward functions and domain randomization configurations. To do so, we have designed the environment code to be modular and easily configurable. Below, we explain how the components of our code interact with each other, using forward locomotion as an example:
eureka/eureka.py runs the reward generation process. It uses:
- Environment source code as input to the LLM, which is at eureka/envs/forward_locomotion.py. This is a shortened version of the actual environment code to save token usage.
- Reward signature definition as input to the LLM, which is at eureka/prompts/reward_signatures/forward_locomotion.txt. This file should contain a simple format for the LLM to follow. It may also contain additional instructions or explanations for the format, if necessary.
- Location of training script, which is defined as train_script: scripts/train.py in eureka/cfg/env/forward_locomotion.yaml.
- Location of the reward template and output files, which are defined as reward_template_file: go1_gym/rewards/eureka_reward_template.py and reward_output_file: go1_gym/rewards/eureka_reward.py in eureka/cfg/env/forward_locomotion.yaml. Eureka reads the template file's boilerplate code, fills in the reward function, and writes to the output file for use during training.
- Function to extract training metrics, which is defined in eureka/utils/misc.py as construct_run_log(stdout_str). This function parses the training script's standard output into a dictionary. Alternatively, it can be used to load a file containing metrics saved during training (for example, tensorboard logs).
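To make the last item concrete, here is a minimal sketch of turning a training script's standard output into a dictionary of metric histories. It does not reproduce construct_run_log: the function name parse_metrics and the "name: value" line format are assumptions made for illustration, and the real training output may be laid out differently.

```python
import re
from collections import defaultdict

# Assumed line format: "<metric name>: <number>", one metric per line.
_METRIC_LINE = re.compile(
    r"^\s*([A-Za-z_][\w /]*?)\s*:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_metrics(stdout_str):
    """Collect every numeric "name: value" line into {name: [values, ...]}."""
    run_log = defaultdict(list)
    for line in stdout_str.splitlines():
        match = _METRIC_LINE.match(line)
        if match:
            name, value = match.groups()
            # Values are appended in order of appearance, so each list is
            # the metric's history over the course of training.
            run_log[name].append(float(value))
    return dict(run_log)


if __name__ == "__main__":
    sample = (
        "Iteration 1\n"
        "mean reward: 1.5\n"
        "episode length: 200\n"
        "Iteration 2\n"
        "mean reward: 2.25\n"
    )
    print(parse_metrics(sample))
```

Lines that do not match the assumed format, such as the "Iteration" headers in the sample, are skipped rather than treated as errors.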
dr_eureka/rapp.py computes the RAPP bounds. It uses:
- Location of the play (evaluation) script, which is defined as play_script: scripts/play.py in dr_eureka/cfg/env/forward_locomotion.yaml.
- Location of the DR template and output files, which are defined as dr_template_file: go1_gym/envs/base/legged_robot_config_template.py and dr_output_file: go1_gym/envs/base/legged_robot_config.py in dr_eureka/cfg/env/forward_locomotion.yaml. Like the reward template/output setup, DrEureka fills in the boilerplate code and writes to the output file for use during evaluation.
- List of randomizable DR parameters, defined in the variable parameter_test_vals in dr_eureka/rapp.py.
- Simple success criteria for the task, defined as the function forward_locomotion_success() in dr_eureka/rapp.py.
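As a rough illustration of how the last two items could fit together, the sketch below sweeps each parameter over its test values, keeps the values for which a success check passes, and reports the smallest and largest passing value as that parameter's bounds. This is an assumption about the search, not a copy of rapp.py: the function name compute_bounds, the parameter names, and the stand-in evaluate callback (which takes the place of running the play script in simulation) are all hypothetical.

```python
def compute_bounds(parameter_test_vals, evaluate, is_success):
    """Return {parameter: (low, high)} over the test values that succeed.

    parameter_test_vals: {name: [candidate values]}
    evaluate: callable(name, value) -> metrics for the policy with that
        single parameter overridden (stand-in for a simulation rollout).
    is_success: callable(metrics) -> bool, the task's success criterion.
    """
    bounds = {}
    for name, values in parameter_test_vals.items():
        passing = [v for v in sorted(values) if is_success(evaluate(name, v))]
        # A parameter with no passing value gets None instead of a range.
        bounds[name] = (passing[0], passing[-1]) if passing else None
    return bounds


if __name__ == "__main__":
    # Toy stand-ins: a fake "simulator" in which the robot only moves
    # forward for moderate friction and small added mass.
    test_vals = {
        "friction": [0.01, 0.1, 1.0, 10.0],
        "added_mass": [-5.0, 0.0, 5.0, 10.0],
    }

    def fake_evaluate(name, value):
        if name == "friction":
            ok = 0.05 <= value <= 5.0
        else:
            ok = abs(value) <= 5.0
        return {"forward_velocity": 1.0 if ok else 0.0}

    def fake_success(metrics):
        return metrics["forward_velocity"] > 0.5

    print(compute_bounds(test_vals, fake_evaluate, fake_success))
```

Taking the minimum and maximum passing value assumes the passing values form one contiguous range; a sweep with gaps would need a different rule.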
dr_eureka/dr_eureka.py runs the DR generation process. It uses:
- RAPP bounds as input to the LLM, defined in dr_eureka/prompts/initial_users/forward_locomotion.txt. This uses the direct output of dr_eureka/rapp.py.
- Best reward function, the output of reward generation. This should be in the file defined in reward_output_file: go1_gym/rewards/eureka_reward.py.
- Location of the training script, same as reward generation. This is defined in dr_eureka/cfg/env/forward_locomotion.yaml.
- Location of the DR template and output files, same as RAPP.
- Function to extract training metrics, same as reward generation. Note that this is used only for a general idea of the policy's performance in simulation, and unlike reward generation, is not used for iterative feedback.
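The template-and-output pattern used for both rewards and DR configurations can be sketched as plain text substitution. The snippet below is illustrative only: the placeholder token, the parameter names, and the rendered line format are assumptions, not the contents of legged_robot_config_template.py.

```python
from pathlib import Path

# Hypothetical placeholder line that the template file is assumed to contain.
PLACEHOLDER = "# INSERT DR CONFIG HERE"


def render_dr_config(ranges, indent="    "):
    """Render {name: (low, high)} as indented Python assignment lines."""
    return "\n".join(
        f"{indent}{name}_range = [{low}, {high}]"
        for name, (low, high) in ranges.items()
    )


def fill_template(template_text, ranges):
    """Replace the placeholder line with the rendered DR ranges."""
    lines, replaced = [], False
    for line in template_text.splitlines():
        if line.strip() == PLACEHOLDER:
            # Reuse the placeholder's indentation so the result stays valid
            # inside a class body.
            indent = line[: len(line) - len(line.lstrip())]
            lines.append(render_dr_config(ranges, indent))
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        raise ValueError("template has no DR placeholder line")
    return "\n".join(lines) + "\n"


def write_config(template_path, output_path, ranges):
    """Read the template, fill it in, and write the output file."""
    text = Path(template_path).read_text()
    Path(output_path).write_text(fill_template(text, ranges))


if __name__ == "__main__":
    template = "class domain_rand:\n    # INSERT DR CONFIG HERE\n    push_robots = False\n"
    print(fill_template(template, {"friction": (0.1, 1.0)}))
```

Keeping the boilerplate in a separate template file means only the generated lines change between runs, while the training and evaluation scripts always import the same output file.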
We thank the following open-source projects:
- Our simulation runs in IsaacGym.
- Our LLM-generation algorithm builds on Eureka.
- Our environments are adapted from Rapid Locomotion and Dribblebot.
- The environment structure and training code build on Legged Gym and RSL_RL.
This codebase is released under the MIT License.
If you find our work useful, please consider citing us!
@article{ma2024dreureka,
title = {DrEureka: Language Model Guided Sim-To-Real Transfer},
author = {Yecheng Jason Ma and William Liang and Hungju Wang and Sam Wang and Yuke Zhu and Linxi Fan and Osbert Bastani and Dinesh Jayaraman},
year = {2024},
}