This is the planning and simulation framework used in "Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction," presented at CoRL 2024. This repository is based on STAP (Sequencing Task-Agnostic Policies).
For a brief overview of our work, please refer to our project page.
Further details can be found in our paper available on arXiv.
| Without Text2Interaction | With Text2Interaction |
The Text2Interaction framework can be broken down into three phases:
- Train skills offline (i.e., policies, Q-functions, dynamics models, uncertainty quantifiers)
- Generate preference functions (can be found in fm-planning)
- Plan with skills online (i.e., motion planning, task and motion planning)

We provide implementations for phases 1 and 3 in this repo:
- Skill library: A suite of reinforcement learning (RL) and inverse RL algorithms to learn three skills: `Pick`, `Place`, and `Static_handover`. `Push` and `Pull` are also supported but not tested.
- Learned models: We learn a policy $\pi^k(a \mid s)$, a Q-value function $Q^k(s, a)$, and a transition distribution $T^k(s_{t+1} \mid s_t, a_t)$ per primitive from simulated rollouts (see the sketch after this list).
- Motion planners (STAP): A set of sampling-based motion planners including randomized sampling, the cross-entropy method (CEM), planning with uncertainty-aware metrics, and combinations thereof. Our experiments in Text2Interaction use the CEM planner.
- Task and motion planners (TAMP): Coupling PDDL-based task planning with STAP-based motion planning. You can either generate entire task plans directly from the user instruction or generate task goals in PDDL and then plan toward the goal.
- 3D Environments: PyBullet tabletop manipulation environment with domain randomization.
- Human animation: Our simulation includes a human, which is animated based on CMU motion capture data.
- Safety shield: We provide safety guarantees for the human user using our provably safe controller.
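To make the role of the learned models concrete, here is a minimal conceptual sketch (not the repo's actual API) of how a STAP-style planner scores a candidate action sequence: it rolls the sequence through the learned dynamics and multiplies the per-skill Q-values. The callables `q_fns` and `dynamics` are hypothetical stand-ins for the trained networks.

```python
def score_plan(s0, actions, q_fns, dynamics):
    """Product of per-skill Q-values along one candidate action sequence."""
    s, score = s0, 1.0
    for a, q_fn, dyn in zip(actions, q_fns, dynamics):
        score *= q_fn(s, a)  # feasibility of executing skill k from state s
        s = dyn(s, a)        # predicted next state ~ T^k(s_{t+1} | s_t, a_t)
    return score
```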
Make sure to properly clone this repo. Then, you can either use the provided Dockerfiles (for training purposes) or set up this repo manually (for debugging).
Clone this repo with submodules
git clone --recurse-submodules git@github.com:JakobThumm/STAP.git
If you forgot to clone with submodules, make sure to add them now:
git submodule init
git submodule update --recursive
To build and run the docker container with GPU, use
./build_docker_train.sh user gpu
./run_docker_train.sh user gpu
and on CPU use
./build_docker_train.sh user
./run_docker_train.sh user
You will likely need a specific CUDA version to run the GPU Docker image. We provide Dockerfiles for CUDA 11.8 and 12.1.
This repository is primarily tested on Ubuntu 20.04 with Python 3.8.10. For all non-python requirements, see the requirements list in the Dockerfiles.
Python packages are managed through conda
conda env create -f environment.yml
or pip
# Install torch with the correct CUDA version of your GPU.
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 networkx==3.0 --index-url https://download.pytorch.org/whl/cu118 swig
pip install --upgrade pip && pip install -r requirements.txt
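After installing, a quick way to confirm that the torch build matches your CUDA setup (standard PyTorch calls, nothing STAP-specific):

```python
import torch

print("torch:", torch.__version__)
print("CUDA build:", torch.version.cuda)           # e.g. "11.8" for the cu118 wheels
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```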
First, install Eigen3.4 using:
curl -LJO https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.bz2 && \
tar -xvf eigen-3.4.0.tar.bz2 && \
rm eigen-3.4.0.tar.bz2
mkdir -p eigen-3.4.0/build && cd eigen-3.4.0/build
cmake .. && make install
cd ../..
echo "export EIGEN3_INCLUDE_DIR=$(pwd)" >> ~/.bashrc
source ~/.bashrc
Then install sara-shield
cd third_party/sara-shield
python setup.py install
cd ../..
Install `scod-regression` using
pip install third_party/scod-regression/
Install STAP using
pip install .
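A minimal smoke test that the install succeeded (assumes the package is importable as `stap`, as described in the repository structure below):

```python
import stap

print("STAP installed at:", stap.__file__)
```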
You can either use our pre-trained models or train the skills yourself. After retrieving the models, you can evaluate them in planning.
STAP supports training skills and composing these components at test time for planning.
- STAP module: The majority of the project code is located in the package `stap/`.
- Scripts: Code for launching training, experiments, debugging, plotting, and visualization is under `scripts/`.
- Configs: Training and evaluation functionality is determined by `.yaml` configuration files located in `configs/` (a quick way to inspect one follows this list).
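To inspect a config without launching anything, you can load it with PyYAML (assumed to be among the Python requirements); the path below is the CEM planner config used in the evaluation commands further down:

```python
import yaml

with open("configs/pybullet/planners/policy_cem_no_custom.yaml") as f:
    cfg = yaml.safe_load(f)
print(list(cfg.keys()))
```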
As an alternative to training the skills from scratch, we provide checkpoints that can be downloaded and directly used to evaluate STAP planners.
Run the following commands to download the model checkpoints to the default `models/` directory (this requires ~10 GB of disk space; a snippet to verify the downloads follows the list):
- Demonstration data used to train inverse RL skills (`datasets`). This is not required for the evaluation.
  bash scripts/download/download_datasets.sh
- Skills trained with inverse RL (`policies_irl`), the Q-value function (`value_fns_irl`), and their dynamics models (`dynamics_irl`).
  bash scripts/download/download_models.sh
- Evaluation results that correspond to evaluating Text2Interaction.
  bash scripts/download/download_results.sh
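To verify the downloads, you can check that the checkpoints referenced by the evaluation commands below exist:

```python
from pathlib import Path

checkpoints = [
    "models/policies_irl/pick/final_model.pt",
    "models/policies_irl/place/final_model.pt",
    "models/policies_irl/static_handover/final_model.pt",
    "models/dynamics_irl/pick_place_static_handover_dynamics/final_model.pt",
]
for ckpt in checkpoints:
    status = "ok" if Path(ckpt).exists() else "MISSING"
    print(f"{status:>7}  {ckpt}")
```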
Skills in STAP are trained independently in custom environments. We provide an inverse RL pipeline for training skills, which can be executed using the docker image. Training the skills consists of four steps:
- Generate rollouts
./run_docker_train.sh user gpu bash scripts/data/generate_all_datasets_tmux.sh
- Train Q-value functions
bash scripts/train/train_values_docker.sh
- Train policies
bash scripts/train/train_policies_docker.sh
- Train dynamics distributions
bash scripts/train/train_dynamics_docker.sh
First, we generate the rollout data to train our models on. The training data consists of rollouts collected with each primitive in the simulated environment.
For each primitive, we train a Q-value function $Q^k(s, a)$ on the generated rollouts.
For each primitive, we train a policy $\pi^k(a \mid s)$ on the generated rollouts.
For each primitive, we train the transition distribution $T^k(s_{t+1} \mid s_t, a_t)$ to predict the next state from the current state and action.
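As a rough illustration (not the repo's training code), fitting the transition distribution amounts to maximizing the log-likelihood of observed transitions; the sketch below assumes a hypothetical `model` that outputs the mean and log-standard-deviation of a Gaussian over the next state.

```python
import torch

def dynamics_nll(model, s_t, a_t, s_next):
    """Negative log-likelihood of observed next states under T^k(s_{t+1} | s_t, a_t)."""
    mean, log_std = model(s_t, a_t)
    dist = torch.distributions.Normal(mean, log_std.exp())
    return -dist.log_prob(s_next).sum(dim=-1).mean()
```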
We can evaluate these models and the generated preference functions using the scripts provided in `scripts/eval/`.
The recurring example in Text2Interaction is the screwdriver handover. You can test this in simulation using the following commands (a conceptual sketch of a preference function follows them):
- With custom preference function:
python scripts/eval/eval_planners.py --planner-config configs/pybullet/planners/policy_cem_screwdriver_custom_fns.yaml --env-config configs/pybullet/envs/official/sim_domains/screwdriver_handover/task0.yaml --policy-checkpoints models/policies_irl/pick/final_model.pt models/policies_irl/place/final_model.pt models/policies_irl/static_handover/final_model.pt --dynamics-checkpoint models/dynamics_irl/pick_place_static_handover_dynamics/final_model.pt --use_informed_dynamics 1 --seed 0 --gui 1 --closed-loop 1 --num-eval 100 --path plots/planning/screwdriver_handover/task0 --verbose
- Without custom preference function:
python scripts/eval/eval_planners.py --planner-config configs/pybullet/planners/policy_cem_no_custom.yaml --env-config configs/pybullet/envs/official/sim_domains/screwdriver_handover/task0.yaml --policy-checkpoints models/policies_irl/pick/final_model.pt models/policies_irl/place/final_model.pt models/policies_irl/static_handover/final_model.pt --dynamics-checkpoint models/dynamics_irl/pick_place_static_handover_dynamics/final_model.pt --use_informed_dynamics 1 --seed 0 --gui 1 --closed-loop 1 --num-eval 100 --path plots/planning/screwdriver_handover/task0 --verbose
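For intuition, the custom preference functions referenced in `policy_cem_screwdriver_custom_fns.yaml` score how well a candidate plan matches the user's preference. The sketch below is purely illustrative with hypothetical arguments: it rewards handover poses in which the screwdriver handle points toward the human.

```python
import numpy as np

def handle_toward_human_preference(handle_axis: np.ndarray, to_human: np.ndarray) -> float:
    """Score in [0, 1]; 1 when the handle axis points directly at the human."""
    cos = float(np.dot(handle_axis, to_human)
                / (np.linalg.norm(handle_axis) * np.linalg.norm(to_human)))
    return 0.5 * (cos + 1.0)
```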
We evaluated this behavior on a real-world Franka Research 3 robot. The code for that is provided in the ROS-noetic branch. In our real-world setup, we use sara-shield to guarantee the safety of the human user. The code for deploying sara-shield together with this repo can be found here.
We evaluate four models in our ablation study:
- Oracle: hand-scripted preference functions run with the default Text2Interaction formulation.
- Baseline 1: only optimizes for task success.
- Baseline 2: optimizes for the sum of task success and preference function, using the preference functions generated by the LLM.
- Text2Interaction: optimizes for the product of task success and preference function, using the preference functions generated by the LLM (see the sketch below).
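The difference between Baseline 2 and Text2Interaction boils down to how the two scores are combined; a minimal sketch with hypothetical inputs, both assumed to lie in [0, 1]:

```python
def baseline2_objective(task_success: float, preference: float) -> float:
    return task_success + preference   # additive combination (Baseline 2)

def text2interaction_objective(task_success: float, preference: float) -> float:
    return task_success * preference   # multiplicative combination (Text2Interaction)
```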
To reproduce our evaluation, run:
- Oracle
./run_docker_train.sh user gpu ./scripts/eval/eval_object_arrangement_oracle.sh
- Baseline 1
./run_docker_train.sh user gpu ./scripts/eval/eval_object_arrangement_baseline.sh
- Baseline 2
./run_docker_train.sh user gpu ./scripts/eval/eval_object_arrangement_additive_baseline.sh
- Text2Interaction
./run_docker_train.sh user gpu ./scripts/eval/eval_object_arrangement_ablation.sh
To summarize the generated ablation results, run
./models/mv_all_eval_files.sh
python scripts/eval/eval_planner_summary.py --eval-path models/eval/planning/object_arrangement/
The resulting summary can be found in `models/eval/planning/object_arrangement/summary.csv`.
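To take a quick look at the summary, you can load it with pandas (assumed to be available; the column names depend on the evaluation script, so this just prints whatever is there):

```python
import pandas as pd

df = pd.read_csv("models/eval/planning/object_arrangement/summary.csv")
print(df.columns.tolist())
print(df.head())
```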
Sequencing Task-Agnostic Policies and Text2Interaction are offered under the MIT License agreement. If you find Text2Interaction useful, please consider citing our work:
@inproceedings{thumm_2024_Text2InteractionEstablishing,
title = {Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction},
shorttitle = {Text2Interaction},
booktitle = {8th Annual Conference on Robot Learning},
author = {Thumm, Jakob and Agia, Christopher and Pavone, Marco and Althoff, Matthias},
year = {2024},
url = {https://openreview.net/forum?id=s0VNSnPeoA},
langid = {english},
}