RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
- [2024/12/04] We have released the RoboMatrix supervised fine-tuning (SFT) dataset, which contains 1,500 high-quality human-annotated demonstration videos.
Choose the ROS2 distribution that matches your Ubuntu version. Note: if ROS2 is already installed on your system, skip this step.
- Ubuntu 20.04 ---> ROS2 Foxy ---> official installation guide
- Ubuntu 22.04 ---> ROS2 Humble ---> official installation guide
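If you are unsure which release you are running, a small helper can read it and map it to the distro listed above. This is a sketch, assuming a standard Ubuntu `/etc/os-release` file; the function name `ros2_distro_for` is hypothetical and not part of RoboMatrix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RoboMatrix): map an Ubuntu release to the
# ROS2 distro recommended above. Prints nothing and returns 1 otherwise.
ros2_distro_for() {
  case "$1" in
    20.04) echo "foxy" ;;
    22.04) echo "humble" ;;
    *) return 1 ;;
  esac
}

# Read VERSION_ID (e.g. "22.04") from /etc/os-release inside a subshell,
# so the file's variables do not leak into the current shell.
ubuntu_version="$( . /etc/os-release 2>/dev/null && echo "${VERSION_ID:-}" )"

if distro="$(ros2_distro_for "$ubuntu_version")"; then
  echo "Ubuntu ${ubuntu_version} -> ROS2 ${distro}"
else
  echo "Ubuntu '${ubuntu_version}' is not covered by this guide" >&2
fi
```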
Install the colcon build tool:
sudo apt install python3-colcon-common-extensions
Create the workspace, clone the repository, and build it with colcon:
mkdir -p ~/RoboMatrix/src && cd ~/RoboMatrix/src
git clone https://github.com/WayneMao/RoboMatrix.git
cd ~/RoboMatrix && colcon build
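Following the usual ROS2/colcon convention, the ROS2 installation (the underlay, `/opt/ros/<distro>/setup.bash`) should be sourced before running `colcon build`, and the workspace overlay (`install/setup.bash`) in every new terminal afterwards. The sketch below assumes the workspace lives in `~/RoboMatrix` as created above; the `source_if_exists` helper and the `humble` default are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch (assumption): source the ROS2 underlay, then the RoboMatrix overlay.
# The same lines can be appended to ~/.bashrc to make them persistent.

# Hypothetical helper: source a setup file if it exists, otherwise warn.
source_if_exists() {
  if [ -f "$1" ]; then
    # shellcheck disable=SC1090
    . "$1"
  else
    echo "warning: $1 not found" >&2
    return 1
  fi
}

# Reuse ROS_DISTRO if ROS2 was already sourced; otherwise default to humble
# (use foxy on Ubuntu 20.04).
distro="${ROS_DISTRO:-humble}"

# 1. Underlay: the ROS2 installation itself (needed before `colcon build`).
source_if_exists "/opt/ros/${distro}/setup.bash" || true

# 2. Overlay: the packages built by `colcon build` in ~/RoboMatrix.
source_if_exists "$HOME/RoboMatrix/install/setup.bash" || true
```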
sudo apt install libopus-dev python3-pip
python3 -m pip install -U numpy numpy-quaternion pyyaml
# Install RoboMaster-SDK
python3 -m pip install git+https://github.com/jeguzzi/RoboMaster-SDK.git
python3 -m pip install git+https://github.com/jeguzzi/RoboMaster-SDK.git#"egg=libmedia_codec&subdirectory=lib/libmedia_codec"
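To confirm that the packages above are importable, a quick check can be run with the same `python3` interpreter. The import names are assumptions based on the packages' usual module names: `numpy`, `quaternion` (numpy-quaternion), `yaml` (pyyaml), `robomaster` (RoboMaster-SDK) and `libmedia_codec`; the helper `py_module_ok` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which Python modules from this step can be imported.
# Assumed import names: numpy-quaternion -> quaternion, pyyaml -> yaml,
# RoboMaster-SDK -> robomaster.

# Hypothetical helper: succeed if python3 can import the given module.
py_module_ok() {
  python3 -c "import $1" >/dev/null 2>&1
}

missing=0
for mod in numpy quaternion yaml robomaster libmedia_codec; do
  if py_module_ok "$mod"; then
    echo "ok       $mod"
  else
    echo "missing  $mod"
    missing=$((missing + 1))
  fi
done
echo "${missing} module(s) missing"
```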
# Install Python dependencies and PyTorch (CUDA 11.8 build)
pip install -r requirements.txt
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
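After installing PyTorch, it is worth confirming that the pinned release (with the CUDA 11.8 wheels from the index URL above) was picked up and that a GPU is visible. This sketch assumes `python3` is the interpreter that `pip` installed into; `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes, while `torch_version_matches` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a version string such as "2.0.1+cu118"
# matches the expected release "2.0.1" (the local "+..." suffix is ignored).
torch_version_matches() {
  [ "${1%%+*}" = "$2" ]
}

expected="2.0.1"
if installed="$(python3 -c 'import torch; print(torch.__version__)' 2>/dev/null)"; then
  if torch_version_matches "$installed" "$expected"; then
    echo "torch ${installed} OK"
  else
    echo "torch ${installed} found, expected ${expected}" >&2
  fi
  # CUDA version torch was built with, and whether a GPU is usable.
  python3 -c 'import torch; print("CUDA", torch.version.cuda, "available:", torch.cuda.is_available())'
else
  echo "torch is not installed for python3" >&2
fi
```

Stripping the local version suffix keeps the check focused on the release number, since the CUDA variant is reported separately by `torch.version.cuda`.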
Install Grounding-DINO-1.5-API:
cd src/robomatrix_client/robomatrix_client
git clone https://github.com/IDEA-Research/Grounding-DINO-1.5-API.git
cd Grounding-DINO-1.5-API
pip install -v -e .
TODO:
- Package Docker
- 🤗 Release the supervised fine-tuning (SFT) dataset
- Optimize VLA ROS communication
- Open-source the VLA skill model code
- Release the VLA skill model weights
- Open-source the Shooting code
If you find our work helpful, please cite us:
@article{mao2024robomatrix,
title={RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World},
author={Mao, Weixin and Zhong, Weiheng and Jiang, Zhou and Fang, Dong and Zhang, Zhongyue and Lan, Zihan and Jia, Fan and Wang, Tiancai and Fan, Haoqiang and Yoshie, Osamu},
journal={arXiv preprint arXiv:2412.00171},
year={2024}
}
Acknowledgements:
- The Vision-Language-Action (VLA) skill model implementation is based on LLaVA.
- RoboMatrix-ROS is based on RoboMaster-SDK and ROS2.
- Additional libraries: Grounding-DINO-1.5 and YOLO-World.