Ground4Act: Leveraging Visual-Language Model for Collaborative Pushing and Grasping in Clutter

🌟 This repository contains the implementation of Ground4Act, a two-stage approach for collaborative pushing and grasping in clutter using a visual-language model.

📗 Demonstration | Installation | Model Weights | Getting Started | Related Work | BibTeX

Demonstration

🤖Video 🌐Personal homepage

Installation

This repository was developed on Ubuntu 18.04. Before you start, make sure that ROS (Robot Operating System) is installed on your system.

Step 1: Clone the Repository

Open a terminal that uses Python 2 and run the following commands to create a workspace and clone the repository:

mkdir ur_ws && cd ur_ws
git clone https://github.com/HDU-VRLab/Ground4Act.git

In the same terminal, install the libraries required by the push network:

sudo chmod +x install_ros_packages.sh
./install_ros_packages.sh
catkin_make
pip install torch==1.0.0 scipy==1.2.3 torchvision==0.2.1
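
To confirm that the pinned versions above are the ones actually installed, a small check like the following can be run in the same terminal. This is only a sketch: the names PUSH_NETWORK_PINS, installed_version, and find_mismatches are hypothetical and not part of this repository, and versions are compared as exact strings.

```python
# Pins copied from the pip command above.
PUSH_NETWORK_PINS = {"torch": "1.0.0", "scipy": "1.2.3", "torchvision": "0.2.1"}


def installed_version(name):
    """Best-effort lookup of an installed distribution's version, or None."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for Python 2 and Python 3.7 (pkg_resources ships with setuptools).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:  # exact string comparison (assumption of this sketch)
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling find_mismatches(PUSH_NETWORK_PINS) returns an empty dictionary when every pin is satisfied; otherwise each entry lists the wanted and found versions.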

Step 2: Create a new environment

Create a Python 3 virtual environment using conda. For information on Visual Grounding, please refer to RefTR.

conda create -n Vlpg python=3.7
conda activate Vlpg
pip3 install torch torchvision torchaudio scikit-image
cd vl_grasp/RoboRefIt
pip3 install -r requirements.txt 

Model Weights

Resource | Description
Sim_model | Place the downloaded simulation models under "/home/xxx/.gazebo/models".
Ground4Act | Place the downloaded push network weights at "src/gjt_ur_moveit_gazebo/env_info/push.pth" and the visual grounding weights in "src/vl_grasp/logs".
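
Before launching anything, it can help to verify that the downloaded resources are where the table expects them. The sketch below is an illustration only: missing_resources is a hypothetical helper, it assumes the "src" paths are relative to the ur_ws workspace and that "/home/xxx" is the current user's home directory, and it checks only that the paths exist, not that the files are valid.

```python
import os


def missing_resources(workspace, home=None):
    """Return the expected resource paths that do not exist yet.

    workspace: path to the catkin workspace (assumed to be ur_ws).
    home: home directory holding .gazebo (defaults to the current user's).
    """
    home = home or os.path.expanduser("~")
    expected = [
        # Gazebo simulation models
        os.path.join(home, ".gazebo", "models"),
        # Push network weights
        os.path.join(workspace, "src", "gjt_ur_moveit_gazebo", "env_info", "push.pth"),
        # Directory holding the visual grounding weights
        os.path.join(workspace, "src", "vl_grasp", "logs"),
    ]
    return [path for path in expected if not os.path.exists(path)]
```

For example, missing_resources(os.path.expanduser("~/ur_ws")) returns an empty list once all three locations exist.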

Getting Started

Usage Guidelines

When using ROS with MoveIt for control, please follow these guidelines:

  • The terminal used to control the system must run Python 2.
  • The terminal used to execute the algorithm must run Python 3.

This setup is crucial for ensuring proper functionality and compatibility between the different components of the system.
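
One way to catch a terminal mix-up early is to check the interpreter version at the top of a script. The following is a minimal sketch, not something this repository provides: require_python is a hypothetical helper, written so that it runs unchanged on both Python 2 and Python 3.

```python
import sys


def require_python(major, version_info=None):
    """Raise RuntimeError unless the interpreter's major version matches.

    version_info defaults to sys.version_info; it is a parameter only so
    the check can be exercised without switching interpreters.
    """
    info = sys.version_info if version_info is None else version_info
    if info[0] != major:
        raise RuntimeError(
            "This script expects Python {0}, but is running on Python {1}.{2}".format(
                major, info[0], info[1]
            )
        )
    return True
```

Under the guidelines above, a control-side script would call require_python(2) and an algorithm-side script would call require_python(3) before doing any other work.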

Step 1: Launch simulation and load MoveIt

Launch the simulation with the commands below. Once Gazebo is open, turn on the simulation button in the lower-left corner of the Gazebo window; you can then control the robot through MoveIt.

cd ur_ws
source ./devel/setup.bash
roslaunch gjt_ur_moveit_gazebo start_gjt_ur_moveit_gazebo.launch 

We provide several useful unit test scripts. Preloading the object models in Gazebo speeds up later execution, so we recommend running the following commands in sequence in a second terminal:

python src/gjt_ur_moveit_gazebo/gazebo_scripts/spawn_model.py
python src/gjt_ur_moveit_gazebo/gazebo_scripts/moveitServer.py

Step 2: Execute the algorithm

conda activate Vlpg
python src/vl_grasp/vl_push_grasp.py

Related Work

Many thanks to previous researchers for sharing their excellent work:

@article{yang2021collaborative,
  title={Collaborative pushing and grasping of tightly stacked objects via deep reinforcement learning},
  author={Yang, Yuxiang and Ni, Zhihao and Gao, Mingyu and Zhang, Jing and Tao, Dacheng},
  journal={IEEE/CAA Journal of Automatica Sinica},
  volume={9},
  number={1},
  pages={135--145},
  year={2021},
  publisher={IEEE}
}

@inproceedings{lu2023vl,
  title={VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes},
  author={Lu, Yuhao and Fan, Yixuan and Deng, Beixing and Liu, Fangfu and Li, Yali and Wang, Shengjin},
  booktitle={2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={976--983},
  year={2023},
  organization={IEEE}
}

@inproceedings{muchen2021referring,
  title={Referring Transformer: A One-step Approach to Multi-task Visual Grounding},
  author={Li, Muchen and Sigal, Leonid},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

BibTeX

If you find our code or models useful in your work, please cite our paper.

@article{YANG2024105280,
  title = {Ground4Act: Leveraging visual-language model for collaborative pushing and grasping in clutter},
  author = {Yuxiang Yang and Jiangtao Guo and Zilong Li and Zhiwei He and Jing Zhang},
  journal = {Image and Vision Computing},
  volume = {151},
  pages = {105280},
  year = {2024},
  url = {https://www.sciencedirect.com/science/article/pii/S0262885624003858}
}
