Ground4Act: Leveraging Visual-Language Model for Collaborative Pushing and Grasping in Clutter

🌟 This repository contains the implementation of Ground4Act, a two-stage approach for collaborative pushing and grasping in clutter using a visual-language model.

📗 Demonstration | Installation | Model Weights | Getting Started | Related Work | BibTeX

Demonstration

🤖 Video | 🌐 Personal homepage

Installation

This repository is based on Ubuntu 18.04.

Before you start, ensure that ROS (Robot Operating System) is installed on your system.

Step 1: Clone the Repository

Open a terminal (Python 2) and run the following commands to clone the repository:

mkdir ur_ws && cd ur_ws
git clone https://github.com/HDU-VRLab/Ground4Act.git

In the same terminal, install the libraries required by the push network:

sudo chmod +x install_ros_packages.sh
./install_ros_packages.sh
catkin_make
pip install torch==1.0.0 scipy==1.2.3 torchvision==0.2.1
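
Optionally, you can verify the push-network dependencies from the same Python 2 terminal (a quick sanity check, not part of the original instructions):

python -c "import torch, torchvision, scipy; print(torch.__version__)"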

Step 2: Create a new environment

Create a Python 3 virtual environment using conda. For information on Visual Grounding, please refer to RefTR.

conda create -n Vlpg python=3.7
conda activate Vlpg
pip3 install torch torchvision torchaudio scikit-image
cd vl_grasp/RoboRefIt
pip3 install -r requirements.txt 
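
Optionally, verify the new environment (a quick sanity check; the CUDA flag only matters if you plan to run on a GPU):

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"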

Model Weights

Resource     Description
Sim_model    Place the downloaded simulation models under "/home/xxx/.gazebo/models".
Ground4Act   Place the downloaded push network weight at "src/gjt_ur_moveit_gazebo/env_info/push.pth", and the Visual Grounding weight under "src/vl_grasp/logs".
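
For example (illustrative only; the download locations and the Visual Grounding file name below are placeholders, so adapt them to your actual files), the weights can be placed from the directory that contains src/:

mkdir -p ~/.gazebo/models
cp -r ~/Downloads/sim_model/* ~/.gazebo/models/
cp ~/Downloads/push.pth src/gjt_ur_moveit_gazebo/env_info/push.pth
cp ~/Downloads/<visual_grounding_weight>.pth src/vl_grasp/logs/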

Getting Started

Usage Guidelines

When using ROS with MoveIt for control, please follow these guidelines:

  • The terminal used for control (ROS/MoveIt) must run Python 2.
  • The terminal used for executing the algorithm must run Python 3.

This split is essential for compatibility between the ROS/MoveIt components and the Python 3 algorithm code.
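
For example, you can confirm which interpreter each terminal resolves to before proceeding:

# in the ROS/MoveIt control terminal
python --version      # expected: Python 2.x
# in the algorithm terminal, after "conda activate Vlpg"
python --version      # expected: Python 3.7.x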

Step 1: Launch simulation and load MoveIt

After the scene loads, press the simulation (play) button in the lower-left corner of Gazebo; you can then control the robot through MoveIt.

cd ur_ws
source ./devel/setup.bash
roslaunch gjt_ur_moveit_gazebo start_gjt_ur_moveit_gazebo.launch 
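
As an optional sanity check (assuming the launch finished without errors), you can confirm from another Python 2 terminal that Gazebo's spawn services and the MoveIt node are available:

source ./devel/setup.bash
rosnode list                      # should include the Gazebo and MoveIt (move_group) nodes
rosservice list | grep spawn      # should list /gazebo/spawn_sdf_model and /gazebo/spawn_urdf_model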

We provide several useful unit-test scripts. Preloading the object models in Gazebo speeds up later execution, so it is recommended to run the following, in order, in a second terminal:

python src/gjt_ur_moveit_gazebo/gazebo_scripts/spawn_model.py
python src/gjt_ur_moveit_gazebo/gazebo_scripts/moveitServer.py
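
For reference, here is a minimal sketch of how an SDF model can be spawned into Gazebo through the standard gazebo_ros spawn service. It is not the repository's spawn_model.py; the model name "wood_cube", the SDF path, and the pose are placeholders to adapt to the models installed under "/home/xxx/.gazebo/models". Run it from a Python 2 terminal in which the workspace has been sourced.

#!/usr/bin/env python
# Illustrative sketch only (not the repository's spawn_model.py):
# spawn an SDF model into Gazebo via the standard gazebo_ros service.
import rospy
from gazebo_msgs.srv import SpawnModel
from geometry_msgs.msg import Pose, Point, Quaternion

def spawn_sdf(name, sdf_path, x, y, z):
    rospy.wait_for_service('/gazebo/spawn_sdf_model')
    spawn = rospy.ServiceProxy('/gazebo/spawn_sdf_model', SpawnModel)
    with open(sdf_path, 'r') as f:
        sdf_xml = f.read()
    pose = Pose(position=Point(x, y, z), orientation=Quaternion(0, 0, 0, 1))
    # request fields: model_name, model_xml, robot_namespace, initial_pose, reference_frame
    return spawn(name, sdf_xml, '', pose, 'world')

if __name__ == '__main__':
    rospy.init_node('spawn_example', anonymous=True)
    # "wood_cube" and its pose are placeholders; use any model from /home/xxx/.gazebo/models
    resp = spawn_sdf('wood_cube', '/home/xxx/.gazebo/models/wood_cube/model.sdf', 0.5, 0.0, 0.8)
    print(resp.status_message)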

Step 2: Execute the algorithm

In a new terminal, activate the Python 3 environment and run:

conda activate Vlpg
python src/vl_grasp/vl_push_grasp.py

Related Work

Many thanks to previous researchers for sharing their excellent work:

@article{yang2021collaborative,
  title={Collaborative pushing and grasping of tightly stacked objects via deep reinforcement learning},
  author={Yang, Yuxiang and Ni, Zhihao and Gao, Mingyu and Zhang, Jing and Tao, Dacheng},
  journal={IEEE/CAA Journal of Automatica Sinica},
  volume={9},
  number={1},
  pages={135--145},
  year={2021},
  publisher={IEEE}
}

@inproceedings{lu2023vl,
  title={VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes},
  author={Lu, Yuhao and Fan, Yixuan and Deng, Beixing and Liu, Fangfu and Li, Yali and Wang, Shengjin},
  booktitle={2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={976--983},
  year={2023},
  organization={IEEE}
}

@inproceedings{muchen2021referring,
  title={Referring Transformer: A One-step Approach to Multi-task Visual Grounding},
  author={Li, Muchen and Sigal, Leonid},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

BibTeX

If you find our code or models useful in your work, please cite our paper.

@article{YANG2024105280,
  title = {Ground4Act: Leveraging visual-language model for collaborative pushing and grasping in clutter},
  author = {Yuxiang Yang and Jiangtao Guo and Zilong Li and Zhiwei He and Jing Zhang},
  journal = {Image and Vision Computing},
  volume = {151},
  pages = {105280},
  year = {2024},
  url = {https://www.sciencedirect.com/science/article/pii/S0262885624003858}
}
