MaxMI

This is the official repository for the paper MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery.

Installation

  1. Clone the repository:

    git clone https://github.com/PeiZhou26/MaxMI.git
    cd MaxMI
  2. Create the Conda environment using the environment.yml file:

    conda env create -f environment.yml
  3. Activate the environment:

    conda activate maxmi

Tasks

The current code supports four tasks from the ManiSkill2 (v0.4.2) benchmark: PickCube-v0, StackCube-v0, PegInsertionSide-v0, and TurnFaucet-v0.
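
As a quick sanity check that the benchmark is installed correctly, each task can be instantiated through the standard ManiSkill2 gym registration. The snippet below is a minimal sketch: the obs_mode and control_mode values are illustrative choices, not settings prescribed by this repository, and the reset/step return signatures may differ slightly depending on your installed gym version.

    # Minimal ManiSkill2 (v0.4.x) smoke test -- illustrative settings only.
    import gym
    import mani_skill2.envs  # noqa: F401  (registers the ManiSkill2 tasks with gym)

    env = gym.make("PickCube-v0", obs_mode="state", control_mode="pd_joint_delta_pos")
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()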

Data Preparation

The behavior cloning datasets can be accessed via this link. Each task includes approximately 1,000 successful demonstrations; however, we use a randomly sampled subset of 500 for our experiments.
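
For reference, such a subset can be drawn reproducibly with a fixed seed. The sketch below is not the exact sampling code used in our experiments; traj_ids stands in for a hypothetical list of trajectory identifiers from the dataset.

    # Reproducibly sample 500 of the ~1,000 demonstrations for one task.
    # `traj_ids` is a hypothetical list of trajectory identifiers.
    import random

    def sample_demos(traj_ids, n=500, seed=0):
        rng = random.Random(seed)  # fixed seed for reproducibility
        return rng.sample(traj_ids, min(n, len(traj_ids)))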

After downloading the datasets, place them in the /data directory. To evaluate the intermediate task success rate, the ManiSkill2 environment requires patching (see /maniskill2_patches for details).

For further information, please refer to the CoTPC repository and official ManiSkill2 documentation.

Training & Evaluation

For key state discovery, which involves a differentiable mutual information estimator, we use the off-the-shelf InfoNet with its parameters kept frozen. Download the pretrained InfoNet model, place the checkpoint in a directory of your choice, and update the checkpoint path in /src/infer_infonet.py accordingly.
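
Freezing amounts to loading the pretrained weights and disabling gradients. The snippet below is a generic PyTorch pattern for this, not the exact code in /src/infer_infonet.py; the model construction and checkpoint layout are assumptions.

    # Generic PyTorch pattern for using a pretrained network with frozen weights.
    import torch
    from torch import nn

    def load_frozen(model: nn.Module, ckpt_path: str, device: str = "cpu") -> nn.Module:
        # The checkpoint is assumed to hold a plain state_dict (illustrative).
        model.load_state_dict(torch.load(ckpt_path, map_location=device))
        model.to(device).eval()
        for p in model.parameters():
            p.requires_grad_(False)  # keep the estimator frozen during training
        return model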

The script /src/concept_train.py provides an example of key state discovery and saves the trained key state localization network. After training, use /src/concept_eval.py to label key states from the demonstrations and store the key state labels in a .pkl file.

    python src/concept_train.py
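
After /src/concept_eval.py has written the labels, the .pkl file can be inspected with the standard pickle module. The filename and the structure sketched in the comment are hypothetical examples, not a guaranteed on-disk format.

    # Inspect the key state labels saved by /src/concept_eval.py.
    # The structure is an assumption for illustration,
    # e.g. {traj_id: [key_state_time_indices, ...]}.
    import pickle

    with open("key_state_labels.pkl", "rb") as f:  # hypothetical filename
        labels = pickle.load(f)
    print(type(labels), len(labels))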

After obtaining the automatically labeled key states, we use them to train a manipulation policy for each task. Our policy builds on Chain-of-Thought Predictive Control (CoTPC), which jointly optimizes key state prediction and next-action prediction (see the schematic loss sketch after the command below). To train the policy, use /src/train.py; to evaluate the trained policy, use /src/eval.py. For detailed examples of training and testing, refer to /scripts/train.sh and /scripts/eval.sh.

    bash scripts/train.sh
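
Conceptually, the CoTPC-style objective is a weighted sum of the two prediction terms. The sketch below is schematic; the loss functions and the weight lam are illustrative assumptions, not the repository's exact implementation.

    # Schematic CoTPC-style joint objective: next-action prediction plus an
    # auxiliary key state prediction term, weighted by an illustrative `lam`.
    import torch.nn.functional as F

    def joint_loss(pred_actions, gt_actions, pred_key_states, gt_key_states, lam=0.1):
        action_loss = F.mse_loss(pred_actions, gt_actions)           # behavior cloning
        key_state_loss = F.mse_loss(pred_key_states, gt_key_states)  # key state head
        return action_loss + lam * key_state_loss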

Acknowledgement

We thank the authors of CoTPC and InfoNet for releasing the code bases on which this project builds.
