
Code and models of MOCA (Modular Object-Centric Approach) proposed in "Factorizing Perception and Policy for Interactive Instruction Following" (ICCV 2021). We address the task of long horizon instruction following with a modular architecture that decouples a task into visual perception and action policy prediction.


MOCA

MOCA: A Modular Object-Centric Approach for Interactive Instruction Following
Kunal Pratap Singh*, Suvaansh Bhambri*, Byeonghwi Kim*, Roozbeh Mottaghi, Jonghyun Choi

MOCA (Modular Object-Centric Approach) is a modular architecture that decouples a task into visual perception and action policy. The action policy module (APM) is responsible for sequential action prediction, whereas the visual perception module (VPM) generates pixel-wise interaction masks for the objects of interest for manipulation.

(Figure: Overview of the MOCA architecture)
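
To make the factorization concrete, below is a minimal, illustrative PyTorch sketch of the two-module design. It is a simplification for exposition only, not the repository's actual implementation; module names, feature dimensions, and the number of actions are placeholders.

import torch
import torch.nn as nn

class ActionPolicyModule(nn.Module):
    # APM (sketch): predicts the next low-level action from fused language and visual features.
    def __init__(self, lang_dim=512, vis_dim=512, hidden_dim=512, num_actions=12):
        super().__init__()
        self.cell = nn.LSTMCell(lang_dim + vis_dim, hidden_dim)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, lang_feat, vis_feat, state):
        h, c = self.cell(torch.cat([lang_feat, vis_feat], dim=-1), state)
        return self.action_head(h), (h, c)

class VisualPerceptionModule(nn.Module):
    # VPM (sketch): predicts which object class to interact with; in MOCA the
    # pixel-wise mask for that class comes from a pretrained instance-segmentation
    # model, which is omitted here.
    def __init__(self, lang_dim=512, num_object_classes=100):
        super().__init__()
        self.class_head = nn.Linear(lang_dim, num_object_classes)

    def forward(self, lang_feat):
        return self.class_head(lang_feat)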

Environment

Clone repo

$ git clone https://github.com/gistvision/moca.git moca
$ export ALFRED_ROOT=$(pwd)/moca

Install requirements

$ virtualenv -p $(which python3) --system-site-packages moca_env
$ source moca_env/bin/activate

$ cd $ALFRED_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt

Download

Dataset

The dataset includes visual features extracted by a ResNet-18 along with natural language annotations. For details on the ALFRED dataset, see the ALFRED repository.

$ cd $ALFRED_ROOT/data
$ sh download_data.sh
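
To sanity-check the download, a minimal sketch is shown below. It assumes the standard ALFRED layout in which each trajectory folder contains precomputed ResNet-18 features (feat_conv.pt) and annotations (traj_data.json); the exact path and keys may differ in your setup.

import json
import torch

traj_dir = "data/json_feat_2.1.0/train/<task>/<trial>"  # placeholder path
feats = torch.load(f"{traj_dir}/feat_conv.pt")           # precomputed ResNet-18 features
with open(f"{traj_dir}/traj_data.json") as f:
    traj = json.load(f)

print(feats.shape)                                        # e.g. (num_frames, 512, 7, 7)
print(traj["turk_annotations"]["anns"][0]["task_desc"])   # one natural-language goal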

Pretrained MOCA

We provide the pretrained weights used for the experiments in the paper. To download the pretrained weights of MOCA, use the commands below.

$ cd $ALFRED_ROOT/exp/pretrained
$ sh download_pretrained_weight.sh

Note: This includes expert trajectories with both original and color-swapped frames.

Training

To train MOCA, run train_seq2seq.py with the hyper-parameters below.

python models/train/train_seq2seq.py --data <path_to_dataset> --model seq2seq_im_mask --dout <path_to_save_weight> --splits data/splits/oct21.json --gpu --batch <batch_size> --pm_aux_loss_wt <pm_aux_loss_wt_coeff> --subgoal_aux_loss_wt <subgoal_aux_loss_wt_coeff> --preprocess

For example, if you want to train MOCA and save the weights for all epochs in "exp/moca" with all hyper-parameters used in the experiments in the paper, you may use the command below.

python models/train/train_seq2seq.py --dout exp/moca --gpu --save_every_epoch

Note: As mentioned in the ALFRED repository, run with --preprocess only once to generate the preprocessed JSON files.
Note: All hyper-parameters used for the experiments in the paper are set as default.
Note: The option --save_every_epoch saves weights for all epochs and therefore can take up a lot of disk space.
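
For reference, the sketch below illustrates what the two auxiliary loss coefficients are assumed to weight: progress-monitoring and subgoal-completion auxiliary terms added to the main action and mask losses. This is a simplified assumption for exposition, not the repository's exact loss code.

def combine_losses(action_loss, mask_loss, progress_loss, subgoal_loss,
                   pm_aux_loss_wt, subgoal_aux_loss_wt):
    # Hypothetical composition: main prediction losses plus two auxiliary terms
    # scaled by the --pm_aux_loss_wt and --subgoal_aux_loss_wt coefficients.
    return (action_loss + mask_loss
            + pm_aux_loss_wt * progress_loss
            + subgoal_aux_loss_wt * subgoal_loss)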

Evaluation

To evaluate MOCA, run eval_seq2seq.py with the hyper-parameters below.

python models/eval/eval_seq2seq.py --data <path_to_dataset> --model models.model.seq2seq_im_mask --model_path <path_to_weight> --eval_split <eval_split> --gpu --num_threads <thread_num>

If you want to evaluate our pretrained model, saved in exp/pretrained/pretrained.pth, on the seen validation split, you may use the command below.

python models/eval/eval_seq2seq.py --model_path "exp/pretrained/pretrained.pth" --eval_split valid_seen --gpu --num_threads 4
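
Similarly, to evaluate the same checkpoint on the unseen validation split, change the --eval_split argument (valid_unseen is the standard ALFRED split name):

python models/eval/eval_seq2seq.py --model_path "exp/pretrained/pretrained.pth" --eval_split valid_unseen --gpu --num_threads 4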

Note: All hyperparameters used for the experiments in the paper are set as default.

Submission

To generate predictions for the leaderboard, run leaderboard.py with the hyper-parameters below.

python models/eval/leaderboard.py --model_path <path_to_weight> --num_threads 4

If you want to submit our pretrained model, "exp/pretrained/pretrained.pth", to the leaderboard, you may use the command below.

python models/eval/leaderboard.py --model_path "exp/pretrained/pretrained.pth" --num_threads 4

Note: All hyperparameters used for the experiments in the paper are set as default.

Hardware

Trained and Tested on:

  • GPU - GTX 2080 Ti (12GB)
  • CPU - Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  • RAM - 32GB
  • OS - Ubuntu 18.04

License

MIT License

Citation

@article{MOCA21,
  title   = {{MOCA: A Modular Object-Centric Approach for Interactive Instruction Following}},
  author  = {Kunal Pratap Singh* and Suvaansh Bhambri* and Byeonghwi Kim* and Roozbeh Mottaghi and Jonghyun Choi},
  journal = {arXiv},
  year    = {2021},
  url     = {https://arxiv.org/abs/}
}
