
MOCA

Factorizing Perception and Policy for Interactive Instruction Following
Kunal Pratap Singh* , Suvaansh Bhambri* , Byeonghwi Kim* , Roozbeh Mottaghi , Jonghyun Choi
ICCV 2021

MOCA (Modular Object-Centric Approach) is a modular architecture that decouples a task into visual perception and action policy. The action policy module (APM) is responsible for sequential action prediction, whereas the interactive perception module (IPM) generates a pixel-wise interaction mask for the object to be manipulated. MOCA addresses long-horizon instruction-following tasks on the ALFRED benchmark, using egocentric RGB observations and natural language instructions.
(Previously titled: MOCA: A Modular Object-Centric Approach for Interactive Instruction Following)

Environment

Clone repository

$ git clone https://github.com/gistvision/moca.git moca
$ export ALFRED_ROOT=$(pwd)/moca

Install requirements

$ virtualenv -p $(which python3) --system-site-packages moca_env
$ source moca_env/bin/activate

$ cd $ALFRED_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt

Download

Dataset

We are preparing a release of our dataset, including both the original ResNet features and the features produced with data augmentation. This section will be updated when the release is available.

Pretrained Model

We will update the script to provide the pretrained model used in the paper.

Training

To train MOCA, run train_seq2seq.py with the hyperparameters below.

python models/train/train_seq2seq.py --data <path_to_dataset> --model seq2seq_im_mask --dout <path_to_save_weight> --splits data/splits/oct21.json --gpu --batch <batch_size> --pm_aux_loss_wt <pm_aux_loss_wt_coeff> --subgoal_aux_loss_wt <subgoal_aux_loss_wt_coeff> --preprocess

Note: As mentioned in the ALFRED repository, pass --preprocess only on the first run. It generates the preprocessed JSON files, which later runs reuse.
Note: All hyperparameters default to the values used for the experiments in the paper.

For example, to train MOCA with the hyperparameters used in the paper and save the weights for every epoch in "exp/moca", you may use the command below.

python models/train/train_seq2seq.py --dout exp/moca --gpu --save_every_epoch

Note: The --save_every_epoch option saves the weights after every epoch and can therefore use a lot of disk space.
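The two training commands above differ only in which flags are passed explicitly. As a minimal sketch (not part of the repository), the hypothetical helper build_train_command below assembles the command from the flags shown in this section and omits any value left unset so that the script's defaults apply. The helper name and its keyword arguments are assumptions made for illustration; only the flag names come from the commands above.

```python
import shlex


def build_train_command(dout, data=None, batch=None, pm_aux_loss_wt=None,
                        subgoal_aux_loss_wt=None, preprocess=False,
                        save_every_epoch=False):
    """Return the training command as an argv list (hypothetical helper)."""
    cmd = ["python", "models/train/train_seq2seq.py", "--dout", dout, "--gpu"]

    # Optional values: skipped when None so the script falls back to its defaults.
    optional = {
        "--data": data,
        "--batch": batch,
        "--pm_aux_loss_wt": pm_aux_loss_wt,
        "--subgoal_aux_loss_wt": subgoal_aux_loss_wt,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]

    # Boolean switches: --preprocess is meant for the first run only.
    if preprocess:
        cmd.append("--preprocess")
    if save_every_epoch:
        cmd.append("--save_every_epoch")
    return cmd


# First run: generate the preprocessed JSON files.
first_run = build_train_command("exp/moca", preprocess=True)
# Later runs: reuse the preprocessed files and keep every epoch's weights.
later_run = build_train_command("exp/moca", save_every_epoch=True)

for command in (first_run, later_run):
    print(" ".join(shlex.quote(arg) for arg in command))
```

The returned list can be passed to subprocess.run from $ALFRED_ROOT, or printed and pasted into a shell as done here.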

Evaluation

Task Evaluation

To evaluate MOCA, run eval_seq2seq.py with the hyperparameters below.
To evaluate a model in seen or unseen environments, pass valid_seen or valid_unseen to --eval_split.

python models/eval/eval_seq2seq.py --data <path_to_dataset> --model models.model.seq2seq_im_mask --model_path <path_to_weight> --eval_split <eval_split> --gpu --num_threads <thread_num>

Note: All hyperparameters default to the values used for the experiments in the paper.

To evaluate our pretrained model, saved in exp/pretrained/pretrained.pth, on the seen validation split, you may use the command below.

python models/eval/eval_seq2seq.py --model_path "exp/pretrained/pretrained.pth" --eval_split valid_seen --gpu --num_threads 4
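Since only --eval_split changes between the seen and unseen evaluations, both commands can be generated from one template. The sketch below is illustrative and not part of the repository: build_eval_command is a hypothetical helper, and restricting the split to the two values named above is an assumption made here to catch typos early.

```python
# Splits named in this section; other values are rejected by this sketch.
EVAL_SPLITS = ("valid_seen", "valid_unseen")


def build_eval_command(model_path, eval_split, num_threads=4):
    """Return the task-evaluation command as an argv list (hypothetical helper)."""
    if eval_split not in EVAL_SPLITS:
        raise ValueError(f"unknown eval split: {eval_split!r}")
    return [
        "python", "models/eval/eval_seq2seq.py",
        "--model_path", model_path,
        "--eval_split", eval_split,
        "--gpu",
        "--num_threads", str(num_threads),
    ]


# One command per validation split, using the checkpoint path from the example above.
commands = [
    build_eval_command("exp/pretrained/pretrained.pth", split)
    for split in EVAL_SPLITS
]
for command in commands:
    print(" ".join(command))
```

Each printed line can be run from $ALFRED_ROOT in turn.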

Subgoal Evaluation

To evaluate MOCA on subgoals, run eval_seq2seq.py with the option --subgoals <subgoals>.
The option accepts all to evaluate every subgoal, or one or more of GotoLocation, PickupObject, PutObject, CoolObject, HeatObject, CleanObject, SliceObject, and ToggleObject to evaluate individual subgoals. For more details, refer to ALFRED.

python models/eval/eval_seq2seq.py --data <path_to_dataset> --model models.model.seq2seq_im_mask --model_path <path_to_weight> --eval_split <eval_split> --gpu --num_threads <thread_num> --subgoals <subgoals>

Note: All hyperparameters default to the values used for the experiments in the paper.

To evaluate our pretrained model, saved in exp/pretrained/pretrained.pth, on the seen validation split for all subgoals, you may use the command below.

python models/eval/eval_seq2seq.py --model_path "exp/pretrained/pretrained.pth" --eval_split valid_seen --gpu --num_threads 4 --subgoals all
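A misspelled subgoal name is easy to miss on the command line, so it can help to validate the names before launching an evaluation. The sketch below is illustrative only: subgoals_argument is a hypothetical helper, and joining multiple subgoals with commas is an assumption about the expected format that should be checked against the ALFRED documentation.

```python
# Subgoal names listed in this section.
SUBGOALS = (
    "GotoLocation", "PickupObject", "PutObject", "CoolObject",
    "HeatObject", "CleanObject", "SliceObject", "ToggleObject",
)


def subgoals_argument(subgoals):
    """Return the value to pass to --subgoals (hypothetical helper).

    Accepts the string "all", a single subgoal name, or a list of names.
    """
    if subgoals == "all":
        return "all"
    if isinstance(subgoals, str):
        subgoals = [subgoals]
    if not subgoals:
        raise ValueError("at least one subgoal is required")
    unknown = [name for name in subgoals if name not in SUBGOALS]
    if unknown:
        raise ValueError(f"unknown subgoals: {unknown}")
    # ASSUMPTION: multiple subgoals are passed as one comma-separated value.
    return ",".join(subgoals)


command = [
    "python", "models/eval/eval_seq2seq.py",
    "--model_path", "exp/pretrained/pretrained.pth",
    "--eval_split", "valid_seen",
    "--gpu",
    "--num_threads", "4",
    "--subgoals", subgoals_argument(["GotoLocation", "PickupObject"]),
]
print(" ".join(command))
```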

Expected Validation Result

This will be updated soon.

Hardware

Trained and tested on:

GPU - RTX 2080 Ti (11GB)
CPU - Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
RAM - 32GB
OS - Ubuntu 18.04

License

MIT License

Citation

@article{singh2020moca,
  title={Factorizing Perception and Policy for Interactive Instruction Following},
  author={Singh, Kunal Pratap and Bhambri, Suvaansh and Kim, Byeonghwi and Mottaghi, Roozbeh and Choi, Jonghyun},
  journal={arXiv preprint arXiv:2012.03208},
  year={2020}
}
