# Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee K. Wong, Joshua B. Tenenbaum, and Chuang Gan

PyTorch implementation for the Dynamic Concept Learner (DCL). More details can be found on the project page.
## Prerequisites

- Python 3
- PyTorch 1.0 or higher, with NVIDIA CUDA support
- Other required Python packages, specified in `requirements.txt`; see Installation below
## Installation

Install Jacinle: clone the package, and add the bin path to your global `PATH` environment variable:

```
git clone https://github.com/vacancy/Jacinle --recursive
export PATH=<path_to_jacinle>/bin:$PATH
```
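A quick check that Jacinle's launchers (used later in this README, e.g. `jac-crun`) are on your `PATH`, assuming the export above took effect:

```
which jac-run jac-crun
```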
Clone this repository:

```
git clone https://github.com/zfchenUnique/DCL-Release.git --recursive
```
Create a conda environment for NS-CL, and install the requirements. This includes the required Python packages from both Jacinle and NS-CL. Most of the required packages are already included in the built-in Anaconda package.
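A minimal sketch of that setup (the environment name `dcl` and the Python version are illustrative assumptions, not prescribed by the repo):

```
# hypothetical environment name and Python version; any Python 3 setup works
conda create -n dcl python=3.6
conda activate dcl
# requirements.txt ships with this repository
pip install -r requirements.txt
```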
## Dataset Preparation

- Download the videos, video annotations, questions and answers, and object proposals from the official website.
- Transform the videos into `.png` frames with ffmpeg (see the sketch after the directory layout below).
- Organize the data as shown below.

```
clevrer
├── annotation_00000-01000
│   ├── annotation_00000.json
│   ├── annotation_00001.json
│   └── ...
├── ...
├── image_00000-01000
│   ├── video_00000
│   │   ├── 1.png
│   │   ├── 2.png
│   │   └── ...
│   └── ...
├── ...
├── questions
│   ├── train.json
│   ├── validation.json
│   └── test.json
├── proposals
│   ├── proposal_00000.json
│   ├── proposal_00001.json
│   └── ...
```
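A minimal ffmpeg sketch for the frame-extraction step above; the `video_00000-01000` source folder and the `clevrer/image_00000-01000` output layout are assumptions based on the directory tree, so adjust the paths to your download:

```
# split each downloaded video into numbered frames (1.png, 2.png, ...)
for v in video_00000-01000/*.mp4; do
    name=$(basename "$v" .mp4)
    mkdir -p clevrer/image_00000-01000/"$name"
    ffmpeg -i "$v" clevrer/image_00000-01000/"$name"/%d.png
done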
## Fast Evaluation

- Download the extracted object trajectories from Google Drive.
- Clone the dynamic model repo, download the image proposals and the pretrained PropNet models, and generate the dynamic predictions:

```
git clone https://github.com/zfchenUnique/clevrer_dynamic_propnet.git
cd clevrer_dynamic_propnet
sh ./scripts/eval_fast_release_v2.sh 0
```
- Download the pretrained DCL model and parsed programs, then run:

```
sh scripts/script_test_prp_clevrer_qa.sh 0
```
- Submit the resulting predictions to EvalAI to get the accuracy; one submission route is sketched below.
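A sketch using the third-party `evalai` CLI; the token, challenge ID, phase ID, and file name are all placeholders, so consult the CLEVRER challenge page on EvalAI for the real values and the required prediction format:

```
pip install evalai
evalai set_token <your_evalai_token>
# placeholders: look up the CLEVRER challenge/phase IDs on EvalAI
evalai challenge <challenge_id> phase <phase_id> submit --file <predictions>.json
```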
## Step-by-step Training

- Step 1: download the proposals from the region proposal network, and extract object trajectories for the train and validation sets:

```
sh scripts/script_gen_tubes.sh
```
- Step 2: train a concept learner with descriptive and explanatory questions for static concepts (i.e. color, shape, and material):

```
sh scripts/script_train_dcl_stage1.sh 0
```
- Step 3: extract static attributes and refine object trajectories.

  Extract static attributes:

  ```
  sh scripts/script_extract_attribute.sh
  ```

  Refine object trajectories:

  ```
  sh scripts/script_gen_tubes_refine.sh
  ```
- Step 4: extract predictive and counterfactual scenes:

```
cd clevrer_dynamic_propnet
sh ./scripts/train_tube_box_only.sh     # train the box-only PropNet
sh ./scripts/train_tube.sh              # train the full PropNet
sh ./scripts/eval_fast_release_v2.sh 0  # val
```
- Step 5: train DCL with all questions and the refined trajectories:

```
sh scripts/script_train_dcl_stage2.sh 0
```
## CLEVRER-Grounding

- Step 1: download the expression annotations and parsed programs from Google Drive.
- Step 2: evaluate the performance on CLEVRER-Grounding:

```
sh ./scripts/script_grounding.sh 0
jac-crun 0 scripts/script_evaluate_grounding.py
```
## CLEVRER-Retrieval

- Step 1: download the expression annotations and parsed programs from Google Drive.
- Step 2: evaluate the performance on CLEVRER-Retrieval:

```
sh ./scripts/script_retrieval.sh 0
jac-crun 0 scripts/script_evaluate_retrieval.py
```
## Generalization to Tower Blocks

- Step 1: download the question annotations from Google Drive and the videos from Dropbox (linked under the UETorch repo).
- Step 2: train on Tower Blocks QA:

```
sh ./scripts/script_train_blocks.sh 0
```

- Step 3: download the pretrained model from Google Drive and evaluate on Tower Blocks QA:

```
sh ./scripts/script_eval_blocks.sh 0
```
## Others

- Qualitative Results
- CLEVRER-Grounding training set annotations
- CLEVRER-Retrieval training set annotations
- Project Page
## Citation

If you find this repo useful in your research, please consider citing:

```
@inproceedings{zfchen2021iclr,
  title={Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning},
  author={Chen, Zhenfang and Mao, Jiayuan and Wu, Jiajun and Wong, Kwan-Yee~K. and Tenenbaum, Joshua B. and Gan, Chuang},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
```