Before using the coarse-to-fine association mechanism, download the following pre-trained models and organise them as follows:
- weights/vilt_200k_mlm_itm.ckpt (Google Drive)
- configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/model_final_280758.pkl (Google Drive)
- configs/COCO-PanopticSegmentation/model_final_cafdb1.pkl (Google Drive)
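Before moving on, it can help to confirm that the three files sit where the list above places them. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and it assumes the paths are relative to the directory from which you run it (presumably the repository root).

```python
from pathlib import Path

# Relative paths copied from the list above.
EXPECTED_FILES = [
    "weights/vilt_200k_mlm_itm.ckpt",
    "configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/model_final_280758.pkl",
    "configs/COCO-PanopticSegmentation/model_final_cafdb1.pkl",
]


def missing_checkpoints(root="."):
    """Return the expected files that are not present under `root`.

    `root` is assumed to be the repository root (hypothetical helper).
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing pre-trained models:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All pre-trained models found.")
```

An empty result means all three files were found; any path that is printed still needs to be downloaded or moved.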
Next, install the detectron2 platform:
cd ..
python -m pip install -e TextFusion-Association_for_Training-main
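To confirm that the editable install succeeded, you can check whether the `detectron2` module is importable. This is a small sketch using only the standard library; `is_installed` is a hypothetical helper name, and it only checks that the module can be located, not that it works with your CUDA setup.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("detectron2"):
        print("detectron2 is importable.")
    else:
        print("detectron2 was not found; re-run the pip install step.")
```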
Finally, run the script below to generate the association maps from the vision and text inputs in the "input" folder. The maps are saved in the "output" folder.
python main_genAssociation.py
Environment:
Python 3.7.3
torch 1.9.0+cu111
pytorch-lightning 1.1.4
timm 0.9.12
scipy 1.7.3
opencv-python 4.10.0.84
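The package versions listed above can be compared against what is installed. The sketch below is an illustration, not part of the repository: `version_mismatches` is a hypothetical helper, it compares version strings exactly, and on Python 3.7 it assumes the `importlib_metadata` backport is available because `importlib.metadata` only entered the standard library in Python 3.8.

```python
import platform

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    import importlib_metadata as metadata

# Package versions copied from the list above.
EXPECTED_VERSIONS = {
    "torch": "1.9.0+cu111",
    "pytorch-lightning": "1.1.4",
    "timm": "0.9.12",
    "scipy": "1.7.3",
    "opencv-python": "4.10.0.84",
}


def version_mismatches(expected=EXPECTED_VERSIONS, get_version=metadata.version):
    """Map each package whose installed version differs to (wanted, found).

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    print("Python", platform.python_version(), "(listed: 3.7.3)")
    for name, (wanted, found) in version_mismatches().items():
        print("{}: listed {}, installed {}".format(name, wanted, found))
```

A mismatch does not necessarily mean the code will fail; it only flags where your setup differs from the listed one.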