This is the official implementation of Predictive Dynamic Fusion (ICML 2024) by Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, and Qinghua Hu.
Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and easily fall into suboptimal solutions, yielding unreliability and instability. To address this issue, we propose a Predictive Dynamic Fusion (PDF) framework for multimodal learning. We revisit multimodal fusion from a generalization perspective and theoretically derive the predictable Collaborative Belief (Co-Belief) with Mono- and Holo-Confidence, which provably reduces the upper bound of the generalization error. Accordingly, we further propose a relative regularization strategy to calibrate the predicted Co-Belief under potential uncertainty. Extensive experiments on multiple benchmarks confirm the superiority of our method.
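To make the fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of confidence-weighted late fusion, in which each modality predicts logits plus a scalar confidence and the fused logits are a confidence-weighted sum. The names and design here (ConfidenceWeightedFusion, the sigmoid confidence head, the normalization) are illustrative assumptions, not the paper's exact Mono-/Holo-Confidence or Co-Belief formulation.

import torch
import torch.nn as nn

class ConfidenceWeightedFusion(nn.Module):
    # Illustrative late fusion: per-modality logits are combined with
    # predicted confidences. A sketch of the general idea only, not the
    # paper's exact Co-Belief computation.
    def __init__(self, encoders, feat_dim, num_classes):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)  # one encoder per modality
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in encoders])
        self.confs = nn.ModuleList(
            [nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())
             for _ in encoders])  # scalar confidence in (0, 1) per modality

    def forward(self, inputs):
        logits, confs = [], []
        for x, enc, head, conf in zip(inputs, self.encoders, self.heads, self.confs):
            feat = enc(x)
            logits.append(head(feat))   # (batch, num_classes)
            confs.append(conf(feat))    # (batch, 1)
        confs = torch.cat(confs, dim=1)                    # (batch, n_modalities)
        weights = confs / confs.sum(dim=1, keepdim=True)   # normalize across modalities
        fused = (weights.unsqueeze(-1) * torch.stack(logits, dim=1)).sum(dim=1)
        return fused, logits, confs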
numpy==1.21.6
Pillow==9.4.0
pytorch_pretrained_bert==0.6.2
scikit_learn==1.0.2
torch==1.11.0+cu113
torchvision==0.12.0+cu113
tqdm==4.65.0
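The pinned dependencies above can be installed with pip (assuming they are collected in a requirements.txt at the repo root); note that the +cu113 builds of torch and torchvision come from the PyTorch CUDA 11.3 wheel index:

pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113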
Step 1: Download food101 and MVSA_Single and put them in the datasets folder.
Step 2: Prepare the train/dev/test split jsonl files. We follow the QMF settings and provide them in the corresponding folders.
Step 3 (optional): If you want to use GloVe embeddings for the BoW model, download glove.840B.300d.txt and put it in the datasets/glove_embeds folder. For the BERT model, download bert-base-uncased and put it in the root folder bert-base-uncased/. An illustrative layout is shown after this list.
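Assuming the default paths above, the resulting layout should look like the following (dataset subfolder contents omitted, and exact dataset subfolder names may differ):

datasets/
├── food101/
├── MVSA_Single/
└── glove_embeds/
    └── glove.840B.300d.txt
bert-base-uncased/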
bash ./shells/batch_train_latefusion_pdf.sh
Tip: at the beginning of training, the confidence predictor may output very small values when the batch size is small, so taking the log can produce NaN; this can be mitigated by reducing the learning rate or increasing the weight decay.
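As an additional safeguard (our own suggestion, not part of the released scripts), clamping the predicted confidence away from zero before taking the log keeps the loss finite; the eps value is an illustrative choice:

import torch

def safe_log_confidence(conf: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Clamp tiny confidence values so log() stays finite early in training.
    # eps = 1e-6 is a hypothetical choice; tune it alongside the learning
    # rate and weight decay as suggested above.
    return torch.log(conf.clamp_min(eps))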
bash ./shells/batch_test_latefusion_pdf.sh
@article{cao2024predictive,
  title={Predictive Dynamic Fusion},
  author={Cao, Bing and Xia, Yinan and Ding, Yi and Zhang, Changqing and Hu, Qinghua},
  journal={arXiv preprint arXiv:2406.04802},
  year={2024}
}
The code is inspired by Provable Dynamic Fusion for Low-Quality Multimodal Data (QMF).