This is the official PyTorch implementation of "SuperFusion: A Versatile Image Registration and Fusion Network with Semantic Awareness".
The overall framework of the proposed SuperFusion for cross-modal image registration and fusion.
The architecture of the dense matcher, which consists of a pyramid feature extractor and iterative flow estimators. Flows are estimated iteratively at three scales and summed up.
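The coarse-to-fine summation mentioned in this caption can be illustrated with a small NumPy sketch. It is not the repository's code: the names `upsample_flow` and `accumulate_flows`, the nearest-neighbour upsampling, and the factor of two between scales are assumptions made for illustration.

```python
import numpy as np


def upsample_flow(flow, factor=2):
    """Nearest-neighbour upsampling of an (H, W, 2) flow field.

    Displacements are measured in pixels, so they are multiplied by the
    same factor as the spatial resolution.
    """
    up = np.repeat(np.repeat(flow, factor, axis=0), factor, axis=1)
    return up * factor


def accumulate_flows(residual_flows):
    """Sum per-scale residual flows, ordered from coarsest to finest.

    Each entry is an (H_s, W_s, 2) array whose resolution doubles from
    one scale to the next (an assumption made for this sketch).
    """
    total = residual_flows[0]
    for residual in residual_flows[1:]:
        total = upsample_flow(total) + residual
    return total


# Toy example with three scales: 2x2, 4x4 and 8x8.
flows = [
    np.full((2, 2, 2), 1.0),
    np.full((4, 4, 2), 0.5),
    np.full((8, 8, 2), 0.25),
]
full_flow = accumulate_flows(flows)
```

In this toy case the running flow is rescaled each time the resolution doubles, so the coarse estimate contributes most of the final displacement and the finer scales only add small corrections.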
The architecture of the fusion network.
The schematic illustration of the global spatial attention module (GSAM). The global attention is calculated by adapting a spatial RNN to aggregate the spatial context in four directions.
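To make the idea of four-direction context aggregation concrete, here is a minimal NumPy sketch. It uses an IRNN-style recurrence (`relu(x + alpha * h_prev)`) with a fixed scalar `alpha` and a sigmoid on the averaged context. All of these choices, and the function names, are assumptions for illustration; the actual GSAM in the repository may be built differently.

```python
import numpy as np


def directional_scan(x, alpha=0.5):
    """Left-to-right recurrent scan over an (H, W) map.

    h[:, j] = relu(x[:, j] + alpha * h[:, j - 1]), an IRNN-style update
    chosen for illustration only.
    """
    h = np.zeros_like(x, dtype=float)
    h[:, 0] = np.maximum(x[:, 0], 0.0)
    for j in range(1, x.shape[1]):
        h[:, j] = np.maximum(x[:, j] + alpha * h[:, j - 1], 0.0)
    return h


def four_direction_context(x, alpha=0.5):
    """Aggregate context by scanning right, left, down and up."""
    right = directional_scan(x, alpha)
    left = directional_scan(x[:, ::-1], alpha)[:, ::-1]
    down = directional_scan(x.T, alpha).T
    up = directional_scan(x[::-1].T, alpha).T[::-1]
    return np.stack([right, left, down, up], axis=0)


def spatial_attention(x, alpha=0.5):
    """Squash the averaged directional context into a (0, 1) attention map."""
    context = four_direction_context(x, alpha).mean(axis=0)
    return 1.0 / (1.0 + np.exp(-context))
```

Because each scan carries information along a full row or column, a single activation influences every position that lies after it in the scan direction, which is what gives the attention map a global receptive field in this sketch.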
- torch 1.10.1
- torchvision 0.11.2
- kornia 0.6.5
- opencv 4.5.5
- pillow 9.2.0
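A quick way to compare an existing environment against the versions listed above is sketched below. It relies only on the standard library. The PyPI distribution names (for example `opencv-python` for OpenCV) and the helper names `installed_version` and `check_environment` are assumptions, so adjust them to match how the packages were installed.

```python
from importlib import metadata

# Versions listed above. The distribution names are assumptions.
REQUIRED = {
    "torch": "1.10.1",
    "torchvision": "0.11.2",
    "kornia": "0.6.5",
    "opencv-python": "4.5.5",
    "Pillow": "9.2.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(required=REQUIRED, lookup=installed_version):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for name, expected in required.items():
        found = lookup(name)
        # Prefix match so that "4.5.5.64" or "1.10.1+cu113" still counts.
        matches = found is not None and found.startswith(expected)
        report[name] = (expected, found, matches)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_environment().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:15s} expected {expected:8s} found {found} [{status}]")
```

The prefix comparison is deliberately loose; other versions may also work, and this check only reports differences from the list.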
python test.py --mode=Reg --dataset_name=MSRS
python test.py --mode=Reg --dataset_name=RoadScene
python test.py --mode=Fusion --dataset_name=MSRS
python test.py --mode=Fusion --dataset_name=RoadScene
python test.py --mode="Reg&Fusion" --dataset_name=MSRS
python test.py --mode="Reg&Fusion" --dataset_name=RoadScene
We suggest using our pre-trained model to test SuperFusion.
First, download the training set from https://github.com/Linfeng-Tang/MSRS/tree/main/train and place '/MSRS/train/ir' and '/MSRS/train/vi' in './dataset/train/MSRS/ir' and './dataset/train/MSRS/vi', respectively.
python train.py --dataroot=./dataset/train/MSRS --n_ep=1000 --n_ep_decay=800 --resume=./checkpoint/MSRS.pth --stage=RF
You can download the RoadScene dataset from https://github.com/hanna-xu/RoadScene and put the infrared and visible images into './dataset/train/RoadScene/ir' and './dataset/train/RoadScene/vi' for training.
python train.py --dataroot=./dataset/train/RoadScene --n_ep=1000 --n_ep_decay=800 --resume=./checkpoint/RoadScene.pth --stage=RF
python train.py --dataroot=./dataset/train/MSRS --n_ep=2000 --n_ep_decay=1600 --resume=./checkpoint/MSRS.pth --stage=FS
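Before launching a training run, it can help to confirm that the `ir` and `vi` folders under the chosen `--dataroot` are populated consistently. The helper below is a small standard-library sketch; `check_pairs` is a hypothetical name, and it assumes that paired infrared and visible images share the same file name, which may not hold for every dataset.

```python
from pathlib import Path


def check_pairs(dataroot):
    """Compare the file names under <dataroot>/ir and <dataroot>/vi.

    Returns (paired, only_ir, only_vi) as sorted lists of file names.
    """
    root = Path(dataroot)
    ir_dir, vi_dir = root / "ir", root / "vi"
    for folder in (ir_dir, vi_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
    ir_names = {p.name for p in ir_dir.iterdir() if p.is_file()}
    vi_names = {p.name for p in vi_dir.iterdir() if p.is_file()}
    return (
        sorted(ir_names & vi_names),
        sorted(ir_names - vi_names),
        sorted(vi_names - ir_names),
    )


# Example usage:
# paired, only_ir, only_vi = check_pairs("./dataset/train/MSRS")
# print(len(paired), "pairs;", len(only_ir), "unmatched ir;", len(only_vi), "unmatched vi")
```

Non-empty `only_ir` or `only_vi` lists usually point to an incomplete download or a misplaced folder.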
Quantitative registration performance on MSRS and RoadScene. Mean reprojection error (RE) and end-point error (EPE) are reported.
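For readers unfamiliar with the metric, end-point error is commonly computed as the mean Euclidean distance between predicted and ground-truth displacement vectors. The NumPy sketch below follows that common definition; the function name `end_point_error`, the `(H, W, 2)` layout, and the optional mask are assumptions, and the evaluation behind the reported numbers may differ in detail.

```python
import numpy as np


def end_point_error(flow_pred, flow_gt, valid_mask=None):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    flow_pred, flow_gt: arrays of shape (H, W, 2) holding (dx, dy) in pixels.
    valid_mask: optional boolean (H, W) array selecting pixels to evaluate.
    """
    diff = np.asarray(flow_pred, dtype=float) - np.asarray(flow_gt, dtype=float)
    per_pixel = np.linalg.norm(diff, axis=-1)
    if valid_mask is not None:
        per_pixel = per_pixel[np.asarray(valid_mask, dtype=bool)]
    return float(per_pixel.mean())
```

For example, a prediction that is off by (3, 4) pixels everywhere has an EPE of 5 pixels.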
Qualitative registration performance of DASC, RIFT, GLU-Net, UMF-CMGR, CrossRAFT, and our SuperFusion. The first four rows of images are from the MSRS dataset, and the last two are from the RoadScene dataset. The purple textures are the gradients of the registered infrared images, and the backgrounds are the corresponding ground truths. The discriminative regions that demonstrate the superiority of our method are highlighted in boxes. Note that the gradients of the second-column images are from the warped images, i.e., the misaligned infrared images.
Quantitative comparison results of SuperFusion with five state-of-the-art alternatives on
Quantitative comparison results of SuperFusion with five state-of-the-art alternatives on
Qualitative comparison results of SuperFusion with five state-of-the-art infrared and visible image fusion methods on the MSRS and RoadScene datasets. All methods employ the built-in registration module (e.g., UMF-CMGR and our SuperFusion) or CrossRAFT to register the source images.
Segmentation performance (IoU) of visible, infrared, and fused images on the MSRS dataset.
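As a reminder of how the metric is defined, the sketch below computes per-class intersection over union for a single pair of label maps with NumPy. The function name `per_class_iou` is hypothetical, and this is the textbook per-image definition; dataset-level scores are often accumulated over all images before dividing, so it should not be read as the exact evaluation protocol used here.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """Intersection over union for each class label in [0, num_classes).

    pred, target: integer label maps of identical shape.
    Classes absent from both maps get NaN so they can be skipped in a mean.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, target_c).sum() / union
    return ious
```

A mean IoU over the classes that actually occur can then be obtained with `np.nanmean(per_class_iou(pred, target, num_classes))`.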
Segmentation results for source images and fused images from the MSRS dataset. The fused image indicates the fusion result generated by our SuperFusion, and the pre-trained segmentation model is provided by SeAFusion.
@article{TANG2022SuperFusion,
title={SuperFusion: A versatile image registration and fusion network with semantic awareness},
author={Tang, Linfeng and Deng, Yuxin and Ma, Yong and Huang, Jun and Ma, Jiayi},
journal={IEEE/CAA Journal of Automatica Sinica},
volume={9},
number={12},
pages={2121--2137},
year={2022},
publisher={IEEE}
}