Yishuo Chen1, Boran Wang1([email protected])✉️, Xinyu Guo1, Wenbin Zhu1, Jiasheng He1, Xiaobin Liu1 and Jing Yuan1,2,3
2. Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University
This repository is the code release for the paper "DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection", accepted at the International Conference on Pattern Recognition (ICPR) 2024.
We design a dual-enhancement-based cross-modality object detection network, DEYOLO, in which a semantic-spatial cross-modality module and a novel bi-directional decoupled focus module are designed to achieve detection-centered mutual enhancement of RGB and infrared (RGB-IR) modalities. Specifically, a dual semantic enhancing channel weight assignment module (DECA) and a dual spatial enhancing pixel weight assignment module (DEPA) are first proposed to aggregate cross-modality information in the feature space and improve the feature representation ability, so that feature fusion serves the object detection task. Meanwhile, a dual-enhancement mechanism, covering both two-modality fusion and single-modality enhancement, is designed in DECA and DEPA to reduce interference between the two image modalities. Finally, a novel bi-directional decoupled focus is developed to enlarge the receptive field of the backbone network in different directions, which improves the representation quality of DEYOLO.
- torch 1.13.0
- torchvision 0.14.0
- numpy 1.25.0
conda create -n yolov8 python=3.9
conda activate yolov8
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -e .
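Before training, you can quickly confirm that the pinned versions were installed and that PyTorch sees your GPU; a minimal sanity check (nothing here is specific to DEYOLO):

```python
import torch
import torchvision

# Sanity check for the environment installed above.
print("torch:", torch.__version__)              # expect 1.13.0+cu117
print("torchvision:", torchvision.__version__)  # expect 0.14.0+cu117
print("CUDA available:", torch.cuda.is_available())
```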
You can choose the n/s/m/l/x scale of DEYOLO in DEYOLO.yaml.
from ultralytics import YOLO
# Load a model
model = YOLO("ultralytics/models/v8/DEYOLO.yaml").load("yolov8n.pt")
# Train the model
train_results = model.train(
    data="M3FD.yaml",  # path to dataset YAML
    epochs=100,        # number of training epochs
    imgsz=640,         # training image size
    device="cpu",      # device to run on, i.e. device=0 or device=0,1,2,3 or device=cpu
)
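After training finishes, you can evaluate the checkpoint on the validation split. A minimal sketch, assuming this fork keeps the standard Ultralytics val API and the default runs/detect/train/weights/best.pt output path (adjust the path to your actual run directory):

```python
from ultralytics import YOLO

# Default Ultralytics output location; change it to your own run directory.
model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="M3FD.yaml", imgsz=640)  # evaluate on the validation split
print("mAP50:", metrics.box.map50)
print("mAP50-95:", metrics.box.map)
```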
from ultralytics import YOLO
# Load a model
model = YOLO("DEYOLOn.pt") # trained weights
# Perform object detection on RGB-IR image pairs
model.predict([["ultralytics/assets/vi_1.png", "ultralytics/assets/ir_1.png"],  # corresponding image pair
               ["ultralytics/assets/vi_2.png", "ultralytics/assets/ir_2.png"]],
              save=True, imgsz=320, conf=0.5)
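If this fork keeps the standard Ultralytics Results API, the returned objects can also be inspected programmatically; a small sketch:

```python
from ultralytics import YOLO

model = YOLO("DEYOLOn.pt")
results = model.predict([["ultralytics/assets/vi_1.png", "ultralytics/assets/ir_1.png"]],
                        imgsz=320, conf=0.5)
for r in results:
    for box in r.boxes:  # one entry per detection
        print(int(box.cls), round(float(box.conf), 3), box.xyxy.squeeze().tolist())
```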
You can also train on your own dataset: write a dataset YAML like M3FD.yaml or LLVIP.yaml and organize the files as shown below (a pairing-check sketch follows the file tree).
File structure
Your dataset
├── ...
├── images
| ├── vis_train
| | ├── 1.jpg
| | ├── 2.jpg
| | └── ...
| ├── vis_val
| | ├── 1.jpg
| | ├── 2.jpg
| | └── ...
| ├── Ir_train
| | ├── 100.jpg
| | ├── 101.jpg
| | └── ...
| ├── Ir_val
| | ├── 100.jpg
| | ├── 101.jpg
| | └── ...
└── labels
├── vis_train
| ├── 1.txt
| ├── 2.txt
| └── ...
└── vis_val
├── 100.txt
├── 101.txt
└── ...
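Before pointing the dataset YAML at this layout, it can help to verify that the visible and infrared splits line up and that every visible image has a label file. A minimal sketch, assuming pairs are matched by sorted filename order; the dataset root is a placeholder:

```python
from pathlib import Path

root = Path("path/to/your/dataset")  # placeholder: set this to your dataset root
for split in ("train", "val"):
    vis = sorted((root / "images" / f"vis_{split}").glob("*.jpg"))
    ir = sorted((root / "images" / f"Ir_{split}").glob("*.jpg"))
    labels = {p.stem for p in (root / "labels" / f"vis_{split}").glob("*.txt")}
    assert len(vis) == len(ir), f"{split}: {len(vis)} visible vs {len(ir)} infrared images"
    missing = [p.name for p in vis if p.stem not in labels]
    print(f"{split}: {len(vis)} image pairs, {len(missing)} visible images without labels")
```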
You can download the dataset using the following link:
We incorporate the dual-enhancement modules DECA and DEPA within the feature-extraction streams feeding each detection head, in order to refine the single-modality features and fuse the multi-modality representations. Concurrently, the bi-directional decoupled focus is inserted into the early layers of the YOLOv8 backbone to enlarge the network's receptive fields.
DECA enhances the cross-modality fusion result by leveraging the dependencies between channels within each modality; the enhanced fusion outcome is then used to reinforce the original single-modality features, highlighting the more discriminative channels.
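The actual DECA block is defined in this repository's model code; purely to illustrate the sentence above, here is a minimal SE-style sketch (not the paper's exact module, all layer sizes illustrative) in which channel weights from each modality re-weight the fused feature and the fused result in turn re-weights the single-modality features:

```python
import torch
import torch.nn as nn

class DECASketch(nn.Module):
    """Illustrative channel-weighting sketch, not the paper's DECA."""

    def __init__(self, c: int, r: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp_rgb = nn.Sequential(nn.Conv2d(c, c // r, 1), nn.ReLU(), nn.Conv2d(c // r, c, 1))
        self.mlp_ir = nn.Sequential(nn.Conv2d(c, c // r, 1), nn.ReLU(), nn.Conv2d(c // r, c, 1))

    def forward(self, rgb, ir):
        fused = rgb + ir                                      # naive fusion for the sketch
        w_rgb = torch.sigmoid(self.mlp_rgb(self.pool(rgb)))   # channel weights from RGB
        w_ir = torch.sigmoid(self.mlp_ir(self.pool(ir)))      # channel weights from IR
        fused = fused * w_rgb * w_ir                          # cross-modality enhanced fusion
        w_f = torch.sigmoid(self.pool(fused))                 # fused weights reinforce each modality
        return rgb * w_f, ir * w_f
```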
DEPA learns dependency structures within and across modalities to produce enhanced multi-modality representations with stronger positional awareness.
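Again only as an illustration (not the paper's DEPA), a CBAM-style spatial sketch: a pixel-wise weight map computed from the fused feature sharpens the positional response of both modalities:

```python
import torch
import torch.nn as nn

class DEPASketch(nn.Module):
    """Illustrative pixel-weighting sketch, not the paper's DEPA."""

    def __init__(self):
        super().__init__()
        # 2-channel input: per-pixel mean and max over channels of the fused map.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, rgb, ir):
        fused = rgb + ir
        stats = torch.cat([fused.mean(1, keepdim=True), fused.max(1, keepdim=True).values], dim=1)
        w = torch.sigmoid(self.conv(stats))  # (B, 1, H, W) pixel weight map
        return rgb * w, ir * w
```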
We divide the pixels into two groups for convolution so that each group attends to adjacent and remote pixels at the same time. Finally, we concatenate the result with the original feature map along the channel dimension and pass it through a depth-wise convolution layer.
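The real module lives in this repository's backbone code; the following is only a rough sketch of one reading of the description above, where the two pixel groups are realised as a small-dilation branch (adjacent pixels) and a dilated branch (remote pixels), concatenated with the original map and passed through a depth-wise convolution:

```python
import torch
import torch.nn as nn

class DecoupledFocusSketch(nn.Module):
    """Rough sketch of the decoupled-focus idea, not the authors' implementation."""

    def __init__(self, channels: int):
        super().__init__()
        # Branch attending to adjacent pixels (standard 3x3 neighbourhood).
        self.near = nn.Conv2d(channels, channels, 3, padding=1, dilation=1)
        # Branch attending to remote pixels (dilated 3x3 neighbourhood).
        self.far = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        # Depth-wise conv applied after concatenating [near, far, original].
        self.dw = nn.Conv2d(3 * channels, 3 * channels, 3, padding=1, groups=3 * channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.cat([self.near(x), self.far(x), x], dim=1)
        return self.dw(y)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)            # dummy backbone feature map
    print(DecoupledFocusSketch(64)(feat).shape)  # torch.Size([1, 192, 80, 80])
```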
The mAP50 and mAP50-95 of every category in the M3FD dataset demonstrate the superiority of our method.
Trained Weights:
If you use this code or ideas from the paper for your research, please cite our paper:
@InProceedings{Chen_2024_ICPR,
  author    = {Chen, Yishuo and Wang, Boran and Guo, Xinyu and Zhu, Wenbin and He, Jiasheng and Liu, Xiaobin and Yuan, Jing},
  title     = {DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection},
  booktitle = {International Conference on Pattern Recognition},
  year      = {2024},
  pages     = {}
}
Part of the code is adapted from previous work, in particular Ultralytics YOLOv8. We thank all the authors for their contributions.