Relation DETR

By Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan.

This repo is the official implementation of Relation DETR: Exploring Explicit Position Relation Prior for Object Detection, accepted to ECCV2024 (score 5444, oral presentation). [Arxiv paper link], [Paper introduction], [Code walkthrough]

💖 If our Relation-DETR or SA-Det-100k dataset is helpful to your research or projects, please star this repository. Thanks! 🤗

TODO

...Want more features? Open a Feature Request.

Support data augmentations from albumentations.
Support Mosaic and Mixup data augmentation.
More detailed docs for the code.
Add instructions for integrating our position relation into other models.
Support GradCam and feature visualization.
Upload more pretrained weights and training logs.
Update visualization code for MC.
Update Model ZOO.

Update

[2024-09-08] The Relation-DETR (Focal-Large) checkpoint pretrained on Objects365 is now available here.
[2024-08-15] We release config and checkpoint of DINO++ (DINO enhanced by our position relation).
[2024-08-12] Relation-DETR is selected for Oral presentation in ECCV2024!
[2024-08-11] The pretrained weights for Relation-DETR on SA-Det-100k are available here!
[2024-08-07] Relation-DETR with FocalNet-large achieves 63.5 AP on COCO test-dev2017 after being fine-tuned for 4 epochs on Objects365. The config and checkpoint are available now!
[2024-07-24] Upload SA-Det-100k dataset, see it in Hugging Face and Ai Studio.
[2024-07-18] Upload Relation-DETR training logs for pretrained weights.
[2024-07-18] We release the code for Relation-DETR. Relation-DETR with Swin-L achieves 58.1 AP!
[2024-07-17] We release the checkpoints for Relation-DETR with ResNet-50 and Swin-L backbones, see Releases v1.0.0.
[2024-07-01] Relation-DETR is accepted to ECCV2024. Thank you for your attention!
[2024-03-26] Code for Salience-DETR is available here.

SA-Det-100k

SA-Det-100k is a large-scale class-agnostic object detection dataset for research purposes only. The dataset is based on a subset of SA-1B (see LICENSE), and all objects belong to a single category, objects. Because it covers a large number of scenarios but does not provide class-specific annotations, we believe it may be a good choice for pre-training models for a variety of downstream tasks with different categories. The dataset contains about 100k images, and each image is resized using opencv-python so that the larger of its height and width is 1333, which is consistent with the data augmentation commonly used to train on COCO. The dataset can be found on Hugging Face and Ai Studio.
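
The resizing rule described above can be sketched as follows. This is only an illustration, not the script used to build the dataset: the function names, the rounding, and the interpolation mode are assumptions, and only the target of 1333 for the longer side comes from the description.

```python
from typing import Tuple


def longer_side_size(width: int, height: int, target: int = 1333) -> Tuple[int, int]:
    """Return (new_width, new_height) so that the longer side equals `target`."""
    scale = target / max(width, height)
    # Rounding to the nearest pixel is an assumption of this sketch.
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_longer_side(image, target: int = 1333):
    """Resize an HxWxC array with opencv-python, preserving the aspect ratio."""
    import cv2  # imported lazily so the size helper works without OpenCV

    height, width = image.shape[:2]
    new_width, new_height = longer_side_size(width, height, target)
    # cv2.resize expects the output size as (width, height).
    # INTER_LINEAR is an assumed choice; the dataset may have used another mode.
    return cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR)
```

For a 640x480 image, `longer_side_size(640, 480)` gives a width of 1333 and a height scaled by the same factor.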

Model ZOO

COCO

Model	Backbone	Epoch	Download	mAP	AP₅₀	AP₇₅	AP_S	AP_M	AP_L
Relation DETR	ResNet50	12	config / checkpoint / log	51.7	69.1	56.3	36.1	55.6	66.1
Relation DETR	Swin-L_(IN-22K)	12	config / checkpoint	57.8	76.1	62.9	41.2	62.1	74.4
Relation DETR	ResNet50	24	config / checkpoint / log	52.1	69.7	56.6	36.1	56.0	66.5
Relation DETR	Swin-L_(IN-22K)	24	config / checkpoint / log	58.1	76.4	63.5	41.8	63.0	73.5
Relation DETR†	Focal-L_(IN-22K)	4+24	config / o365_checkpoint / checkpoint	63.5	80.8	69.1	47.2	66.9	77.0

† denotes a model fine-tuned on COCO after pretraining on Objects365.

[Other DETR variants:] We integrate our position relation into existing DETR variants to produce enhanced versions of them. Note that some of these weights are newly trained and may produce results slightly different from those reported in our paper. We mark these variants with ++ in the name to distinguish them from their original versions.

Model	Backbone	Epoch	Download	mAP	AP₅₀	AP₇₅	AP_S	AP_M	AP_L
Deformable-DETR++	ResNet50	12	config	47.0	65.6	51.2	29.3	51.0	62.2
Dab-Def-DETR++	ResNet50	12	config / checkpoint	48.3	66.5	52.9	32.4	52.0	62.0
DN-Def-DETR++	ResNet50	12	config / checkpoint	47.3	65.6	51.4	29.9	50.8	62.1
DINO++	ResNet50	12	config / checkpoint	50.1	67.8	54.9	33.3	53.9	63.5

SA-Det-100k

Model	Backbone	Epoch	Download	mAP	AP₅₀	AP₇₅	AP_S	AP_M	AP_L
DINO with VFL	ResNet50	12	——	43.7	52.0	47.7	5.8	43.0	61.5
Relation DETR	ResNet50	12	config / checkpoint	45.0	53.1	48.9	6.0	44.4	62.9

Get started

1. Installation

We use the same environment as Salience-DETR. You can skip this step if you have already set up Salience-DETR.

Clone the repository:

```shell
git clone https://github.com/xiuqhou/Relation-DETR
cd Relation-DETR
```

Install PyTorch and torchvision:

```shell
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
```

Install other requirements:

```shell
pip install -r requirements.txt
```

2. Prepare datasets

Download COCO2017 (and, optionally, SA-Det-100k) and place them in data/ with the following structure:

data/
  ├─coco/
  │  ├── train2017/
  │  ├── val2017/
  │  └── annotations/
  │         ├── instances_train2017.json
  │         └── instances_val2017.json
  │
  └─sa_det_100k/
      ├── train2017/
      ├── val2017/
      └── annotations/
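
Before launching evaluation or training, it can help to confirm that the folders are where the scripts expect them. The helper below is a hypothetical sketch, not part of the repository: the function name `missing_entries` is an assumption, and the expected paths are copied from the tree above.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above.
COCO_LAYOUT = [
    "coco/train2017",
    "coco/val2017",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
]
SA_DET_LAYOUT = [
    "sa_det_100k/train2017",
    "sa_det_100k/val2017",
    "sa_det_100k/annotations",
]


def missing_entries(data_root: str, include_sa_det: bool = False) -> List[str]:
    """Return the expected files/folders that do not exist under `data_root`."""
    root = Path(data_root)
    expected = COCO_LAYOUT + (SA_DET_LAYOUT if include_sa_det else [])
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("data", include_sa_det=False)
    print("dataset layout OK" if not missing else f"missing: {missing}")
```

Pass `include_sa_det=True` if you also downloaded SA-Det-100k.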

3. Evaluate pretrained models

To evaluate a model with one or more GPUs, specify CUDA_VISIBLE_DEVICES, dataset, model and checkpoint.

CUDA_VISIBLE_DEVICES=<gpu_ids> accelerate launch test.py --coco-path /path/to/coco --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth

For example, run the following command to evaluate Relation-DETR with ResNet-50 (1x) on COCO. You can expect a final AP of about 51.7.

CUDA_VISIBLE_DEVICES=0 accelerate launch test.py \
  --coco-path data/coco \
  --model-config configs/relation_detr/relation_detr_resnet50_800_1333.py \
  --checkpoint https://github.com/xiuqhou/Relation-DETR/releases/download/v1.0.0/relation_detr_resnet50_800_1333_coco_1x.pth

To export results to a JSON file, specify --result with a file name ending in .json.
To visualize predictions, specify --show-dir with a folder name. You can change the visualization style through --font-scale, --box-thick, --fill-alpha, --text-box-color, --text-font-color, --text-alpha parameters.
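
When evaluating several checkpoints in a row, the command above can be assembled programmatically. The following is a minimal sketch that uses only the flags named in this section (`--coco-path`, `--model-config`, `--checkpoint`, `--result`, `--show-dir`); the helper names `build_eval_command` and `run_eval` are hypothetical and not part of the repository.

```python
import os
import subprocess
from typing import List, Optional


def build_eval_command(
    coco_path: str,
    model_config: str,
    checkpoint: str,
    result: Optional[str] = None,
    show_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the `accelerate launch test.py ...` argument list."""
    cmd = [
        "accelerate", "launch", "test.py",
        "--coco-path", coco_path,
        "--model-config", model_config,
        "--checkpoint", checkpoint,
    ]
    if result is not None:
        # The README asks for a file name ending in .json.
        if not result.endswith(".json"):
            raise ValueError("--result must be a file name ending in .json")
        cmd += ["--result", result]
    if show_dir is not None:
        cmd += ["--show-dir", show_dir]
    return cmd


def run_eval(cmd: List[str], gpu_ids: str = "0") -> None:
    """Run the command with CUDA_VISIBLE_DEVICES set, raising on failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_ids)
    subprocess.run(cmd, env=env, check=True)
```

Call `run_eval(build_eval_command(...), gpu_ids="0,1")` from the repository root so that `test.py` resolves correctly.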

4. Evaluate exported json results

To evaluate exported JSON results, specify the dataset path and the result file. The evaluation only needs a CPU, so you don't need to specify `CUDA_VISIBLE_DEVICES`.

accelerate launch test.py --coco-path /path/to/coco --result /path/to/result.json

To visualize predictions, specify --show-dir with a folder name. You can change the visualization style through --font-scale, --box-thick, --fill-alpha, --text-box-color, --text-font-color, --text-alpha parameters.
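
To take a quick look at an exported file before evaluating it, a few lines of standard-library Python are enough. The schema written by --result is not documented in this README, so the sketch below assumes the standard COCO detection-results format (a flat list of records with `image_id`, `category_id`, `bbox` and `score`). The function names are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, List


def load_detections(result_path: str, score_threshold: float = 0.0) -> List[dict]:
    """Load detections and keep those whose score reaches the threshold."""
    with open(result_path, "r", encoding="utf-8") as f:
        detections = json.load(f)
    # ASSUMPTION: a flat list of {"image_id", "category_id", "bbox", "score"} records.
    return [det for det in detections if det["score"] >= score_threshold]


def detections_per_category(detections: List[dict]) -> Dict[int, int]:
    """Count how many detections fall into each category id."""
    return dict(Counter(det["category_id"] for det in detections))
```

If the file uses a different layout, adjust the key names accordingly.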

5. Train a model

Use CUDA_VISIBLE_DEVICES to specify one or more GPUs and run the following script to start training. If it is not specified, the script uses all available GPUs on the node. Before training, modify the parameters in configs/train_config.py.

CUDA_VISIBLE_DEVICES=0 accelerate launch main.py    # train with 1 GPU
CUDA_VISIBLE_DEVICES=0,1 accelerate launch main.py  # train with 2 GPUs

6. Benchmark a model

To test the inference speed, memory cost and parameter count of a model, use tools/benchmark_model.py.

python tools/benchmark_model.py --model-config configs/relation_detr/relation_detr_resnet50_800_1333.py

7. Export an ONNX model

For advanced users who want to deploy our model, we provide a script to export an ONNX file.

python tools/pytorch2onnx.py \
    --model-config /path/to/model.py \
    --checkpoint /path/to/checkpoint.pth \
    --save-file /path/to/save.onnx \
    --simplify \
    --verify

Here --simplify uses onnxsim to simplify the exported ONNX file, and --verify checks the error between the ONNX model and the PyTorch model.

For inference using the ONNX file, see ONNXDetector in tools/pytorch2onnx.py.

License

Relation-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Bibtex

If you find our work helpful for your research, please consider citing:

@inproceedings{hou2024relation,
  title={Relation DETR: Exploring Explicit Position Relation Prior for Object Detection},
  author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong and Lan, Xuguang},
  booktitle={European conference on computer vision},
  year={2024},
  organization={Springer}
}

@InProceedings{Hou_2024_CVPR,
    author    = {Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong},
    title     = {Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17574-17583}
}
