Track 2 is a multi task track for building instance segmentation and height prediction, focusing on detecting individual buildings, building instance segmentation, and height information. This track has a large number of neighboring building clusters, which challenges the participants' ability to involve the algorithmic process.
We provide individual RGB and SAR (Synthetic Aperture Radar) remote sensing images. For better use of multi-modal data, we provide a python script to generate 4-channel images concatenated in the channel dimension, in 4-channel (R,G,B,SAR) tif format. You can run the following command to generate the merge direcory:
python ./make_merge.py $DATASET_ROOT
All the images are in size of 512x512. In the instance segmentation sub-task, the data format follows the MS COCO format, and the annotation uses the json format. In the height prediction sub-task, the data annotation adopts the height ground true formed by the pixel by pixel elevation value corresponding to the RGB images, and the data uses tif format.
The topology of the dataset directory is as follows:
```
DFC_Track_2
├── annotations
│ └── buildings_only_train.json
│ └── buildings_only_val.json
│ └── buildings_only_test.json
├── train
│ └── rgb
│ │ ├── P0001.tif
│ │ └── ...
│ │ └── P0009.tif
│ └── sar
│ │ ├── P0001.tif
│ │ └── ...
│ │ └── P0009.tif
│ └── height
│ ├── P0001.tif
│ └── ...
│ └── P0009.tif
├── val
│ └── rgb
│ │ ├── P0011.tif
│ │ └── ...
│ │ └── P0019.tif
│ └── sar
│ │ ├── P0011.tif
│ │ └── ...
│ │ └── P0019.tif
│ └── height
│ ├── P0011.tif
│ └── ...
│ └── P0019.tif
└── test
└── rgb
│ ├── P0021.tif
│ └── ...
│ └── P0029.tif
└── sar
│ ├── P0021.tif
│ └── ...
│ └── P0029.tif
└── height
├── P0021.tif
└── ...
└── P0029.tif
```
We choose the classical mask rcnn with multimodal multitask learning (height prediction) framework as the contest baseline model. Among the input image modalities are RGB and SAR.
We use MMDetection (version 2.25.1) to test the baseline model performance.
The performance report of multimodal multitask learning framework on the validation set of track 2 (instance segmentation and height prediction) is as follows:
Model | Modality | mAP | mAP_50 | Delta_1 | Delta_2 | Delta_3 |
---|---|---|---|---|---|---|
our baseline | RGB+SAR | 15.1 | 41.1 | 30.1 | 35.3 | 39.6 |
The documents submitted should be a folder. The topology of the submitted floder directory is as follows:
```
DFC2023_Track_2_submit
├── seg_results.json
└── height
├── P0011.tif
└── P0012.tif
└── ...
└── P0019.tif
```