- All baselines were trained on 8 GPUs with a batch size of 8 (1 image per GPU), using the linear scaling rule to scale the learning rate.
- All models were trained on `cityscapes_train` and tested on `cityscapes_val`.
- The 1x training schedule indicates 64 epochs, which corresponds to slightly fewer than the 24k iterations reported in the original schedule from the Mask R-CNN paper.
- All PyTorch-style backbones pretrained on ImageNet are from the PyTorch model zoo.
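The learning-rate scaling and the epoch-to-iteration conversion in the notes above are simple arithmetic. Below is a minimal Python sketch of both. The base learning rate of 0.02 at a total batch size of 16 is an assumed reference point, not a value stated in this README. The training-set size of 2,975 images assumes the standard Cityscapes fine-annotation train split with no images filtered out.

```python
# Assumed reference point (hypothetical, not from this README):
# a base learning rate of 0.02 tuned for a total batch size of 16.
BASE_LR = 0.02
BASE_BATCH_SIZE = 16

# Assumption: the 2,975 finely annotated Cityscapes training images, none filtered out.
NUM_TRAIN_IMAGES = 2975


def scaled_lr(batch_size: int, base_lr: float = BASE_LR, base_batch: int = BASE_BATCH_SIZE) -> float:
    """Linear scaling rule: scale the learning rate in proportion to the total batch size."""
    return base_lr * batch_size / base_batch


def epochs_to_iters(epochs: int, num_images: int, batch_size: int) -> int:
    """Approximate optimizer steps for `epochs` passes over `num_images`.

    Ignores how partial batches at epoch boundaries are handled by the data loader.
    """
    return (epochs * num_images) // batch_size


total_batch = 8 * 1  # 8 GPUs x 1 image per GPU
print(scaled_lr(total_batch))                             # 0.01
print(epochs_to_iters(64, NUM_TRAIN_IMAGES, total_batch))  # 23800
```

Under these assumptions, 64 epochs come out at about 23.8k iterations, consistent with the "slightly fewer than 24k" figure above. The exact count depends on how the data loader pads or drops partial batches.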
Download links and more models with different backbones and training schemes will be added to the model zoo.

Backbone | Style | Lr schd | Scale | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
---|---|---|---|---|---|---|---|---|
R-50-FPN | pytorch | 1x | 800-1024 | 4.9 | 0.345 | 8.8 | 36.0 | model |

Backbone | Style | Lr schd | Scale | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
---|---|---|---|---|---|---|---|---|---|
R-50-FPN | pytorch | 1x | 800-1024 | 4.9 | 0.609 | 2.5 | 37.4 | 32.5 | model |

Notes:
- In the original paper, the mask AP of Mask R-CNN R-50-FPN is 31.5.