We recommend using these Caffe models with py-RFCN-priv.
We are releasing the training code and files; the models and more experiments will come soon.
Network | mAP@50(%) | training speed | training memory | testing speed | testing memory |
---|---|---|---|---|---|
resnet18 | 70.02 | 9.5 img/s | 1,235MB | 17.5 img/s | 989MB |
resnet101 | -- | -- | -- | -- | -- |
resnet101-v2 | 79.6 | 3.1 img/s | 6,495MB | 7.1 img/s | 4,573MB |
resnet152-v2 | 80.72 | 2.8 img/s | 9,315MB | 6.2 img/s | 6,021MB |
wrn50-2 | 78.59 | 2.1 img/s | 4,895MB | 4.9 img/s | 3,499MB |
resnext50-32x4d | 77.99 | 3.6 img/s | 5,315MB | 7.4 img/s | 4,305MB |
resnext101-32x4d | 79.98 | 2.7 img/s | 7,836MB | 6.3 img/s | 5,705MB |
resnext101-64x4d | 80.71 | 2.0 img/s (batch=96) | 11,277MB | 3.7 img/s | 9,461MB |
inception-v3 | 78.6 | 4.1 img/s | 4,325MB | 7.3 img/s | 3,445MB |
xception | 76.6 | 3.3 img/s | 7,341MB | 7.8 img/s | 2,979MB |
inception-v4 | 81.49 | 2.6 img/s | 6,759MB | 5.4 img/s | 4,683MB |
inception-resnet-v2 | 80.0 | 2.0 img/s (batch=112) | 11,497MB | 3.2 img/s | 8,409MB |
densenet-161 | -- | -- | -- | -- | -- |
densenet-201 | 77.53 | 3.9 img/s (batch=72) | 10,073MB | 5.5 img/s | 9,955MB |
resnet38a | 80.1 | 1.4 img/s | 8,723MB | 3.4 img/s | 5,501MB |
air101 | 81.0 | 2.4 img/s | 7,747MB | 5.1 img/s | 5,777MB |
- To reduce memory usage, we merge each model's batchnorm layer parameters into the scale layer; for more details, please refer to faster-rcnn-resnet or pva-faster-rcnn;
- We also split the deploy file into an rpn deploy file and an rcnn deploy file so that more testing tricks can be adopted;
- Performance, speed and memory are measured on py-RFCN-priv with an Nvidia Titan Pascal; we do not guarantee that the results can be reproduced under any other conditions;
- All the models are trained on a single scale (600*1000) with image flipping and train-batch=128 for 80,000 iterations, and tested on the same single scale with test-batch=300 and nms=0.3;
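
The batchnorm merge mentioned in the notes above can be illustrated with a minimal sketch. It assumes the usual Caffe layout of a BatchNorm layer (stored mean, variance and a scalar moving-average factor) followed by a Scale layer (gamma, beta). The function name and the plain-list interface are hypothetical and are not taken from py-RFCN-priv.

```python
import math


def fold_batchnorm_into_scale(mean, var, gamma, beta, eps=1e-5, scale_factor=1.0):
    """Fold per-channel BatchNorm statistics into a single Scale layer.

    BatchNorm computes y = (x - mean) / sqrt(var + eps) and the following
    Scale layer computes z = gamma * y + beta. The two collapse into
    z = new_gamma * x + new_beta, which is what this function returns.
    """
    # Caffe keeps accumulated statistics together with a scalar factor;
    # dividing by that factor recovers the actual mean and variance.
    norm = 1.0 / scale_factor if scale_factor != 0 else 0.0
    new_gamma, new_beta = [], []
    for m, v, g, b in zip(mean, var, gamma, beta):
        inv_std = 1.0 / math.sqrt(v * norm + eps)
        new_gamma.append(g * inv_std)
        new_beta.append(b - g * m * norm * inv_std)
    return new_gamma, new_beta


# One channel with mean 1, variance 4, gamma 2, beta 0.5 (eps set to 0 for clarity).
folded_gamma, folded_beta = fold_batchnorm_into_scale(
    [1.0], [4.0], [2.0], [0.5], eps=0.0
)
```

After folding, the BatchNorm layer can be dropped from the deploy file and only the Scale layer with the new parameters is kept, so the output for any input stays the same up to floating-point error.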
Comparisons on VOC 2007 test using faster rcnn with inception-v4.
Method | mAP@50 | improvement | test speed |
---|---|---|---|
baseline inception-v4 | 81.49 | -- | 5.4 img/s |
+multi-scale training | 83.79 | 2.30 | 5.4 img/s |
+box voting | 83.95 | 0.16 | 5.4 img/s |
+nms=0.4 | 84.22 | 0.27 | 5.4 img/s |
+image flipping test | 84.54 | 0.32 | 2.7 img/s |
+multi-scale testing | 85.78 | 1.24 | 0.13 img/s |
- The SCALES for multi-scale training are (200, 400, 600, 800, 1000) and MAX_SIZE is 1666;
- For multi-scale training, we double the training iterations (160,000 for VOC0712trainval);
- The SCALES for multi-scale testing are (400, 600, 800, 1000, 1200) and MAX_SIZE is 2000;
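
To make the SCALES and MAX_SIZE settings above concrete, here is a small sketch of how a resize factor is typically derived from them in Faster R-CNN style pipelines: the shorter image side is scaled to the target, unless that would push the longer side past MAX_SIZE. The function and constant names are illustrative assumptions, not identifiers from py-RFCN-priv.

```python
def rescale_factor(height, width, target_size, max_size):
    """Scale so the shorter side equals target_size, capped by max_size."""
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_size) / short_side
    # Cap the scale if the longer side would exceed max_size.
    if round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale


# Values quoted in the notes above.
TRAIN_SCALES = (200, 400, 600, 800, 1000)
TRAIN_MAX_SIZE = 1666
TEST_SCALES = (400, 600, 800, 1000, 1200)
TEST_MAX_SIZE = 2000

# Resize factors for a 375x500 image at every test scale.
test_factors = [rescale_factor(375, 500, s, TEST_MAX_SIZE) for s in TEST_SCALES]
```

For elongated images the cap takes over: a 300x1000 image at target size 1000 would need a factor of about 3.33, but with MAX_SIZE 1666 it is limited to 1.666.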
Network | mAP@50(%) | training speed | training memory | testing speed | testing memory |
---|---|---|---|---|---|
resnet101-v2 w/o OHEM | 80.18 | 5.4 img/s | 5,807MB | 10.5 img/s | 3,147MB |
resnet101-v2 | 80.6 | 5.0 img/s | 5,833MB | 10.5 img/s | 3,147MB |
resnet101-v2-multigrid | 80.49 | 5.0 img/s | 5,833MB | 10.5 img/s | 3,147MB |
air101-multigrid | 81.47 | 3.4 img/s | 6,653MB | 8.7 img/s | 4,503MB |
air101-multigrid-context | 82.09 | 3.3 img/s | 6,773MB | 8.6 img/s | 4,577MB |
air101-fpn w/o OHEM | 81.44 | 2.4 img/s | 7,063MB | 3.8 img/s | 4,433MB |
inception-v4-3x3 | 81.12 | 3.73 img/s | 5,383MB | 10.1 img/s | 3,217MB |
inception-v4-3x3-multigrid | 81.30 | 3.73 img/s | 5,383MB | 10.1 img/s | 3,217MB |
Network | mAP@50(%) | training speed | training memory | testing speed | testing memory |
---|---|---|---|---|---|
resnet18 | 71.82 | 14.3 img/s | 1,215MB | 23.4 img/s | 899MB |
resnext26-32x4d | 72.07 | 7.5 img/s | 2,521MB | 15.0 img/s | 1,797MB |
resnet101-v2 | 78.93(79.9) | 4.9 img/s | 5,719MB | 10.4 img/s | 3,097MB |
resnext101-32x4d | 79.98(80.35) | 3.8 img/s | 6,977MB | 8.8 img/s | 4,761MB |
resnext101-64x4d | 80.26(79.88) | 2.4 img/s | 10,203MB | 6.2 img/s | 8,529MB |
air101 | 79.42(80.93) | 3.4 img/s | 6,525MB | 8.5 img/s | 4,477MB |
air152 | ..(81.18) | .. | .. | .. | .. |
inception-v4 | 80.2 | 4.1 img/s | 4,371MB | 10.3 img/s | 2,343MB |
inception-v4-3x3 | 81.15 | 3.7 img/s | 5,207MB | 9.5 img/s | 3,151MB |
se-inception-v2 | 77.1 | .. | .. | .. | .. |
- The mAP@50 score in parentheses is obtained by training with OHEM and multigrid;
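
For readers unfamiliar with OHEM (online hard example mining), the core selection step can be sketched as follows: per-RoI losses are computed in a forward pass and only the highest-loss RoIs are kept for the backward pass. This is a simplified illustration with a hypothetical function name, not the code used in py-RFCN-priv; OHEM as published typically also removes heavily overlapping RoIs with NMS before selection, which is omitted here.

```python
def select_hard_rois(roi_losses, num_hard):
    """Return the indices of the num_hard RoIs with the largest loss."""
    ranked = sorted(range(len(roi_losses)), key=lambda i: roi_losses[i], reverse=True)
    return ranked[:num_hard]


# Five RoIs with made-up losses; keep the two hardest ones.
example_losses = [0.05, 1.20, 0.40, 0.90, 0.01]
hard_indices = select_hard_rois(example_losses, 2)
```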
Network | mAP | mAP@50 | mAP@75 | mAP@S | mAP@M | mAP@L |
---|---|---|---|---|---|---|
RFCN-se-inception-v2 with ms-train & ohem & multigrid | 32.6 | 53.6 | 34.5 | 12.5 | 35.1 | 48.4 |
RFCN-se-inception-v2 with ms-train & ohem & multigrid & bbox voting & soft-nms & flipping & ms-test | 36.8 | 59.8 | 38.7 | 19.7 | 39.8 | 49.1 |
FPN-Faster-inception-v4 with ms-train | 36.5 | 58.5 | 38.8 | 16.5 | 38.8 | 52.1 |
FPN-Faster-inception-v4 with ms-train & bbox voting & soft-nms | 38.3 | 61.0 | 40.8 | 20.0 | 41.5 | 51.4 |
FPN-Faster-inception-v4 with ms-train & bbox voting & soft-nms & flipping & ms-test | 39.5 | 62.5 | 42.3 | 23.3 | 43.2 | 51.0 |
RFCN-air101 with ms-train & ohem & multigrid | 38.2 | 60.1 | 41.2 | 18.2 | 41.9 | 53.0 |
RFCN-air101 with extra 7 epochs & ms-train & ohem & multigrid | 38.5 | 60.2 | 41.4 | 18.3 | 42.1 | 53.4 |
RFCN-air101 with ms-train & ohem & multigrid & bbox voting & soft-nms & flipping | 40.4 | 63.5 | 43.5 | 22.6 | 44.4 | 52.0 |
RFCN-air101 with ms-train & ohem & multigrid & bbox voting & soft-nms & flipping & ms-test | 41.8 | 65.3 | 45.3 | 26.1 | 45.6 | 52.4 |
RFCN-air101 with ms-train & ohem & multigrid & bbox voting & soft-nms & flipping & assign-ms-test | 42.1 | 64.6 | 45.6 | 25.6 | 44.5 | 54.1 |
RFCN-air101 with ms-train & ohem & multigrid & deformpsroi & bbox voting & soft-nms & flipping & assign-ms-test | 43.2 | 66.0 | 46.7 | 25.6 | 46.3 | 55.9 |
Faster-2fc-air101 with ms-train & ohem & multigrid | 36.5 | 60.4 | 38.1 | 15.5 | 39.5 | 53.5 |
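
Several rows in the table above use soft-nms. As a rough illustration, the linear variant of soft-NMS can be sketched as below: instead of discarding boxes that overlap the current best detection, their scores are decayed by (1 - IoU). The function names and the threshold defaults are assumptions chosen for the example; the settings used for the reported results are not stated here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms_linear(boxes, scores, iou_thresh=0.3, score_thresh=0.001):
    """Linear soft-NMS: decay overlapping scores instead of removing boxes.

    Returns a list of (box_index, rescored_value) in selection order.
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Pick the box with the highest current score.
        best = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best)
        kept.append((best_idx, best_score))
        rescored = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            if overlap > iou_thresh:
                score *= (1.0 - overlap)  # linear decay
            if score >= score_thresh:     # drop boxes whose score has vanished
                rescored.append((idx, score))
        remaining = rescored
    return kept


# Two overlapping boxes and one separate box, with made-up scores.
demo_boxes = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
demo_scores = [0.9, 0.8, 0.7]
demo_result = soft_nms_linear(demo_boxes, demo_scores)
```

In the demo, the second box overlaps the first with IoU 0.5, so its score is halved rather than the box being removed outright, which is the behavior that distinguishes soft-NMS from hard NMS.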