We recommend using these caffe models with py-RFCN-priv
we are releasing the training code and files, the models and more experiments will come soon.
1. PSPNet training on SBD (10,582 images) and testing on VOC 2012 validation (1,449 images).
Network | mIoU(%) | pixel acc(%) | training speed |
training memory |
testing speed |
testing memory |
---|---|---|---|---|---|---|
resnet101-v2 | 77.94 | 94.94 | 1.6 img/s | 8,023MB | 3.0 img/s | 4,071MB |
resnet101-v2-selu | 77.10 | 94.80 | 1.6 img/s | 8,017MB | 3.0 img/s | 4,065MB |
resnext101-32x4d | 77.79 | 94.92 | 1.3 img/s | 8,891MB | 2.6 img/s | 5,241MB |
air101 | 77.64 | 94.93 | 1.3 img/s | 10,017MB | 2.5 img/s | 5,241MB |
inception-v4 | 77.58 | 94.83 | -- img/s | --MB | -- img/s | --MB |
se-resnet50 | 75.80 | 94.30 | -- img/s | --MB | -- img/s | --MB |
- To reduce memory usage, we merge all the models batchnorm layer parameters into scale layer, more details please refer to faster-rcnn-resnet or pva-faster-rcnn;
- PSP module without batch normlization, the kernel_size of avepooling is 64, 32, 16 and 8 respectively;
- All the models use 513x513 input with random crop, multi-scale traing (0.75x, 1.0x, 1.25x, 1.5x, 2.0x) and horizantal flipping;
- The training and testing speed is calculated on a single Nvidia Titan pascal GPU with batch_size=1;
- Training batch_size=16 for 2,0000 iterations, base_lr=0.001 with 'poly' learning rate policy (power=0.9);
- Testing with single scale, base_size=555 and crop_size=513, no flipping, no crf;