This file provides Mask R-CNN baseline results and models trained with Group Normalization (GN):
    @article{GroupNorm2018,
      title={Group Normalization},
      author={Yuxin Wu and Kaiming He},
      journal={arXiv:1803.08494},
      year={2018}
    }
Note: this code uses the GroupNorm op implemented in CUDA, which is included in the Caffe2 repo. At the time of writing, Caffe2 is being merged into PyTorch, and the GroupNorm op is located here. Make sure your Caffe2 is up to date.
The following models are trained in Caffe2 on the standard ImageNet-1k dataset, using GroupNorm with 32 groups (G=32); a minimal sketch of the GN computation is given after the list.
- R-50-GN.pkl: ResNet-50 with GN, 24.0% top-1 error (center-crop).
- R-101-GN.pkl: ResNet-101 with GN, 22.6% top-1 error (center-crop).
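For reference, here is a minimal NumPy sketch of the Group Normalization computation for an NCHW tensor with G=32; the function and argument names are illustrative and do not match the Caffe2 op's interface.

```python
import numpy as np

def group_norm(x, gamma, beta, G=32, eps=1e-5):
    """Group Normalization over an NCHW tensor (minimal sketch).

    x:     (N, C, H, W) activations, with C divisible by G
    gamma: (1, C, 1, 1) learned per-channel scale
    beta:  (1, C, 1, 1) learned per-channel shift
    """
    N, C, H, W = x.shape
    x = x.reshape(N, G, C // G, H, W)
    # Normalize over the channels within each group and the spatial dims;
    # unlike BN, the statistics do not depend on the batch size.
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    x = x.reshape(N, C, H, W)
    return x * gamma + beta
```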
case | type | lr schd | im/gpu | train mem (GB) | train time (s/iter) | train time total (hr) | inference time (s/im) | box AP | mask AP | model id
---|---|---|---|---|---|---|---|---|---|---
R-50-FPN, BN* | Mask R-CNN | 2x | 2 | 8.6 | 0.897 | 44.9 | 0.099 + 0.018 | 38.6 | 34.5 | 35859007
R-101-FPN, BN* | Mask R-CNN | 2x | 2 | 10.2 | 0.993 | 49.7 | 0.126 + 0.017 | 40.9 | 36.4 | 35861858
Notes:
- This table is copied from the Detectron Model Zoo.
- BN* means that BatchNorm (BN) is used during pre-training and is then frozen and turned into a per-channel linear layer when fine-tuning; this is the default for Faster/Mask R-CNN and Detectron. A sketch of this frozen-BN transformation follows these notes.
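As an illustration of what frozen BN amounts to at fine-tuning time, the sketch below folds the pre-training statistics and affine parameters into a fixed per-channel scale and bias; the names and shapes are assumptions, not Detectron's actual implementation.

```python
import numpy as np

def frozen_bn(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Frozen BatchNorm as a fixed per-channel linear layer (sketch).

    All parameters come from pre-training and are never updated, so BN
    reduces to y = x * scale + bias with constant, per-channel scale/bias.
    x: (N, C, H, W); gamma, beta, running_mean, running_var: (C,) vectors.
    """
    scale = gamma / np.sqrt(running_var + eps)  # (C,)
    bias = beta - running_mean * scale          # (C,)
    return x * scale[None, :, None, None] + bias[None, :, None, None]
```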
case | type | lr schd | im/gpu | train mem (GB) | train time (s/iter) | train time total (hr) | inference time (s/im) | box AP | mask AP | model id | download links
---|---|---|---|---|---|---|---|---|---|---|---
R-50-FPN, GN | Mask R-CNN | 2x | 2 | 10.5 | 1.017 | 50.8 | 0.146 + 0.017 | 40.3 | 35.7 | 48616381 | model \| boxes \| masks
R-101-FPN, GN | Mask R-CNN | 2x | 2 | 12.4 | 1.151 | 57.5 | 0.180 + 0.015 | 41.8 | 36.8 | 48616724 | model \| boxes \| masks
Notes:
- GN is applied on: (i) ResNet layers inherited from pre-training, (ii) the FPN-specific layers, (iii) the RoI bbox head, and (iv) the RoI mask head.
- These GN models use a 4conv+1fc RoI box head (sketched after these notes). The BN* counterpart with this head performs similarly to the default 2fc head: using this codebase, R-50-FPN BN* with 4conv+1fc gets 38.8/34.4 box/mask AP.
- 2x is the default schedule (180k) in Detectron.
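The following PyTorch-style sketch shows the 4conv+1fc box-head pattern with GN. It is not the Caffe2 implementation, and the dimensions (256-channel 3x3 convs, a 1024-d fc, 7x7 RoI features) are typical values assumed here for illustration.

```python
import torch.nn as nn

def conv_gn_relu(in_ch, out_ch, groups=32):
    # 3x3 conv followed by GroupNorm (G=32) and ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(groups, out_ch),
        nn.ReLU(inplace=True),
    )

class BoxHead4Conv1FC(nn.Module):
    """Illustrative 4conv+1fc RoI box head with GN (dimensions are assumptions)."""

    def __init__(self, in_ch=256, conv_ch=256, fc_dim=1024, roi_size=7):
        super().__init__()
        self.convs = nn.Sequential(
            *[conv_gn_relu(in_ch if i == 0 else conv_ch, conv_ch) for i in range(4)]
        )
        self.fc = nn.Linear(conv_ch * roi_size * roi_size, fc_dim)

    def forward(self, x):  # x: (num_rois, in_ch, roi_size, roi_size)
        x = self.convs(x)
        return self.fc(x.flatten(1)).relu()
```

The box classification and regression layers (not shown) would sit on top of the 1024-d feature, as in the standard Fast R-CNN head.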
case | type | lr schd | im/gpu | train mem (GB) | train time (s/iter) | train time total (hr) | inference time (s/im) | box AP | mask AP | model id | download links
---|---|---|---|---|---|---|---|---|---|---|---
R-50-FPN, GN | Mask R-CNN | 3x | 2 | 10.5 | 1.033 | 77.4 | 0.145 + 0.015 | 40.8 | 36.1 | 48734751 | model \| boxes \| masks
R-101-FPN, GN | Mask R-CNN | 3x | 2 | 12.4 | 1.171 | 87.9 | 0.180 + 0.014 | 42.3 | 37.2 | 48734779 | model \| boxes \| masks
Notes:
- 3x is a longer schedule (270k iterations). GN improves further with the longer schedule, while its BN* counterpart stays about the same (R-50-FPN BN*: 38.9/34.3).
- These models are trained without any scale augmentation, which could further improve the results.
GN makes it possible to train Mask R-CNN from scratch, without ImageNet pre-training, despite the small batch size.
case | type | lr schd | im/gpu | train mem (GB) | train time (s/iter) | train time total (hr) | inference time (s/im) | box AP | mask AP | model id
---|---|---|---|---|---|---|---|---|---|---
R-50-FPN, GN, scratch | Mask R-CNN | 3x | 2 | 10.8 | 1.087 | 81.5 | 0.140 + 0.019 | 39.5 | 35.2 | 56421872
R-101-FPN, GN, scratch | Mask R-CNN | 3x | 2 | 12.7 | 1.243 | 93.2 | 0.177 + 0.019 | 41.0 | 36.4 | 56421911
Notes:
- To reproduce these results, see the config yaml files starting with `scratch`.
- These results use `freeze_at=0` (see the sketch after these notes). See this commit about the related issue.
- An earlier set of results (not shown above) followed the default training setting of `freeze_at=2`, which means conv1 and res2 were simply left as random weights when training from scratch. See this commit about the related issue.
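To make the `freeze_at` semantics concrete, here is an illustrative PyTorch-style helper, not the Detectron code; the stage layout and names are assumptions. `freeze_at=2` freezes the stem (conv1) and res2, as in the default fine-tuning setting, while `freeze_at=0` leaves every stage trainable, which is what the from-scratch GN models use.

```python
import torch.nn as nn

def freeze_backbone_stages(stages, freeze_at):
    """Freeze the first `freeze_at` backbone stages (illustrative sketch).

    `stages` is assumed to be an ordered list of modules such as
    [conv1_stem, res2, res3, res4, res5]; freeze_at=0 freezes nothing.
    """
    for i, stage in enumerate(stages, start=1):
        if i > freeze_at:
            break
        for p in stage.parameters():
            p.requires_grad = False  # no gradient updates for frozen stages
        stage.eval()  # keep any normalization layers in inference mode
```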