@inproceedings{lu2019grid,
title={Grid {R-CNN}},
author={Lu, Xin and Li, Buyu and Yue, Yuxin and Li, Quanquan and Yan, Junjie},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2019}
}
@article{lu2019gridplus,
title={Grid {R-CNN} Plus: Faster and Better},
author={Lu, Xin and Li, Buyu and Yue, Yuxin and Li, Quanquan and Yan, Junjie},
journal={arXiv preprint arXiv:1906.05688},
year={2019}
}
| Backbone    | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|-------------|---------|----------|---------------------|----------------|--------|----------|
| R-50        | 2x      | 4.8      | 1.172               | 10.9           | 40.3   | model    |
| R-101       | 2x      | 6.7      | 1.214               | 10.0           | 41.7   | model    |
| X-101-32x4d | 2x      | 8.0      | 1.335               | 8.5            | 43.0   | model    |
| X-101-64x4d | 2x      | 10.9     | 1.753               | 6.4            | 43.1   | model    |
Notes:
- All models are trained with 8 GPUs instead of the 32 GPUs used in the original paper.
- The warm-up lasts for 1 epoch, and `2x` here indicates 25 epochs.
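
The schedule described in the notes can be sketched roughly as follows. This is an illustrative sketch only, not the configuration used to train the models above: the linear ramp shape, `start_factor`, the base learning rate, and `iters_per_epoch` are all assumptions chosen for the example. Only the 1-epoch warm-up and the 25-epoch total come from the notes.

```python
# Hypothetical sketch of a 1-epoch warm-up inside a 25-epoch ("2x") schedule.
# The linear ramp, start_factor, and all numeric values passed in by the caller
# are assumptions for illustration; they are not taken from the training setup.

TOTAL_EPOCHS = 25   # "2x" in the table above, per the notes
WARMUP_EPOCHS = 1   # warm-up length, per the notes


def warmup_lr(iteration, base_lr, iters_per_epoch, start_factor=0.1):
    """Return the learning rate at a given iteration.

    During warm-up the rate ramps linearly from start_factor * base_lr
    up to base_lr; afterwards it stays at base_lr (no decay is modelled).
    """
    warmup_iters = WARMUP_EPOCHS * iters_per_epoch
    if iteration >= warmup_iters:
        return base_lr
    progress = iteration / warmup_iters
    return base_lr * (start_factor + (1.0 - start_factor) * progress)


def total_iterations(iters_per_epoch):
    """Total number of training iterations for the 25-epoch schedule."""
    return TOTAL_EPOCHS * iters_per_epoch
```

For example, with a hypothetical `iters_per_epoch=1000`, `total_iterations(1000)` gives 25000 iterations, of which the first 1000 are warm-up.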