This example applies PVT to Semantic FPN, using MMSegmentation v0.13.0.
For details, see Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.
If you use this code for a paper, please cite:
PVTv1
@misc{wang2021pyramid,
title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
year={2021},
eprint={2102.12122},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
PVTv2
@misc{wang2021pvtv2,
title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
year={2021},
eprint={2106.13797},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Install MMSegmentation.
First, prepare ADE20K according to the guidelines in MMSegmentation.
Then download the ImageNet-pretrained weights and place them in a folder named pretrained/.
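Before launching training, it can help to confirm that the weights are in the expected folder. The following is a minimal sketch, assuming the checkpoints use a `.pth` extension (typical for PyTorch, but not stated above). `list_pretrained_weights` is a hypothetical helper, not part of MMSegmentation.

```python
from pathlib import Path


def list_pretrained_weights(folder="pretrained"):
    """Return the sorted names of checkpoint files found in `folder`.

    Assumption: checkpoints end in ".pth"; adjust the pattern if yours differ.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Missing folder: {root}")
    return sorted(p.name for p in root.glob("*.pth"))


if __name__ == "__main__":
    try:
        print(list_pretrained_weights())
    except FileNotFoundError as err:
        print(err)
```

An empty list means the folder exists but no matching checkpoint was found in it.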
Method | Backbone | Pretrain | Iters | mIoU(code) | mIoU(paper) | Config | Download |
---|---|---|---|---|---|---|---|
Semantic FPN | PVT-Tiny | ImageNet-1K | 40K | 36.6 | 35.7 | config | log & model |
Semantic FPN | PVT-Small | ImageNet-1K | 40K | 41.9 | 39.8 | config | log & model |
Semantic FPN | PVT-Medium | ImageNet-1K | 40K | 43.5 | 41.6 | config | log & model |
Semantic FPN | PVT-Large | ImageNet-1K | 40K | 43.5 | 42.1 | config | log & model |
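
The mIoU columns report mean intersection-over-union, averaged over classes and expressed as a percentage. As a rough illustration of the metric, here is a pure-Python sketch on flattened label lists. It is not MMSegmentation's own evaluation code, which also handles details such as ignored labels, and `mean_iou` is a hypothetical helper name.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both, to avoid 0/0
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes and 6 pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoU is 1/3, 2/3 and 1/2, so the mean is 0.5.
print(round(mean_iou(pred, target, 3), 4))
```
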
To evaluate PVT-Small + Semantic FPN on a single node with 8 GPUs, run:
dist_test.sh configs/sem_fpn/PVT/fpn_pvt_s_ade20k_40k.py /path/to/checkpoint_file 8 --out results.pkl --eval mIoU
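The --out results.pkl option saves the raw predictions to disk. The following is a minimal sketch for inspecting that file afterwards, assuming it is a standard pickle holding one prediction per test image. The exact contents depend on the MMSegmentation version, and unpickling may require NumPy to be installed. `load_results` is a hypothetical helper.

```python
import os
import pickle


def load_results(path="results.pkl"):
    """Load the object saved by --out. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    if os.path.exists("results.pkl"):
        results = load_results()
        # Assumption: the saved object is a sequence with one entry per image.
        print(type(results).__name__, len(results))
```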
To train PVT-Small + Semantic FPN on a single node with 8 GPUs, run:
dist_train.sh configs/sem_fpn/PVT/fpn_pvt_s_ade20k_40k.py 8
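When scripting runs, the training command can be assembled programmatically. This is only a sketch: `build_train_command` is a hypothetical helper, and the script path is kept exactly as written above, so adjust it to wherever dist_train.sh lives in your checkout.

```python
import shlex


def build_train_command(config, gpus=8, script="dist_train.sh"):
    """Return the argv list for launching distributed training."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return [script, config, str(gpus)]


cmd = build_train_command("configs/sem_fpn/PVT/fpn_pvt_s_ade20k_40k.py")
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```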
This repository is released under the Apache 2.0 license as found in the LICENSE file.