Official implementation of the paper "Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization" (ECCV'22).
Weakly supervised object localization (WSOL) trains a feature extractor and a classifier by minimizing the cross-entropy (CE) between predictions on image-level features and the image-level annotations. At test time, this classifier is applied directly to pixel-level features as the localizer, producing pixel-level classification results, i.e., the localization map.
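The relation between the image-level classifier and the localization map can be illustrated with a small sketch. It assumes a linear classifier on top of global average pooling, written in plain Python; the function names and toy numbers are illustrative only and are not taken from this repository.

```python
# Minimal CAM-style sketch in plain Python; all names and values are illustrative.

def global_average_pool(features):
    """Average an H x W x C feature grid into one C-dimensional image-level feature."""
    positions = [vec for row in features for vec in row]
    channels = len(positions[0])
    return [sum(vec[c] for vec in positions) / len(positions) for c in range(channels)]

def classify(feature, weights, bias):
    """Linear classifier: one score per class for a single C-dimensional feature."""
    return [sum(w * f for w, f in zip(class_w, feature)) + b
            for class_w, b in zip(weights, bias)]

def localization_map(features, weights, bias, target_class):
    """Apply the image-level classifier to every pixel-level feature."""
    return [[classify(vec, weights, bias)[target_class] for vec in row]
            for row in features]

# Toy 2 x 2 feature map with 3 channels and a 2-class classifier.
features = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[3.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]
weights = [[0.5, -1.0, 0.25], [-0.5, 1.0, 0.0]]
bias = [0.1, -0.2]

image_scores = classify(global_average_pool(features), weights, bias)
cam = localization_map(features, weights, bias, target_class=0)
```

Under these assumptions, the image-level score of a class equals the mean of its localization map, which is why the same weights can be reused on pixel-level features.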
However, the object localizer must discern the class at every regional position from pixel-level features, where discriminative factors may not be well aggregated and may therefore be insufficient to activate the globally learned classifier.
To bridge this gap, our work proposes a plug-and-play approach called BagCAMs, which can better project an image-level trained classifier to comply with the requirement of localization tasks.
BagCAMs derives a set of regional localizers from the well-trained classifier. These regional localizers discern object-related factors at each spatial position and act as the base learners of an ensemble. The final localization result is obtained by integrating their outputs.
By making better use of multiple regional localizers, BagCAMs outperforms existing CAM-based methods, especially on intermediate layers, which have higher spatial resolution.
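The ensemble idea can be sketched as follows. This is a simplified illustration, not the implementation in this repository: the regional localizers are passed in as given weight vectors (one per spatial position), since how they are derived from the trained classifier is described in the paper and not reproduced here, and plain averaging is assumed as the integration rule. All names are hypothetical.

```python
# Simplified bagging sketch; regional localizers are assumed inputs, not derived here.

def apply_localizer(features, localizer):
    """Score every position of an H x W x C grid with one C-dimensional localizer."""
    return [[sum(w * f for w, f in zip(localizer, vec)) for vec in row]
            for row in features]

def bag_localizers(features, regional_localizers):
    """Average the maps produced by all regional localizers (one per position)."""
    height, width = len(features), len(features[0])
    bagged = [[0.0] * width for _ in range(height)]
    count = 0
    for localizer_row in regional_localizers:
        for localizer in localizer_row:
            single_map = apply_localizer(features, localizer)
            for i in range(height):
                for j in range(width):
                    bagged[i][j] += single_map[i][j]
            count += 1
    # Assumption: integration is a plain mean over the base learners.
    return [[value / count for value in row] for row in bagged]

# Toy 1 x 2 feature map with 2 channels and one localizer per position.
features = [[[1.0, 0.0], [0.0, 2.0]]]
regional_localizers = [[[1.0, -1.0], [0.5, 0.5]]]

bagged_map = bag_localizers(features, regional_localizers)
```

Each regional localizer scores the whole feature map, so the bagged map combines evidence from every position rather than relying on a single globally learned weight vector.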
Follow DA-WSOL to prepare the dataset.
Follow DA-WSOL to train the baseline method (CAM/HAS/CutMix/ADL/DA-WSOL).
Note that "--post_methods" should be set to "CAM" for efficiency during training.
- Confirm that "$data_root" is set to the dataset folder arranged as described above.
- Download the DA-WSOL checkpoint from our Google Drive (or use the checkpoint output by the training step).
- Set "--check_path" to the path of the checkpoint generated by the training process or of our released checkpoint.
- Confirm that "--architecture" and "--wsol_method" are consistent with the settings of the trained checkpoint.
- Set "--post_methods" to BagCAMs (or another method, e.g., CAM/GradCAM/GradCAM++/PCS).
- Set "--target_layer" to the name of the layer whose output features and gradients are used (e.g., layer1, layer2, layer3, or layer4 for a ResNet backbone).
- Run "bash run_test.sh".
- Test log files and test scores are saved in "--save_dir".
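As a quick checklist, the flags from the steps above can be assembled in one place. The helper below is a hypothetical convenience and not part of this repository; it only builds a flag string. How "run_test.sh" consumes these flags is not shown here, and every path and the architecture/method values are placeholders to be replaced with the settings used in training.

```python
import shlex

def build_test_flags(check_path, architecture, wsol_method,
                     post_method="BagCAMs", target_layer="layer3",
                     save_dir="path/to/save_dir"):
    """Assemble the test-time flags named in the steps above (hypothetical helper)."""
    flags = {
        "--check_path": check_path,        # trained or released checkpoint
        "--architecture": architecture,    # must match the checkpoint
        "--wsol_method": wsol_method,      # must match the checkpoint
        "--post_methods": post_method,     # e.g. BagCAMs, CAM, GradCAM, GradCAM++, PCS
        "--target_layer": target_layer,    # layer whose features and gradients are used
        "--save_dir": save_dir,            # where test logs and scores are written
    }
    return " ".join(f"{name} {shlex.quote(str(value))}"
                    for name, value in flags.items())

# Placeholder values only; substitute the settings of your own checkpoint.
flag_string = build_test_flags(
    check_path="path/to/checkpoint",
    architecture="ARCH_FROM_TRAINING",
    wsol_method="METHOD_FROM_TRAINING",
)
print(flag_string)
```

Keeping the flags together makes it easier to check that "--architecture" and "--wsol_method" match the checkpoint before running the test script.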
Method | Top-1 Loc | GT-known Loc | MaxBoxAccV2 |
---|---|---|---|
DA-WSOL-ResNet-CAM | 43.26 | 70.27 | 68.23 |
DA-WSOL-ResNet-BagCAMs | 44.24 | 72.08 | 69.97 |
DA-WSOL-InceptionV3 | 52.70 | 69.11 | 64.75 |
DA-WSOL-InceptionV3-BagCAMs | 53.87 | 71.02 | 66.93 |
Method | Top-1 Loc | GT-known Loc | MaxBoxAccV2 | pIoU | PxAP |
---|---|---|---|---|---|
DA-WSOL-ResNet-CAM | 62.40 | 81.83 | 69.87 | 56.18 | 74.70 |
DA-WSOL-ResNet-BagCAMs | 69.67 | 94.01 | 84.88 | 74.51 | 90.38 |
DA-WSOL-InceptionV3-CAM | 56.29 | 80.03 | 68.01 | 51.81 | 71.03 |
DA-WSOL-InceptionV3-BagCAMs | 60.07 | 89.78 | 76.94 | 58.05 | 72.97 |
Method | pIoU | PxAP |
---|---|---|
DA-WSOL-ResNet-CAM | 49.68 | 65.42 |
DA-WSOL-ResNet-BagCAMs | 52.17 | 67.68 |
DA-WSOL-InceptionV3-CAM | 48.01 | 64.46 |
DA-WSOL-InceptionV3-BagCAMs | 50.79 | 66.89 |
@article{BagCAMs,
title={Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization},
author={Zhu, Lei and Chen, Qian and Jin, Lujia and You, Yunfei and Lu, Yanye},
journal={arXiv preprint arXiv:2207.07818},
year={2022}
}
@inproceedings{DAWSOL,
title={Weakly Supervised Object Localization as Domain Adaption},
author={Zhu, Lei and She, Qi and Chen, Qian and You, Yunfei and Wang, Boyu and Lu, Yanye},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14637--14646},
year={2022}
}
This code and our experiments build on the released code of gradcam, wsolevaluation, and transferlearning. We thank the authors for their remarkable work.