This is the official code of the ICCV 2021 paper:
Residual Attention: A Simple But Effective Method for Multi-Label Recognition
This package is developed by Mr. Ke Zhu (http://www.lamda.nju.edu.cn/zhuk/), and we are working on adding implementation code for ViT models. If you have any questions about the code, please feel free to contact Mr. Ke Zhu ([email protected]). The package is free for academic usage; you run it at your own risk. For other purposes, please contact Prof. Jianxin Wu (mail to [email protected]).
- Python 3.7
- PyTorch 1.6
- torchvision 0.7.0
- pycocotools 2.0
- tqdm 4.49.0
- Pillow 7.2.0
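If needed, the dependencies can be installed with pip, for example (the version pins follow the list above; the exact torch/torchvision builds depend on your CUDA setup):
pip install torch==1.6.0 torchvision==0.7.0 pycocotools tqdm==4.49.0 Pillow==7.2.0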
We expect the VOC2007 and COCO2014 datasets to have the following structure:
Dataset/
|-- VOCdevkit/
|---- VOC2007/
|------ JPEGImages/
|------ Annotations/
|------ ImageSets/
......
|-- COCO2014/
|---- annotations/
|---- images/
|------ train2014/
|------ val2014/
...
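Before running the preparation scripts, a quick sanity check such as the following (a hypothetical helper, not part of this repo) can verify that the layout above is in place:

```python
import os

# Verify the dataset layout described above (paths assume the root folder
# is named "Dataset", as in the tree shown earlier).
expected = [
    "VOCdevkit/VOC2007/JPEGImages",
    "VOCdevkit/VOC2007/Annotations",
    "VOCdevkit/VOC2007/ImageSets",
    "COCO2014/annotations",
    "COCO2014/images/train2014",
    "COCO2014/images/val2014",
]

root = "Dataset"
for rel in expected:
    path = os.path.join(root, rel)
    print(("ok      " if os.path.isdir(path) else "MISSING ") + path)
```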
Then run the following commands to generate the JSON files (used by our code) for these datasets.
python utils/prepare_voc.py --data_path Dataset/VOCdevkit
python utils/prepare_coco.py --data_path Dataset/COCO2014
This will automatically generate the JSON files in ./data/voc07 and ./data/coco.
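The exact JSON schema is defined by the preparation scripts. Assuming each entry stores an image path and a per-class binary target vector (an assumption for illustration; inspect the generated files for the real field names), loading one of them looks roughly like this:

```python
import json

# Hypothetical loading sketch: the file name and field names below are
# assumptions for illustration; check the files generated in ./data/voc07.
with open("data/voc07/trainval_voc07.json") as f:
    samples = json.load(f)

print(len(samples), "samples")
print(samples[0])  # e.g. {"img_path": "...", "target": [0, 1, ...]}
```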
We provide pretrained models on Google Drive for validation. The ResNet-101 model trained on ImageNet with CutMix augmentation can be downloaded here.
| Dataset | Backbone | Heads | mAP (%) | Resolution | Download |
|---|---|---|---|---|---|
| VOC2007 | ResNet-101 | 1 | 94.7 | 448x448 | download |
| VOC2007 | ResNet-cut | 1 | 95.2 | 448x448 | download |
| COCO | ResNet-101 | 4 | 83.3 | 448x448 | download |
| COCO | ResNet-cut | 6 | 85.6 | 448x448 | download |
After model preparation, you can run the following validation command:
CUDA_VISIBLE_DEVICES=0 python val.py --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20 --load_from PRETRAINED_MODEL.pth
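For reference, the reported mAP is the mean over classes of the average precision of the per-class scores. A minimal sketch of that metric (using scikit-learn, which is not among the listed dependencies) is:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def multilabel_mAP(scores, targets):
    # scores:  (N, C) array of per-class confidence scores
    # targets: (N, C) binary array of ground-truth labels
    aps = [average_precision_score(targets[:, c], scores[:, c])
           for c in range(targets.shape[1])]
    return float(np.mean(aps))

# Toy usage: 3 samples, 4 classes.
targets = np.array([[1, 0, 1, 0],
                    [0, 1, 1, 0],
                    [1, 0, 0, 1]])
scores = np.array([[0.9, 0.2, 0.8, 0.1],
                   [0.3, 0.7, 0.6, 0.2],
                   [0.8, 0.1, 0.4, 0.9]])
print(multilabel_mAP(scores, targets))
```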
For training on VOC2007, you can run either of the two commands below:
CUDA_VISIBLE_DEVICES=0 python main.py --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20
CUDA_VISIBLE_DEVICES=0 python main.py --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20 --cutmix CutMix_ResNet101.pth
Note that the first command uses the official ResNet-101 backbone, while the second uses the ResNet-101 pretrained on ImageNet with CutMix augmentation (link), which is expected to achieve better performance.
To run the ResNet-101 with 4 heads on COCO:
CUDA_VISIBLE_DEVICES=0 python main.py --num_heads 4 --lam 0.5 --dataset coco --num_cls 80
To run the ResNet-101 (pretrained with CutMix) with 6 heads on COCO:
CUDA_VISIBLE_DEVICES=0 python main.py --num_heads 6 --lam 0.4 --dataset coco --num_cls 80 --cutmix CutMix_ResNet101.pth
Feel free to adjust hyper-parameters such as the number of attention heads (--num_heads) or the Lambda (--lam). Still, the default values in the commands above are expected to give the best results.
To avoid confusion, please note that the 4 lines of code in Figure 1 (in the paper) are only used at test time (without training), which is our motivation. When our model is trained and tested end to end, multi-head attention (H=1, H=2, H=4, etc.) is used with different temperature values T. Also, when H=1 and T=∞, the implementation of multi-head attention is exactly the same as Figure 1.
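For reference, the test-time computation referenced above can be sketched in a few lines of PyTorch (shapes and names here are our paraphrase of Figure 1, not a verbatim copy; a 1x1 convolution stands in for the per-location FC classifier):

```python
import torch

B, d, C, HW = 2, 2048, 20, 14 * 14   # batch, feature dim, classes, spatial size
Lambda = 0.1                          # corresponds to --lam

x = torch.randn(B, d, HW)             # flattened backbone feature map
fc = torch.nn.Conv1d(d, C, kernel_size=1, bias=False)  # classifier per location

y_raw = fc(x)                         # B x C x HW: class score at every location
y_avg = torch.mean(y_raw, dim=2)      # B x C: average pooling (base logits)
y_max = torch.max(y_raw, dim=2)[0]    # B x C: class-specific max pooling
score = y_avg + Lambda * y_max        # B x C: residual attention logits
```

With H=1 and T=∞, the multi-head implementation reduces to exactly this computation, as noted above.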
We thank Lin Sui (http://www.lamda.nju.edu.cn/suil/) for his initial contribution to this project.