Deep Semantic-aware Proxy Hashing for Multi-label Cross-modal Retrieval Paper
This paper has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). If you have any questions, please contact [email protected].
The code is written in Python. To run it, install the following packages:
- pytorch 1.12.1
- sklearn
- tqdm
- pillow
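A quick way to confirm the environment is ready is to check that each package can be imported. The sketch below is not part of this repository; it assumes the usual import names (`torch` for pytorch, `PIL` for pillow), and the helper name `missing_modules` is made up for illustration.

```python
import importlib.util

# Import names for the packages listed above
# (pytorch is imported as torch, pillow as PIL).
REQUIRED_MODULES = ["torch", "sklearn", "tqdm", "PIL"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the modules are present; it does not verify the pytorch version.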
Before training, download the original data: coco (including 2017 train, val, and annotations), nuswide (Google Drive), and mirflickr25k (Baidu, extraction code: u9e1, or Google Drive; includes mirflickr25k and mirflickr25k_annotations_v080). Then use "data/" to generate the .mat files.
After all .mat files have been generated, the dataset directory will look like this:
├── coco
│ ├── caption.mat
│ ├── index.mat
│ └── label.mat
├── flickr25k
│ ├── caption.mat
│ ├── index.mat
│ └── label.mat
└── nuswide
  ├── caption.txt # Notice! It is a txt file!
  ├── index.mat
  └── label.mat
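Before training, it can help to confirm that the generated files match the layout above. The following is a minimal sketch, not code from this repository: the function name `find_missing_files` and the `./dataset` root path are assumptions, while the file names are taken from the tree above.

```python
from pathlib import Path

# Expected files per dataset, copied from the directory tree above.
# Note that nuswide uses caption.txt rather than caption.mat.
EXPECTED_FILES = {
    "coco": ["caption.mat", "index.mat", "label.mat"],
    "flickr25k": ["caption.mat", "index.mat", "label.mat"],
    "nuswide": ["caption.txt", "index.mat", "label.mat"],
}


def find_missing_files(dataset_root):
    """Return relative paths of expected files that are absent under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for dataset, files in EXPECTED_FILES.items():
        for name in files:
            if not (root / dataset / name).is_file():
                missing.append((Path(dataset) / name).as_posix())
    return missing


if __name__ == "__main__":
    # "./dataset" is a placeholder; point it at your own dataset directory.
    for path in find_missing_files("./dataset"):
        print("missing:", path)
```

If you only plan to train on one dataset, the entries for the others can be removed from `EXPECTED_FILES`.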
The pretrained model can be found at line 30 of CLIP/clip/ This code is based on "ViT-B/32".
Copy the pretrained model to this directory.
After the dataset has been prepared, run the following command to train:
python --is-train --dataset coco --caption-file caption.mat --index-file index.mat --label-file label.mat --lr 0.001 --output-dim 64 --save-dir ./result/coco/64 --clip-path ./ --batch-size 128 --numclass 80
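To make the flags in the command above easier to read, here is a hypothetical `argparse` sketch that accepts the same options. It is not the repository's actual argument parser: the flag names come from the command, but the types, defaults, and help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the training command."""
    parser = argparse.ArgumentParser(description="Training options (illustrative only)")
    parser.add_argument("--is-train", action="store_true", help="run training")
    parser.add_argument("--dataset", type=str, help="dataset name, e.g. coco")
    parser.add_argument("--caption-file", type=str, help="caption file name")
    parser.add_argument("--index-file", type=str, help="index file name")
    parser.add_argument("--label-file", type=str, help="label file name")
    parser.add_argument("--lr", type=float, help="learning rate")
    parser.add_argument("--output-dim", type=int, help="hash code length in bits (assumed)")
    parser.add_argument("--save-dir", type=str, help="directory for results")
    parser.add_argument("--clip-path", type=str, help="path to the pretrained CLIP model")
    parser.add_argument("--batch-size", type=int, help="mini-batch size")
    parser.add_argument("--numclass", type=int, help="number of label classes")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Read this way, the command trains on coco with 64-dimensional output, a learning rate of 0.001, a batch size of 128, and 80 classes, and writes results to ./result/coco/64.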
Yadong Huo, Qibing Qin, Jiangyan Dai, Lei Wang, Wenfeng Zhang, Lei Huang, and Chengduan Wang. "Deep Semantic-Aware Proxy Hashing for Multi-Label Cross-Modal Retrieval." IEEE Transactions on Circuits and Systems for Video Technology.