This is a PyTorch implementation of PiNI:
Sida Huang, Hongyuan Zhang*, and Xuelong Li*, "Enhance Vision-Language Alignment with Noise", in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2025. (arXiv)
# Install torch (requires version >= 1.8.1) and torchvision
# Please refer to https://pytorch.org/ if you need a different cuda version
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
# Install Dassl
cd Dassl.pytorch/
# Install dependencies in Dassl
pip install -r requirements.txt
# Install this library (no need to re-build if the source code is modified)
python setup.py develop
# Install other dependencies
cd ..
pip install -r requirements.txt
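As a quick sanity check after installation, the installed torch version can be compared against the 1.8.1 minimum stated above. The sketch below is not part of this repo: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils).

```shell
# Hypothetical helper (not part of this repo): succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort, GNU coreutils) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example use (requires torch to be installed, so it is left commented out):
# installed="$(python -c 'import torch; print(torch.__version__.split("+")[0])')"
# version_ge "$installed" 1.8.1 && echo "torch ${installed} is new enough"
```

The local build suffix (such as `+cu121`) is stripped in the example before comparing, since only the numeric part matters for the minimum-version check.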
Follow DATASETS.md to install the datasets.
- Modify the paths of the data and the models.
# scripts/clip_vpn.sh
DATA=/your/path/to/$DATA
MODEL=/your/path/to/pretrained/clip/models
- Run the code
bash scripts/clip_vpn.sh DATASET CFG SHOTS
parameters:
DATASET: dataset name in configs/datasets/, such as imagenet
CFG: config file name in configs/trainers/TRAINER/, such as imagenet_config
SHOTS: the number of shots, one of 1, 2, 4, 8, 16
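To cover every supported shot count for one dataset, the run command can be wrapped in a loop. This is a minimal sketch, assuming scripts/clip_vpn.sh takes the three positional arguments described above and is invoked from the repository root; `sweep_shots` and the `DRY_RUN` variable are hypothetical names introduced here, not part of the repo.

```shell
# Hypothetical helper (not part of this repo): run one dataset/config pair
# for each shot count listed above. Set DRY_RUN=1 to print the commands
# instead of executing them.
sweep_shots() {
    dataset="$1"
    cfg="$2"
    for shots in 1 2 4 8 16; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "bash scripts/clip_vpn.sh ${dataset} ${cfg} ${shots}"
        else
            # Stop the sweep at the first failing run.
            bash scripts/clip_vpn.sh "${dataset}" "${cfg}" "${shots}" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1 sweep_shots imagenet imagenet_config` prints the five commands without launching any training, which is a cheap way to confirm the arguments before committing GPU time.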
This repo benefits from CLIP and CoOp. Thanks for their excellent work.
If you have any questions about this project, please contact [email protected] and [email protected].
If you find the code useful for your research, please consider citing our work:
@inproceedings{PiNI,
author={Huang, Sida and Zhang, Hongyuan and Li, Xuelong},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
title={Enhance Vision-Language Alignment with Noise},
year={2025},
pages={},
}