Official PyTorch implementation of the Tag2Text paper. We will fully open-source the code and data in the future.
Try out the Tag2Text web demo 🤗! It includes both tagging and captioning.
Tag2Text achieves superior image tag recognition by exploiting fine-grained text information. With tagging guidance, it improves the performance of vision-language models on both generation-based and alignment-based tasks.
If you find our work useful for your research, please consider citing it.
@article{huang2023tag2text,
title={Tag2Text: Guiding Vision-Language Model via Image Tagging},
author={Huang, Xinyu and Zhang, Youcai and Ma, Jinyu and Tian, Weiwei and Feng, Rui and Zhang, Yuejie and Li, Yaqian and Guo, Yandong and Zhang, Lei},
journal={arXiv preprint arXiv:2303.05657},
year={2023}
}