jingzhengli / Tag2Text Public

forked from xinyu1205/recognize-anything

Code for paper: Tag2Text: Guiding Vision-Language Model via Image Tagging

tag2text.github.io

MIT license

Tag2Text: Guiding Vision-Language Model via Image Tagging

Official PyTorch implementation of the Tag2Text paper. We plan to fully open-source the code and data in the future.

Try out the Tag2Text web demo 🤗! It covers both tagging and captioning.

Abstract

Tag2Text achieves superior image tag recognition by exploiting fine-grained text information. By leveraging tagging guidance, Tag2Text effectively enhances the performance of vision-language models on both generation-based and alignment-based tasks.

Citation

If you find our work useful for your research, please consider citing it:

@article{huang2023tag2text,
  title={Tag2Text: Guiding Vision-Language Model via Image Tagging},
  author={Huang, Xinyu and Zhang, Youcai and Ma, Jinyu and Tian, Weiwei and Feng, Rui and Zhang, Yuejie and Li, Yaqian and Guo, Yandong and Zhang, Lei},
  journal={arXiv preprint arXiv:2303.05657},
  year={2023}
}