# BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
This is the PyTorch code of the BLIP paper [blog]. The code has been tested on PyTorch 1.10. To install the dependencies, run:

```
pip install -r requirements.txt
```
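After installing, a quick sanity check can confirm the environment roughly matches the tested setup. This snippet is illustrative only and not part of the repo:

```python
# Check that PyTorch is importable and the version matches the tested setup (1.10).
import torch

print(torch.__version__)          # ideally 1.10.x, the version the code was tested on
print(torch.cuda.is_available())  # True if a CUDA-enabled build and a GPU are present
```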
| Num. pre-train images | BLIP w/ ViT-B and CapFilt-L |
|---|---|
| 129M | Download |
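As an illustration of how a downloaded checkpoint might be used, here is a minimal captioning sketch. It assumes the repo's `models.blip.blip_decoder` factory and its `generate` interface; the checkpoint file `blip_base.pth` and the image `demo.jpg` are hypothetical placeholders for your own paths:

```python
import torch
from PIL import Image
from torchvision import transforms
from models.blip import blip_decoder  # model factory from this repo (assumed import path)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
image_size = 384

# BLIP-style preprocessing: bicubic resize plus CLIP normalization statistics.
transform = transforms.Compose([
    transforms.Resize((image_size, image_size),
                      interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])
image = transform(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)  # placeholder image

# 'blip_base.pth' stands in for a downloaded checkpoint such as the one linked above.
model = blip_decoder(pretrained='blip_base.pth', image_size=image_size, vit='base')
model.eval()
model = model.to(device)

# Beam-search decoding; parameter names follow the repo's generate() signature.
with torch.no_grad():
    caption = model.generate(image, sample=False, num_beams=3,
                             max_length=20, min_length=5)
print(caption[0])
```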
If you find this code useful for your research, please consider citing:
```
@inproceedings{li2022blip,
  title     = {BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation},
  author    = {Junnan Li and Dongxu Li and Caiming Xiong and Steven Hoi},
  booktitle = {ICML},
  year      = {2022},
}
```
The implementation of BLIP relies on resources from ALBEF, Huggingface Transformers, and timm. We thank the original authors for open-sourcing their work.