SearchLVLMs

SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge (NeurIPS 2024)
Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang
[Homepage] [Paper]

Example Image


News

  • 2024.12.09: 🎉 The inference code and UDK-VQA dataset are released!
  • 2024.09.26: 🎉 SearchLVLMs is accepted by NeurIPS 2024!

Install

conda env create -f environment.yml
conda activate searchlvlms

Prerequisites

Llama3

Install Llama3 and download the checkpoint.
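
A minimal install sketch, assuming the official meta-llama/llama3 repository (its README describes the exact download flow):

git clone https://github.com/meta-llama/llama3.git
cd llama3
pip install -e .
# download.sh prompts for the signed URL from the Meta license e-mail
bash download.sh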

VLMEvalKit

Install VLMEvalKit and download the checkpoints of LVLMs for testing.
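
A minimal sketch, assuming the open-compass/VLMEvalKit repository; the LVLM checkpoints themselves are fetched per its own instructions:

git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .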

LLaVA-1.5

Install LLaVA-1.5 and download the pretrained model and the projector weights.
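
A minimal sketch, assuming the haotian-liu/LLaVA repository; the checkpoint IDs below are an assumption about the usual LLaVA-1.5 weights, so pick the variant your configs expect:

git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
# pretrained model and projector weights (assumed IDs)
huggingface-cli download liuhaotian/llava-v1.5-7b
huggingface-cli download liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5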

NER

Download the NER model from the Hugging Face Hub.

CLIP

Download the CLIP model from the Hugging Face Hub.
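
Both the NER and CLIP models can be fetched with huggingface-cli; a minimal sketch, where the repository IDs and target directories are placeholders because the README does not name the exact checkpoints:

# substitute the checkpoint IDs your SearchLVLMs configs expect
huggingface-cli download <ner-model-id> --local-dir checkpoints/ner
huggingface-cli download <clip-model-id> --local-dir checkpoints/clip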


UDK-VQA Dataset and Checkpoint

Download both the UDK-VQA dataset and the checkpoint of our filter from [OneDrive] or [Baidu NetDisk] (password: DSPS).

Unzip the archives so that the directory structure looks like this (a sketch of the unpacking step follows the tree):

SearchLVLMs
----checkpoints
--------llava_lora_content_filter
--------llava_lora_website_filter
----datasets
--------test
--------train
...
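
A sketch of the unpacking step; the archive names are hypothetical, so use the actual file names from the download links:

cd SearchLVLMs
unzip <checkpoints-archive>.zip   # should yield checkpoints/llava_lora_*_filter
unzip <datasets-archive>.zip      # should yield datasets/test and datasets/train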

Configurations

Configure the variables in scripts/init_env_variable.sh, scripts/iag.sh and scripts/eval.sh.

  • scripts/init_env_variable.sh
# used for generating queries for each question via Llama3.
llama3_dir="<your path of llama3 project>"

# used for running eval.sh
vlmevalkit_dir="<your path of vlmevalkit project>"

# used for calling GPT to generate queries for each question.
OPENAI_API_KEY=""
OPENAI_ENDPOINT=""

# the Google search engine keys are optional, as we mainly use the Bing search engine.
google_api_key=""
google_text_cse_id=""
google_image_cse_id=""

# img_api keys are optional; they are used for generating samples, which has not been released yet.
bing_text_api_key=""
bing_img_api_key=""
bing_visual_api_key=""

The variables in scripts/iag.sh and scripts/eval.sh should be self-explanatory from their names; a hypothetical illustration follows.
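
As a purely hypothetical illustration (these names are not the repo's actual variables), settings of this kind typically select the data split and the model under test:

# hypothetical names — the real variables live in scripts/iag.sh and scripts/eval.sh
test_dir="datasets/test"       # split to run on
model_name="llava_v1.5_7b"     # LVLM evaluated via VLMEvalKit
use_searchlvlms=1              # 0 = raw LVLM, 1 = LVLM + SearchLVLMs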


Evaluation

Run the following scripts to evaluate LVLMs (or LVLMs + SearchLVLMs):

cd SearchLVLMs

# Activate environment variables
source scripts/init_env_variable.sh

# Run SearchLVLMs to find the best context for each sample in the test set.
sh scripts/iag.sh

# Evaluate the accuracy of LVLMs (or LVLMs + SearchLVLMs) on the test set.
sh scripts/eval.sh

# Unset environment variables
source scripts/unset_env_variable.sh

Citation

If any part of our paper or code is helpful to your work, please cite:

@inproceedings{li2024searchlvlms,
  title={SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge},
  author={Li, Chuanhao and Li, Zhen and Jing, Chenchen and Liu, Shuo and Shao, Wenqi and Wu, Yuwei and Luo, Ping and Qiao, Yu and Zhang, Kaipeng},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}
