min99830/vlis

 
 

VLIS: Unimodal Language Models Guide Multimodal Language Generation

This repository contains the code for our EMNLP 2023 paper: VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung, Youngjae Yu

Paper & Presentation slides (tbu)

VLIS per backbone models

VLIS is a decoding-time method that is agnostic to the backbone model. We provide inference UI code for three recent vision-language backbones: BLIP-2, LLaVA, and Lynx.

Preparation

For python dependencies, we provide two options:

  • A monolithic conda environment for all models (conda env create -f env.yaml).
  • A per-model requirements.txt file.

For everything else, see the per-model instructions (e.g. code/blip2/PREPARATION.md).

Usage

Use the provided UI script to test VLIS per backbone.

For example, when using VLIS with LLaVA:

cd code/llava
python ui.py --lm_name 'lmsys/vicuna-7b-v1.5'

The Gradio UI supports widely used text generation hyperparameters, including temperature, num_beams, top_p, and max_length.

Converting the given UI code into an offline script for dataset evaluation should be straightforward. If you want an inference script for a particular dataset, please feel free to open an issue.

Landmark and Character benchmarks

We assess the visual specificity of vision-language models on named entities in Appendix A of our paper. Here, we provide the data and evaluation code to replicate our experiments.

Preparation

Use code/data/character_urls.json and code/data/landmark_urls.json to download the character and landmark images, respectively.
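
The download step could be scripted roughly as below. This is only a sketch under loud assumptions: it supposes each URL file is a flat JSON object mapping a data ID to an image URL, and it saves every image with a .jpg extension. Neither assumption is confirmed by this README, so check the actual file layout first. The function names (fetch_bytes, download_images) are hypothetical and not part of the repository.

```python
import json
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=30):
    """Download the raw bytes behind a URL using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_images(url_file, out_dir, fetch=fetch_bytes):
    """Save every image listed in url_file into out_dir.

    ASSUMPTION: url_file is a flat JSON object {data_id: url}.
    ASSUMPTION: images are stored as <data_id>.jpg.
    Returns (saved, failed): saved maps data_id to the local path,
    failed maps data_id to the error message.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(url_file, encoding="utf-8") as f:
        id_to_url = json.load(f)

    saved, failed = {}, {}
    for data_id, url in id_to_url.items():
        target = out_dir / f"{data_id}.jpg"
        if target.exists():
            # Skip files that are already present so reruns are cheap.
            saved[data_id] = target
            continue
        try:
            target.write_bytes(fetch(url))
            saved[data_id] = target
        except OSError as err:
            # urllib's URLError and HTTPError are OSError subclasses;
            # record the failure and keep going with the remaining URLs.
            failed[data_id] = str(err)
    return saved, failed
```

For instance, download_images("code/data/landmark_urls.json", "images/landmark") would fill images/landmark and return the IDs that could not be fetched, which is useful because some URLs may have gone stale.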

Evaluation

The model-generated output files should be structured as follows:

OUTPUT_FILE

{
  DATA_ID: {
    MODELNAME: OUTPUT
  }
}
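
A file in this layout can be assembled with a few lines of Python. The sketch below is illustrative only: the helper names (add_output, save_outputs), the model name "vlis", and the example IDs and strings are hypothetical placeholders, not values taken from the repository.

```python
import json


def add_output(results, data_id, model_name, output):
    """Record one generation under results[DATA_ID][MODELNAME]."""
    # JSON object keys are always strings, so normalise the ID up front.
    results.setdefault(str(data_id), {})[model_name] = output
    return results


def save_outputs(results, path):
    """Write the nested {DATA_ID: {MODELNAME: OUTPUT}} mapping as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)


# Hypothetical example: two models answering for the same data ID.
results = {}
add_output(results, "0001", "vlis", "A photo of the Eiffel Tower.")
add_output(results, "0001", "baseline", "A photo of a tower.")
```

Calling save_outputs(results, "outputs.json") then produces a file that can be passed to the evaluation script through --path, provided the data IDs match those used in the benchmark files.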

We provide an example output file at code/data/example_landmark.json. Once your output file is ready, run the evaluation script to get the statistics.

python code/data/landmark_eval.py --path OUTPUT_FILE

Contact

Jiwan Chung: [email protected]
