Multimodal large language models (MLLMs) offer a powerful mechanism for interpreting visual information, but they often suffer from hallucinations, which impede their real-world use. Existing methods attempt to alleviate this issue by designing special decoding strategies that penalize summary tokens, yet they lack an analysis of the relationship between hallucination and the summarization mechanism of LLMs. Interestingly, we find that penalizing summary tokens is not necessary: merely intervening on the variance of the query-key parameters, at no extra inference cost, still alleviates hallucinations. Specifically, we explore the causes of hallucinations by analyzing localized self-attention patterns called "anchor" tokens and define the model's degree of attention localization as token propagation probabilities. Our analysis reveals that over-propagation of anchor tokens occurs when the eigenvalue distribution of the query and key matrices has a non-zero mean and a polarized variance, leading to excessive dependence on anchor tokens while neglecting visual information, and hence to hallucinated descriptions of the image content. Based on this observation, we propose a versatile plug-and-play decoding strategy, the Dynamic Token Propagation Mechanism (TAME), which alleviates excessive propagation by dynamically intervening on the eigenspectrum variance of the attention weights, thereby mitigating hallucinations without relying on complex decoding strategies. Extensive experiments reveal a correlation between the eigenspectrum and hallucinations across various MLLMs and show that TAME reduces the proportion of hallucinated objects.
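As a rough illustration of the idea (not the exact TAME implementation: the function name, the `alpha` factor, and the use of SVD as a stand-in for the eigendecomposition are all assumptions), the intervention amounts to shrinking the spread of the attention-score spectrum before the softmax:

```python
# Rough sketch only: damp the spectrum variance of a pre-softmax
# attention-score matrix. `alpha` and the SVD-based decomposition are
# illustrative assumptions, not the exact TAME implementation.
import torch

def damp_spectrum_variance(scores: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """scores: (seq_len, seq_len) pre-softmax scores, e.g. Q @ K.T / sqrt(d)."""
    # SVD stands in for the eigendecomposition of the (non-symmetric) score matrix.
    U, S, Vh = torch.linalg.svd(scores, full_matrices=False)
    # Pull every singular value toward the mean, reducing the polarized
    # variance that lets a few "anchor" directions dominate.
    S_damped = S.mean() + alpha * (S - S.mean())
    return U @ torch.diag(S_damped) @ Vh

# Example: q, k of shape (seq_len, head_dim)
q, k = torch.randn(16, 64), torch.randn(16, 64)
scores = damp_spectrum_variance(q @ k.T / 64 ** 0.5)
attn = scores.softmax(dim=-1)
```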
Since our method modifies the LVLM decoding strategy, the easiest way to use ANTRP is to install our modified transformers package:
conda env create -f environment.yml
conda activate ANTRP
python -m pip install -e transformers
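To confirm that the editable install of the modified package is the one your environment imports, you can check its install path (this check is optional and just prints the package location):

python -c "import transformers; print(transformers.__file__)"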
After setting up the environment, you can run our codebase directly to apply ANTRP:
python pope_eval.py --pope-type coco_adversarial --model llava-1.5 --beam 5 --opera #OPERA
python pope_eval.py --pope-type coco_adversarial --model llava-1.5 --use-cd --use-fast-v --sample --sample-greedy #SID_greedy
python pope_eval.py --pope-type coco_adversarial --model llava-1.5 --use-vcd --sample --sample-greedy #VCD_greedy
python pope_eval.py --pope-type coco_adversarial --model llava-1.5 --use-icd --sample --sample-greedy #ICD_greedy
python pope_eval.py --pope-type coco_adversarial --model llava-1.5 --beam 5 #Beam Search
CHAIR evaluation uses the same argument configuration as the POPE commands above.
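For example, a CHAIR run with beam search might look like the following (the flags are assumed to mirror `pope_eval.py`; adjust paths and decoding options as needed):

python eval_utils/chair_eval.py --model llava-1.5 --data-path /path/to/COCO2014 --beam 5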
We provide extensive evaluation metrics, including:
- GPT-4V: `eval_utils/gpt4v_eval.py`
- GPT-4 (SHR): `shr_eval.py`
- POPE: `pope_eval.py`
- CHAIR: `eval_utils/chair_eval.py`
The following evaluations require the MSCOCO 2014 and/or Visual Genome datasets. For Visual Genome, download the dataset with `dataset/download_visual_genome_v1.2.py` and extract it into the data path.
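For example, the download script can be invoked as below (any arguments the script accepts for choosing the download location are not shown here):

python dataset/download_visual_genome_v1.2.py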
In addition, you need to prepare the following checkpoints of the 7B base models:
- Download the LLaVA-1.5 merged 7B model and specify it at `eval_configs/llava-1.5_eval.yaml`.
- Download the Vicuna 7B v1.1 model and specify it at `minigpt4/configs/models/blip2_instruct_vicuna7b.yaml`.
- Download the Shikra merged 7B model and specify it at `eval_configs/shikra_eval.yaml`.
- Download the MiniGPT-4 7B pretrained weights and specify them at Line 8 of `eval_configs/minigpt4_eval.yaml`.
Argument | Example | Description
---|---|---
`--model` | `llava-1.5` | Specify the LVLM model.
`--data-path` | `/path/to/dataset` | Path to the dataset file or folder.
`--pope-type` | `coco_adversarial` | Type of POPE evaluation.
`--sample` | `store_true` | Use the modified decoding strategy.
`--sample-greedy` | `store_true` | Use CD with sampling and greedy decoding.
`--beam` | `5` | Beam search width.
`--opera` | `store_true` | Use OPERA.
This repo builds on the LVLM codebases of SID, OPERA, VCD, and HA-DPO. Thanks for their excellent work!