HaloQuest

Welcome to the repository for our ECCV 2024 paper, HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning. It contains a Colab notebook that shows how to load the HaloQuest data and how to use the Auto-Eval system.

Code to reproduce the experiments in the paper is available here.
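
For quick experimentation outside the Colab, a minimal loading sketch along these lines should work. The file name and field names below are illustrative assumptions, not the repository's actual schema; the Colab is the authoritative reference for loading.

```python
import json

# Hypothetical local export of the HaloQuest annotations. The real file
# name and schema come from the Colab in this repo, not from this sketch.
with open("haloquest_eval.jsonl") as f:
    examples = [json.loads(line) for line in f]

# Assumed per-example fields: an image URL, a question, and reference answers.
for ex in examples[:3]:
    print(ex["image_url"], ex["question"], ex["answers"], sep="\n")
```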

Unofficial Project Page

Updates

  • 2024.07.22: Our paper is on arXiv.

Dataset Description

Summary

HaloQuest is a novel visual question answering (VQA) dataset that focuses on multimodal hallucination in vision-language models (VLMs). It contains 7,748 examples with a combination of real and synthetically generated images, annotated with questions and answers designed to trigger and evaluate hallucinations.
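
Concretely, each entry pairs an image with a question crafted to tempt a VLM into hallucinating, plus grounded reference answers. The record below is a purely illustrative mock-up (field names and content invented for this sketch), not an actual dataset entry:

```python
# Purely illustrative mock-up of a HaloQuest-style entry (not a real record).
example = {
    "image": "https://...",  # a real (Open Images) or generated (Midjourney) image
    "question": "What breed is the dog sleeping next to the cat?",  # false premise
    "answers": ["There is no dog in the image."],  # grounded reference answers
}
```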

Supported Tasks

HaloQuest supports tasks related to hallucination detection and reduction in VLMs, providing a challenging benchmark for Visual Question Answering. The dataset is useful for both evaluation and fine-tuning purposes, aiming to advance multimodal reasoning.

Dataset Details

Data Collection

HaloQuest includes a mix of real images from the Open Images dataset and synthetic images generated using Midjourney. Images were curated based on interest and comprehensibility. Questions and answers were crafted by humans and large language models (LLMs), focusing on false premises, visually challenging questions, and questions with insufficient context.

Data Splits

The dataset is split into training and evaluation sets. The following table provides detailed statistics for each subset.

| Split | Real Images | Synthetic Images | False Premise Questions | Visually Challenging Questions | Insufficient Context Questions | Total Entries |
|---|---|---|---|---|---|---|
| Training Set | 2985 | 4155 | 2698 | 2973 | 1469 | 7140 |
| Evaluation Set | 217 | 391 | 304 | 183 | 121 | 608 |
| Total | 3202 | 4546 | 3002 | 3156 | 1590 | 7748 |
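
If the annotations are loaded into a dataframe, per-split counts like those in the table above can be recomputed with a groupby. The column names here are again hypothetical, chosen only for this sketch:

```python
import pandas as pd

# `examples` as loaded earlier; "split", "image_source", and "question_type"
# are assumed column names, not the dataset's documented schema.
df = pd.DataFrame(examples)
print(df.groupby(["split", "image_source"]).size())
print(df.groupby(["split", "question_type"]).size())
```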

Leaderboard

(Gemini 1.0 Pro was used for Auto-Eval)

Each cell shows Human Eval / Auto-Eval.

| Model (#Param) | Rank | Overall | Generated | Real | False Premise | Visually Challenging | Insufficient Context |
|---|---|---|---|---|---|---|---|
| Gemini 1.5 Pro (May 2024) | 1 | 76.1 / 77.9 | 74.7 / 78.3 | 78.7 / 77.2 | 80.4 / 83.7 | 57.3 / 56.3 | 91 / 92.5 |
| GPT-4o (May 2024) | 2 | 68.1 / 63.2 | 68.8 / 63.8 | 66.9 / 62.2 | 68.5 / 65.2 | 58.3 / 55.2 | 80.6 / 68.7 |
| GPT-4 (May 2024) | 3 | 62.9 / 61.2 | 64.3 / 61.1 | 60.6 / 61.4 | 64.7 / 63 | 46.9 / 44.8 | 80.6 / 79.1 |
| BEiT-3 (0.7B) (Mar 2024) | 4 | 35.9 / 40 | 41.2 / 44.3 | 26.3 / 32.3 | 24.1 / 28.4 | 36.6 / 36.1 | 9.1 / 10.7 |
| InstructBLIP (12B) (Mar 2024) | 5 | 25.5 / 28.5 | 28.4 / 31.5 | 20.3 / 23 | 28.4 / 32 | 33.3 / 33.9 | 6.6 / 11.6 |
| InstructBLIP (8B) (Mar 2024) | 6 | 25 / 27.3 | 28.4 / 29.7 | 18.9 / 23 | 28.4 / 32 | 6.6 / 11.6 | 33.3 / 33.9 |
| BLIP2 (12B) (Mar 2024) | 7 | 21.1 / 22.5 | 24.8 / 26.1 | 14.29 / 16.1 | 16.8 / 19.5 | 35.5 / 32.8 | 9.9 / 14.9 |
| MiniGPT4 (13B) (Mar 2024) | 8 | 18.7 / 25.2 | 18.2 / 24 | 18.9 / 27.2 | 16.2 / 21.5 | 10.4 / 13.7 | 36.4 / 51.2 |
| MiniGPT4 (7B) (Mar 2024) | 9 | 18.6 / 19.1 | 18.1 / 19.4 | 18 / 18.4 | 13.2 / 13.2 | 26.5 / 27.3 | 15.7 / 16.5 |
| Open-flamingo (9B) (Mar 2024) | 10 | 13.8 / 15 | 16.1 / 17.1 | 9.7 / 11.1 | 13.2 / 13.9 | 19.1 / 21.3 | 7.4 / 8.3 |
| LLaVA (13B) (Mar 2024) | 11 | 10.9 / 10.9 | 12.3 / 12.8 | 8.2 / 7.4 | 2.3 / 1.7 | 30.6 / 31.2 | 2.5 / 3.3 |
| BLIP2 (8B) (Mar 2024) | 12 | 10.9 / 11.8 | 11.5 / 11.8 | 9.7 / 12 | 5 / 4.6 | 26.8 / 26.8 | 1.7 / 6.6 |
| mPLUG-Owl1 (7B) (Mar 2024) | 13 | 9.7 / 8.7 | 11.3 / 10.2 | 6.9 / 6 | 10.3 / 2 | 9 / 26.8 | 2.5 / 2.5 |
| mPLUG-Owl2 (7B) (Mar 2024) | 14 | 9.2 / 10.4 | 11 / 11.3 | 6 / 8.8 | 0.8 / 3.3 | 28.4 / 27.9 | 0.8 / 3.3 |
| OFA (1B) (Mar 2024) | 15 | 8.7 / 10.2 | 9.7 / 11.3 | 6.9 / 8.3 | 5 / 6.3 | 19.7 / 20.2 | 1.7 / 5 |
| Open-flamingo (3B) (Mar 2024) | 16 | 6.9 / 8.2 | 7.4 / 8.7 | 6 / 7.4 | 0.7 / 1.3 | 19.1 / 21.3 | 4.1 / 5.8 |
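
The Auto-Eval numbers above come from prompting a judge LLM (Gemini 1.0 Pro) to decide whether a model's response agrees with the reference answers. The helper below is a hedged sketch of that general recipe, not the actual Auto-Eval implementation; `generate` is a placeholder for any text-generation client, and the prompt wording is invented for illustration. See the Colab for the real system.

```python
def auto_eval(question, reference_answers, response, generate):
    """Ask a judge LLM whether `response` matches the reference answers.

    `generate` stands in for a text-generation client (the paper used
    Gemini 1.0 Pro); the prompt below is illustrative only.
    """
    prompt = (
        "You are grading a visual question answering response.\n"
        f"Question: {question}\n"
        f"Reference answers: {'; '.join(reference_answers)}\n"
        f"Model response: {response}\n"
        "Does the response agree with the reference answers? Answer yes or no."
    )
    return generate(prompt).strip().lower().startswith("yes")

# Accuracy is then the fraction of evaluation examples judged correct.
```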

Contributions

Zhecan Wang*, Garrett Bingham*, Adams Wei Yu, Quoc V. Le, Thang Luong, Golnaz Ghiasi

(* ZW and GB are main contributors. ZW did some work while at Google DeepMind.)

Citing this work

@inproceedings{wang2024haloquest,
  title={HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning},
  author={Zhecan Wang and Garrett Bingham and Adams Wei Yu and Quoc V. Le and Thang Luong and Golnaz Ghiasi},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer}
}

License and disclaimer

Copyright 2024 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Image URLs are from the Open Images Dataset v7 and Midjourney Showcase. Images may be individually licensed and you should verify the license for each image yourself.

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.
