Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024

kuan2jiu99/audio-hallucination

Understanding and Tackling Hallucinations in Large Audio-Language Models

Overview

This repository includes the following research papers on audio hallucination:

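  • Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
      • Conference: ICASSP 2025
      • Keywords: Object Existence, Temporal Order, Object Attribute, Multi-turn And Thoughtful Chain of Hearings (MATCH)
      • GitHub Page | arXiv (2410.16130)
  • Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
      • Conference: Interspeech 2024
      • Keywords: Object Hallucination, LALMs
      • GitHub Page | arXiv (2406.08402)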

Citation

If you find our work useful, please cite the following papers:

The first paper (ICASSP 2025) provides a more comprehensive analysis of audio hallucination in large audio-language models, covering object existence, temporal order, and object attributes. It also proposes simple and effective methods to improve model performance on these aspects.

The second paper (Interspeech 2024) is the first to systematically analyze object hallucination in large audio-language models.

@inproceedings{kuan2024can,
  title={Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning},
  author={Kuan, Chun-Yi and Lee, Hung-yi},
  booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025},
  arxiv = {2410.16130}
}

@inproceedings{kuan2024understanding,
  title={Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models},
  author={Kuan, Chun-Yi and Huang, Wei-Ping and Lee, Hung-yi},
  booktitle={2024 Conference of the International Speech Communication Association (INTERSPEECH)},
  year={2024},
  arxiv = {2406.08402},
}
