arXiv:2403.01373

Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective

Published on Mar 3, 2024

Abstract

Large vision-language models have demonstrated remarkable efficacy in addressing challenges related to both textual and visual content. Nevertheless, these models are susceptible to various hallucinations. In this paper, we focus on a new form of hallucination, specifically termed number hallucination, which denotes instances where models fail to accurately identify the quantity of objects in an image. We establish a dataset and employ evaluation metrics to assess number hallucination, revealing a pronounced prevalence of this issue across mainstream large vision-language models (LVLMs). Additionally, we provide a thorough analysis of number hallucination, examining the inner and outer inconsistency problems from two related perspectives. We argue that this inconsistency is one cause of number hallucination and propose a consistency training method to alleviate it, achieving an average improvement of 8% over the direct finetuning method.
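As a rough illustration of the kind of evaluation the abstract describes, the sketch below measures how often a model's reported object count disagrees with ground truth, and how often two paraphrased counting prompts disagree with each other, a stand-in for the inner-consistency idea. This is not the paper's actual benchmark or metrics: the `query_lvlm` interface, the prompt wording, and the sample format are assumptions made for illustration only.

```python
# Illustrative sketch (not the paper's benchmark): estimate a number-hallucination
# rate and a simple inner-inconsistency rate for a counting task.
# `query_lvlm(image, prompt) -> str` is a hypothetical stand-in for a real LVLM call.

import re
from typing import Callable, Dict, List, Optional


def parse_count(answer: str) -> Optional[int]:
    """Extract the first integer from a free-form model answer, if any."""
    match = re.search(r"\d+", answer)
    return int(match.group()) if match else None


def evaluate_number_hallucination(
    samples: List[Dict],                        # each: {"image": ..., "object": str, "count": int}
    query_lvlm: Callable[[object, str], str],   # hypothetical model interface
) -> Dict[str, float]:
    wrong = 0
    inconsistent = 0
    for s in samples:
        # Two paraphrases of the same counting question.
        q1 = f"How many {s['object']}s are in the image? Answer with a number."
        q2 = f"Count the {s['object']}s in the image and reply with a single number."
        c1 = parse_count(query_lvlm(s["image"], q1))
        c2 = parse_count(query_lvlm(s["image"], q2))
        if c1 != s["count"]:
            wrong += 1          # answer to the direct question is a number hallucination
        if c1 != c2:
            inconsistent += 1   # paraphrased prompts disagree (inner inconsistency proxy)
    n = len(samples)
    return {
        "hallucination_rate": wrong / n,
        "inner_inconsistency_rate": inconsistent / n,
    }
```

A consistency-oriented mitigation in the spirit of the abstract would then add a training signal encouraging the answers to paraphrased or related questions to agree, rather than only fitting the direct counting answer.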
