Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective
Abstract
Large vision-language models have demonstrated remarkable efficacy in addressing challenges related to both textual and visual content. Nevertheless, these models are susceptible to various hallucinations. In this paper, we focus on a new form of hallucination, specifically termed number hallucination, which denotes instances where models fail to accurately identify the quantity of objects in an image. We establish a dataset and employ evaluation metrics to assess number hallucination, revealing a pronounced prevalence of this issue across mainstream large vision-language models (LVLMs). Additionally, we conduct a thorough analysis of number hallucination, examining the inner and outer inconsistency problems from two related perspectives. We argue that this inconsistency is one cause of number hallucination and propose a consistency training method to alleviate such hallucination, which achieves an average improvement of 8% compared with the direct fine-tuning method.
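The abstract does not spell out how the consistency training objective is formed. Below is a minimal, hypothetical sketch of one way such an objective could look: supervised fine-tuning on two formulations of the same counting query, plus a symmetric KL penalty that discourages the model from answering the two formulations inconsistently. All function and parameter names here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical consistency-training loss sketch (not the paper's code).
# Idea: answers to two formulations of the same counting question about the
# same image should agree, in addition to matching the ground-truth label.
import torch
import torch.nn.functional as F


def consistency_finetune_loss(logits_prompt_a: torch.Tensor,
                              logits_prompt_b: torch.Tensor,
                              labels: torch.Tensor,
                              alpha: float = 1.0) -> torch.Tensor:
    """logits_prompt_a / logits_prompt_b: (batch, num_answers) answer logits for
    two phrasings of the same counting question; labels: (batch,) gold answers.
    alpha weights the consistency term against the supervised terms."""
    # Standard fine-tuning: fit the ground-truth answer under both phrasings.
    ce = (F.cross_entropy(logits_prompt_a, labels)
          + F.cross_entropy(logits_prompt_b, labels))

    # Consistency regularizer: symmetric KL between the two answer distributions.
    log_p = F.log_softmax(logits_prompt_a, dim=-1)
    log_q = F.log_softmax(logits_prompt_b, dim=-1)
    kl = (F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
          + F.kl_div(log_q, log_p, reduction="batchmean", log_target=True))

    return ce + alpha * kl


if __name__ == "__main__":
    # Toy usage with random tensors, only to show the expected shapes.
    batch, num_answers = 4, 10
    logits_a = torch.randn(batch, num_answers, requires_grad=True)
    logits_b = torch.randn(batch, num_answers, requires_grad=True)
    labels = torch.randint(0, num_answers, (batch,))
    loss = consistency_finetune_loss(logits_a, logits_b, labels)
    loss.backward()
    print(loss.item())
```

The design choice in this sketch is to treat inconsistency itself as a training signal: even when both phrasings are answered with the correct label most of the time, the KL term additionally penalizes cases where the model's confidence diverges across phrasings.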