LettuceDetect: A Hallucination Detection Framework for RAG Applications
Abstract
Retrieval-Augmented Generation (RAG) systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources. We present LettuceDetect, a framework that addresses two critical limitations of existing hallucination detection methods: (1) the context-window constraints of traditional encoder-based methods, and (2) the computational inefficiency of LLM-based approaches. Building on ModernBERT's extended context capabilities (up to 8k tokens) and trained on the RAGTruth benchmark dataset, our approach outperforms all previous encoder-based models and most prompt-based models, while being approximately 30 times smaller than the best models. LettuceDetect is a token-classification model that processes context-question-answer triples, allowing unsupported claims to be identified at the token level. Evaluations on the RAGTruth corpus demonstrate an example-level F1 score of 79.22%, a 14.8% improvement over Luna, the previous state-of-the-art encoder-based architecture. Additionally, the system can process 30 to 60 examples per second on a single GPU, making it practical for real-world RAG applications.
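The abstract describes LettuceDetect as a ModernBERT-based token classifier over concatenated context-question-answer triples. The sketch below shows how such token-level detection could be run with the Hugging Face transformers API; the checkpoint name, the input concatenation scheme, and the label convention (label 1 marking unsupported tokens) are assumptions for illustration, not details confirmed by the abstract.

```python
# Minimal sketch of token-level hallucination detection with a ModernBERT-based
# token classifier. Checkpoint name and label mapping are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_ID = "KRLabsOrg/lettucedect-base-modernbert-en-v1"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
model.eval()

context = "France is a country in Europe. The capital of France is Paris."
question = "What is the capital of France?"
answer = "The capital of France is Lyon."

# Concatenate context, question, and answer into one sequence; the exact
# separator scheme used during training is an assumption here. The 8k-token
# limit follows ModernBERT's extended context window.
text = f"{context}\n{question}\n{answer}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

# Assume label 1 marks tokens unsupported by the context.
pred = logits.argmax(dim=-1)[0]
flagged_ids = [
    tok_id.item()
    for tok_id, label in zip(inputs["input_ids"][0], pred)
    if label.item() == 1
]
print("Tokens flagged as unsupported:", tokenizer.decode(flagged_ids))
```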
Community
We released LettuceDetect, a lightweight hallucination detection framework for Retrieval-Augmented Generation (RAG) pipelines.
LettuceDetect addresses two critical challenges:
The context-window limits of prior encoder-only models.
The high compute costs associated with LLM-based detectors.
Built on ModernBERT, our encoder-based model is released under the MIT license and comes with ready-to-use Python packages and pretrained models.
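As a quick illustration of the released package, a minimal usage sketch follows. The import path, class name, and call signature (a HallucinationDetector with a predict method returning spans) are assumptions inferred from the announcement rather than verified API details; consult the LettuceDetect repository for the actual interface.

```python
# Hedged sketch of span-level detection with the released Python package.
# Module path, class name, and keyword arguments are assumptions.
from lettucedetect.models.inference import HallucinationDetector  # assumed import path

detector = HallucinationDetector(
    method="transformer",  # assumed option selecting the encoder-based model
    model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1",  # assumed pretrained checkpoint
)

contexts = ["The capital of France is Paris. France's population is about 67 million."]
question = "What is the capital of France, and what is its population?"
answer = "The capital of France is Paris. The population of France is 91 million."

# Assumed call signature returning character spans of unsupported claims.
spans = detector.predict(
    context=contexts, question=question, answer=answer, output_format="spans"
)
print(spans)
```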
The following related papers were recommended by the Semantic Scholar API:
- REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models (2025)
- FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA (2025)
- MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models (2025)
- How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild (2025)
- SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models (2025)
- Bi'an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation (2025)
- Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation (2025)