---
license: mit
language:
- en
base_model:
- answerdotai/ModernBERT-large
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- transformers
---
# LettuceDetect: Hallucination Detection Model
**Model Name:** lettucedect-large-modernbert-en-v1
**Organization:** KRLabsOrg
**Github:** https://github.com/KRLabsOrg/LettuceDetect
## Overview
LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for Retrieval-Augmented Generation (RAG) applications. It is built on **ModernBERT**, which was chosen and trained because of its extended context support (up to **8192 tokens**). This long-context capability is critical for tasks where long, detailed documents must be processed to determine accurately whether an answer is supported by the provided context.
**This is our large model, based on ModernBERT-large.**
## Model Details
- **Architecture:** ModernBERT (Large) with extended context support (up to 8192 tokens)
- **Task:** Token Classification / Hallucination Detection
- **Training Dataset:** RAGTruth
- **Language:** English
## How It Works
The model is trained to identify tokens in the answer text that are not supported by the given context. During inference, the model returns token-level predictions which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated.
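As an illustration of this aggregation step, here is a minimal sketch in plain Python (not the library's internal code) that merges runs of consecutive hallucinated token labels into character spans:

```python
# Minimal sketch of span aggregation; the names and offsets are illustrative,
# not taken from the library's internals.

def aggregate_spans(labels, offsets):
    """labels: 1 = hallucinated, 0 = supported, one label per answer token;
    offsets: (start, end) character offsets of each token in the answer."""
    spans, current = [], None
    for label, (start, end) in zip(labels, offsets):
        if label == 1:
            if current is None:
                current = [start, end]  # open a new span
            else:
                current[1] = end        # extend the open span
        elif current is not None:
            spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans

# Tokens "69" and "million" are flagged, so they merge into a single span.
print(aggregate_spans([0, 0, 1, 1], [(0, 5), (6, 8), (9, 11), (12, 19)]))
# -> [(9, 19)]
```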
## Usage
### Installation
Install the `lettucedetect` package:
```bash
pip install lettucedetect
```
### Using the model
```python
from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer", model_path="KRLabsOrg/lettucedect-large-modernbert-en-v1"
)

contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million."]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)
# Predictions: [{'start': 31, 'end': 71, 'confidence': 0.9944414496421814, 'text': ' The population of France is 69 million.'}]
```
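Because each predicted span carries character offsets into the answer, the output can be mapped straight back onto the answer string. Below is a small hypothetical helper (not part of the library) that marks the flagged spans for quick inspection:

```python
# Hypothetical helper, not part of lettucedetect: wrap each predicted span
# in markers so the hallucinated parts stand out when printed.
def highlight(answer, predictions, marker="**"):
    out, last = [], 0
    for pred in sorted(predictions, key=lambda p: p["start"]):
        out.append(answer[last:pred["start"]])
        out.append(f"{marker}{answer[pred['start']:pred['end']]}{marker}")
        last = pred["end"]
    out.append(answer[last:])
    return "".join(out)

print(highlight(answer, predictions))
# The capital of France is Paris.** The population of France is 69 million.**
```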
## Details
We evaluate our model on the test set of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset, at both the example level (can we detect that a given answer contains hallucinations?) and the span level (can we detect which parts of the answer are hallucinated?).
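For intuition, the sketch below shows one common way to derive the two views from span predictions: an example-level label is simply "any span was predicted", while a span-level score can be computed from character overlap. This is an illustration only; the exact RAGTruth scoring protocol may differ.

```python
# Illustrative only: example-level vs. span-level scoring from predicted spans.
# The exact RAGTruth evaluation protocol may differ from this sketch.

def example_level_label(predicted_spans):
    # Example-level: does the answer contain any hallucination at all?
    return len(predicted_spans) > 0

def span_level_f1(predicted, gold):
    # Span-level: character-overlap F1 between predicted and gold spans.
    pred_chars = {i for s in predicted for i in range(s["start"], s["end"])}
    gold_chars = {i for s in gold for i in range(s["start"], s["end"])}
    overlap = len(pred_chars & gold_chars)
    if not overlap:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)

gold = [{"start": 31, "end": 71}]
pred = [{"start": 31, "end": 71}]
print(example_level_label(pred), span_level_f1(pred, gold))  # True 1.0
```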
Example-level results are shown in the table below.
Our model (**lettucedetect-large-v1**) consistently achieves the highest scores across all data types and overall. It beats the previous best model (fine-tuned Llama-2-13B) while being significantly smaller and faster (our base and large models have 150M and 396M parameters, respectively, and process 30-60 examples per second on a single A100 GPU).
The other non-prompt-based model is [Luna](https://aclanthology.org/2025.coling-industry.34.pdf), which is also a token-level model but uses a DeBERTa-large encoder. Our model outperforms the Luna architecture overall (79.22 vs. 65.4 F1 on the _overall_ data type).
Span-level results are shown in the table below.
Our model achieves the best scores across all data types and overall, beating the previous best model (fine-tuned Llama-2-13B) by a significant margin.
## Citing
If you use the model or the tool, please cite the following:
```bibtex
@software{Kovacs:2025,
  author    = {Kovacs, Adam},
  title     = {LettuceDetect},
  month     = feb,
  year      = 2025,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.14856505},
  url       = {https://doi.org/10.5281/zenodo.14856505},
}
```