---
license: mit
language:
- en
base_model:
- answerdotai/ModernBERT-large
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- transformers
---

# LettuceDetect: Hallucination Detection Model

LettuceDetect Logo

**Model Name:** lettucedect-large-modernbert-en-v1
**Organization:** KRLabsOrg
**Github:** https://github.com/KRLabsOrg/LettuceDetect

## Overview

LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for Retrieval-Augmented Generation (RAG) applications. The model is built on **ModernBERT**, which was specifically chosen and trained because of its extended context support (up to **8192 tokens**). This long-context capability is critical when long, detailed documents must be processed to accurately determine whether an answer is supported by the provided context.

**This is our Large model, based on ModernBERT-large.**

## Model Details

- **Architecture:** ModernBERT (Large) with extended context support (up to 8192 tokens)
- **Task:** Token Classification / Hallucination Detection
- **Training Dataset:** RagTruth
- **Language:** English

## How It Works

The model is trained to identify tokens in the answer text that are not supported by the given context. During inference, it returns token-level predictions, which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated. (An illustrative sketch of this aggregation step follows the usage example below.)

## Usage

### Installation

Install the `lettucedetect` package:

```bash
pip install lettucedetect
```

### Using the model

```python
from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer", model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1"
)

contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million.",]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)

# Predictions: [{'start': 31, 'end': 71, 'confidence': 0.9944414496421814, 'text': ' The population of France is 69 million.'}]
```
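The span aggregation described in "How It Works" can be pictured with the small, self-contained sketch below. It is illustrative only and not the library's internal implementation: the `TokenPrediction` structure, the character offsets, and the 0.5 threshold are assumptions made for this example.

```python
# Illustrative sketch only; not LettuceDetect's internal code.
from dataclasses import dataclass

@dataclass
class TokenPrediction:
    start: int   # character offset of the token within the answer
    end: int     # character offset one past the token's last character
    prob: float  # model probability that the token is hallucinated

def aggregate_spans(tokens, threshold=0.5):
    """Merge consecutive tokens whose hallucination probability exceeds the threshold."""
    spans = []
    current = None
    for tok in tokens:
        if tok.prob >= threshold:
            if current is None:
                current = {"start": tok.start, "end": tok.end, "confidence": tok.prob}
            else:
                current["end"] = tok.end
                current["confidence"] = max(current["confidence"], tok.prob)
        elif current is not None:
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return spans

# Toy example: only the tokens spelling "69 million." are flagged as unsupported
# (unlike the real model above, which flags the whole second sentence).
tokens = [
    TokenPrediction(32, 35, 0.02),   # "The"
    TokenPrediction(36, 46, 0.03),   # "population"
    TokenPrediction(60, 62, 0.97),   # "69"
    TokenPrediction(63, 71, 0.95),   # "million."
]
print(aggregate_spans(tokens))
# [{'start': 60, 'end': 71, 'confidence': 0.97}]
```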

## Details

We evaluate our model on the test set of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. We evaluate both at the example level (can we detect that a given answer contains hallucinations?) and at the span level (can we detect which parts of the answer are hallucinated?). The example-level results can be seen in the table below.

Example-level Results

Our model (**lettucedetect-large-v1**) consistently achieves the highest scores across all data types and overall. It beats the previous best model (Finetuned LLAMA-2-13B) while being significantly smaller and faster: our base and large models have 150M and 396M parameters, respectively, and process 30-60 examples per second on a single A100 GPU. The other non-prompt-based model is [Luna](https://aclanthology.org/2025.coling-industry.34.pdf), also a token-level model, but built on a DeBERTa-large encoder. Our model outperforms the Luna architecture overall (79.22 vs. 65.4 F1 score on the _overall_ data type).

The span-level results can be seen in the table below.

Span-level Results

Our model achieves the best scores across every data type and overall, beating the previous best model (Finetuned LLAMA-2-13B) by a significant margin.

## Citing

If you use the model or the tool, please cite the following:

```bibtex
@software{Kovacs:2025,
  author    = {Kovacs, Adam},
  title     = {LettuceDetect},
  month     = feb,
  year      = 2025,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.14856505},
  url       = {https://doi.org/10.5281/zenodo.14856505},
}
```