---
datasets:
- lytang/LLM-AggreFact
language:
- en
pipeline_tag: text-classification
---

# Model Summary

This is a fact-checking model from the work ([GitHub Repo](https://github.com/Liyan06/MiniCheck)):

📃 **MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents** ([link](https://arxiv.org/pdf/2404.10774.pdf))

The model is based on Flan-T5-Large and predicts a binary label - 1 for supported and 0 for unsupported.
It makes predictions at the *sentence level*: it takes a document and a sentence as input and determines
whether the sentence is supported by the document: **MiniCheck-Model(document, claim) -> {0, 1}**

MiniCheck-Flan-T5-Large is fine-tuned from `google/flan-t5-large` ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416.pdf))
on a combination of 35K examples:
- 21K ANLI examples ([Nie et al., 2020](https://aclanthology.org/2020.acl-main.441.pdf))
- 14K synthetic examples generated from scratch in a structured way (more details in the paper).

### Model Variants
We also provide two other MiniCheck model variants:
- [lytang/MiniCheck-RoBERTa-Large](https://huggingface.co/lytang/MiniCheck-RoBERTa-Large)
- [lytang/MiniCheck-DeBERTa-v3-Large](https://huggingface.co/lytang/MiniCheck-DeBERTa-v3-Large)

### Model Performance
The performance of these models is evaluated on our newly collected benchmark, [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact),
built from 10 recent human-annotated datasets on fact-checking and grounding LLM generations. Our most capable model, MiniCheck-Flan-T5-Large, outperforms all
existing specialized fact-checkers of a similar scale by a large margin and is on par with GPT-4. See the full results in our paper.

# Model Usage Demo
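Below is a minimal, hedged sketch of querying the checkpoint directly through Hugging Face `transformers`. The prompt template in `format_input` is a hypothetical placeholder, not the model's confirmed interface - the exact input format, along with a ready-to-use wrapper class, is provided in the [GitHub repo](https://github.com/Liyan06/MiniCheck).

```python
# Hedged sketch: calling MiniCheck-Flan-T5-Large via Hugging Face transformers.
# NOTE: the prompt template below is an ASSUMPTION for illustration only;
# the real input format is defined in https://github.com/Liyan06/MiniCheck.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


def format_input(document: str, claim: str) -> str:
    """Combine a grounding document and a claim into one model input.

    Hypothetical template -- consult the MiniCheck repo for the real one.
    """
    return f"predict: Document: {document} Claim: {claim}"


def check_claim(model_name: str, document: str, claim: str) -> int:
    """Return 1 if the claim is predicted as supported, else 0."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(
        format_input(document, claim),
        return_tensors="pt",
        truncation=True,
        max_length=2048,
    )
    # The model generates the label as text ("1" or "0").
    outputs = model.generate(**inputs, max_new_tokens=2)
    label = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    return 1 if label == "1" else 0


if __name__ == "__main__":
    doc = "The capital of France is Paris."
    claim = "Paris is the capital of France."
    print(check_claim("lytang/MiniCheck-Flan-T5-Large", doc, claim))
```

For production use, prefer the wrapper shipped in the GitHub repo, which handles the correct prompt format and sentence-level chunking of long documents.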