---
datasets:
- lytang/LLM-AggreFact
language:
- en
pipeline_tag: text-classification
---

# Model Summary

This is a fact-checking model from the work ([GitHub Repo](https://github.com/Liyan06/MiniCheck)):

📃 **MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents** ([link](https://arxiv.org/pdf/2404.10774.pdf))

The model is based on Flan-T5-Large and predicts a binary label: 1 for supported and 0 for unsupported.
The model makes predictions at the *sentence level*. It takes a document and a sentence as input and determines
whether the sentence is supported by the document: **MiniCheck-Model(document, claim) -> {0, 1}**
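
For instance (an illustrative pair made up for this card, not drawn from any dataset):

```python
# Illustrative only: a toy document with one supported and one unsupported claim.
document = "The Eiffel Tower, completed in 1889, is a wrought-iron tower in Paris, France."
claim_supported = "The Eiffel Tower is located in Paris."      # MiniCheck-Model(document, claim) -> 1
claim_unsupported = "The Eiffel Tower was completed in 1925."  # MiniCheck-Model(document, claim) -> 0
```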

MiniCheck-Flan-T5-Large is fine-tuned from `google/flan-t5-large` ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416.pdf))
on a combination of 35K training examples:
- 21K examples from ANLI ([Nie et al., 2020](https://aclanthology.org/2020.acl-main.441.pdf))
- 14K synthetic examples generated from scratch in a structured way (more details in the paper).

### Model Variants
We also have two other MiniCheck model variants:
- [lytang/MiniCheck-RoBERTa-Large](https://huggingface.co/lytang/MiniCheck-RoBERTa-Large)
- [lytang/MiniCheck-DeBERTa-v3-Large](https://huggingface.co/lytang/MiniCheck-DeBERTa-v3-Large)

### Model Performance
The performance of these models is evaluated on our newly collected benchmark, [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact),
built from 10 recent human-annotated datasets on fact-checking and grounding LLM generations. Our most capable model, MiniCheck-Flan-T5-Large, outperforms all
existing specialized fact-checkers of similar scale by a large margin and is on par with GPT-4. See the full results in our paper.

# Model Usage Demo
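
Below is a minimal sketch of scoring one claim against one document with `transformers`. The `predict: {document} {claim}` input format used here is an assumption for illustration; refer to the official inference code in the [GitHub repo](https://github.com/Liyan06/MiniCheck) for the exact formatting and for batched scoring of long, chunked documents.

```python
# A minimal sketch, not the official inference code: the "predict: ..." input
# format is an assumption for illustration. See https://github.com/Liyan06/MiniCheck
# for the exact formatting used at training and inference time.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "lytang/MiniCheck-Flan-T5-Large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

document = "The Eiffel Tower, completed in 1889, is a wrought-iron tower in Paris, France."
claim = "The Eiffel Tower is located in Paris."

# Pair the grounding document with the claim in a single input sequence.
inputs = tokenizer(f"predict: {document} {claim}", return_tensors="pt", truncation=True)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=2)

# The model decodes "1" (supported) or "0" (unsupported).
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Since the model checks one sentence at a time, a multi-sentence LLM generation can be verified by splitting it into sentences and scoring each sentence against the grounding document.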