---
datasets:
- lytang/LLM-AggreFact
language:
- en
pipeline_tag: text-classification
---

# Model Summary

This is a fact-checking model from the work ([GitHub Repo](https://github.com/Liyan06/MiniCheck)):

📃 **MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents** ([link](https://arxiv.org/pdf/2404.10774.pdf))

The model is based on Flan-T5-Large and predicts a binary label - 1 for supported and 0 for unsupported.
It makes predictions at the *sentence level*: it takes a document and a sentence as input and determines
whether the sentence is supported by the document: **MiniCheck-Model(document, claim) -> {0, 1}**

MiniCheck-Flan-T5-Large is fine-tuned from `google/flan-t5-large` ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416.pdf))
on a combination of 35K examples:
- 21K ANLI examples ([Nie et al., 2020](https://aclanthology.org/2020.acl-main.441.pdf))
- 14K synthetic examples generated from scratch in a structured way (more details in the paper).

### Model Variants
We also provide two other MiniCheck model variants:
- [lytang/MiniCheck-RoBERTa-Large](https://huggingface.co/lytang/MiniCheck-RoBERTa-Large)
- [lytang/MiniCheck-DeBERTa-v3-Large](https://huggingface.co/lytang/MiniCheck-DeBERTa-v3-Large)

### Model Performance
The performance of these models is evaluated on our newly collected benchmark, [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact),
built from 10 recent human-annotated datasets on fact-checking and grounding LLM generations. Our most capable model, MiniCheck-Flan-T5-Large, outperforms all
existing specialized fact-checkers of a similar scale by a large margin and is on par with GPT-4. See the full results in our paper.

# Model Usage Demo
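Below is a minimal, hedged sketch of querying the checkpoint directly through Hugging Face `transformers`. The prompt template in `format_input` is a hypothetical placeholder, not the model's confirmed interface - the exact input format, along with a ready-to-use wrapper class, is provided in the [GitHub repo](https://github.com/Liyan06/MiniCheck).

```python
# Hedged sketch: calling MiniCheck-Flan-T5-Large via Hugging Face transformers.
# NOTE: the prompt template below is an ASSUMPTION for illustration only;
# the real input format is defined in https://github.com/Liyan06/MiniCheck.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


def format_input(document: str, claim: str) -> str:
    """Combine a grounding document and a claim into one model input.

    Hypothetical template -- consult the MiniCheck repo for the real one.
    """
    return f"predict: Document: {document} Claim: {claim}"


def check_claim(model_name: str, document: str, claim: str) -> int:
    """Return 1 if the claim is predicted as supported, else 0."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(
        format_input(document, claim),
        return_tensors="pt",
        truncation=True,
        max_length=2048,
    )
    # The model generates the label as text ("1" or "0").
    outputs = model.generate(**inputs, max_new_tokens=2)
    label = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    return 1 if label == "1" else 0


if __name__ == "__main__":
    doc = "The capital of France is Paris."
    claim = "Paris is the capital of France."
    print(check_claim("lytang/MiniCheck-Flan-T5-Large", doc, claim))
```

For production use, prefer the wrapper shipped in the GitHub repo, which handles the correct prompt format and sentence-level chunking of long documents.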