titipata committed on
Commit
59f5094
1 Parent(s): f2cae39

Create README.md

Files changed (1)
  1. README.md +57 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: mit
+ language:
+ - en
+ metrics:
+ - f1
+ - accuracy
+ pipeline_tag: text-classification
+ tags:
+ - social science
+ - covid
+ ---
+
+ # SCORE Claim Identification
+
+ This model card describes a model for detecting claims in abstracts of social science publications.
+ The model takes an abstract, splits it into sentences, and predicts a claim probability for each sentence.
+ The model was trained on a [SCORE](https://www.cos.io/score) dataset.
+
+ ```py
+ import spacy
+ from transformers import AutoTokenizer
+ from transformers import AutoModelForSequenceClassification
+
+ nlp = spacy.load("en_core_web_lg")
+ model_name = "biodatlab/score-claim-identification"
+ tokenizer_name = "allenai/scibert_scivocab_uncased"
+
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ def inference(abstract: str):
+     """
+     Split an abstract into sentences and perform claim identification.
+     """
+     if abstract.strip() == "":
+         return "Please provide an abstract as an input."
+     sents = [sent.text for sent in nlp(abstract).sents]  # a list of sentences
+     inputs = tokenizer(
+         sents,
+         return_tensors="pt",
+         truncation=True,
+         padding="longest"
+     )
+     logits = model(**inputs).logits
+     preds = logits.argmax(dim=1)  # convert logits to predictions
+     claims = [sent for sent, pred in zip(sents, preds) if pred == 1]  # keep sentences predicted as claims
+     if len(claims) > 0:
+         return "\n".join(claims)
+     else:
+         return "No claims found in the given abstract."
+
+ abstract = "..."  # placeholder: replace with the abstract text
+ claims = inference(abstract)  # string of claims joined with \n
+ ```
+
+ See the `gradio` application in the `biodatlab` space for more.
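
The README says the model predicts a claim probability for each sentence, while its example only keeps the argmax label. The following is a minimal sketch of how two-class logits could be turned into per-sentence claim probabilities with a softmax. It assumes index 1 is the claim class (consistent with `pred == 1` in the README code); the function name `claim_probabilities` and the example logits are hypothetical and are not part of the commit. With PyTorch tensors, `logits.softmax(dim=1)[:, 1]` should give the same values.

```python
import math


def claim_probabilities(logits):
    """Convert per-sentence [non-claim, claim] logits into claim probabilities."""
    probs = []
    for row in logits:
        # Subtract the row maximum before exponentiating for numerical stability.
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        # Assumption: index 1 is the claim class, matching `pred == 1` in the README.
        probs.append(exps[1] / sum(exps))
    return probs


# Hypothetical logits for three sentences (not real model output).
example_logits = [[2.0, -1.0], [0.0, 0.0], [-1.5, 3.0]]
for prob in claim_probabilities(example_logits):
    print(f"{prob:.3f}")
```

Reporting probabilities instead of hard labels lets a caller choose a threshold other than 0.5, for example to favor recall when screening abstracts.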