---
license: mit
language:
- en
metrics:
- f1
- accuracy
pipeline_tag: text-classification
tags:
- social science
- covid
widget:
- text: We consistently found that participants selectively chose to learn that bad (good) things happened to bad (good) people (Studies 1 to 7) that is, they selectively exposed themselves to deserved outcomes.
---

# SCORE Claim Identification

This model detects claims in the abstracts of social science publications.
It takes an abstract, splits it into sentences, and predicts a claim probability for each sentence.
The model was trained on the [SCORE](https://www.cos.io/score) dataset.
It achieves the following results on the test set:

- Accuracy: 0.931597
- Precision: 0.764563
- Recall: 0.722477
- F1: 0.742925
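
As a quick consistency check, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values listed above; because the inputs are already rounded, the result should match the reported F1 only to about the sixth decimal place.

```py
# Recompute F1 from the reported (rounded) precision and recall.
precision = 0.764563
recall = 0.722477

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 recomputed from precision and recall: {f1:.6f}")
```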

## Model Usage
You can access the model with Hugging Face's `transformers` library as follows:

```py
import spacy
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

nlp = spacy.load("en_core_web_lg")
model_name = "biodatlab/score-claim-identification"
tokenizer_name = "allenai/scibert_scivocab_uncased"

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

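def inference(abstract: str) -> str:
    """
    Split an abstract into sentences and perform claim identification.
    """
    if abstract.strip() == "":
        return "Please provide an abstract as an input."
    sents = [sent.text for sent in nlp(abstract).sents]  # a list of sentences
    inputs = tokenizer(
        sents,
        return_tensors="pt",
        truncation=True,
        padding="longest"
    )
    logits = model(**inputs).logits
    preds = logits.argmax(dim=1).tolist()  # convert logits to predicted labels
    # keep only the sentences labeled as Claim (1)
    claims = [sent for sent, pred in zip(sents, preds) if pred == 1]
    if len(claims) > 0:
        # sentences keep their own end punctuation, so join on newlines only
        return "\n".join(claims)
    else:
        return "No claims found in the given abstract."

# example input: the sample statement from this model card's widget
abstract = (
    "We consistently found that participants selectively chose to learn that "
    "bad (good) things happened to bad (good) people (Studies 1 to 7) that is, "
    "they selectively exposed themselves to deserved outcomes."
)
claims = inference(abstract)  # claim sentences joined with \n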
```
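
The usage example returns hard labels via `argmax`. To obtain a claim probability per sentence instead, the logits can be passed through a softmax. The following is a minimal sketch in plain Python, assuming the model emits two logits per sentence with index 1 corresponding to Claim; `claim_probabilities` and the example logits are hypothetical and not part of the released code.

```py
import math
from typing import List


def claim_probabilities(logits: List[List[float]]) -> List[float]:
    """Return the softmax probability of label 1 (Claim) for each row of logits."""
    probs = []
    for row in logits:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - m) for x in row]
        probs.append(exps[1] / sum(exps))
    return probs


# Made-up logits for three sentences, for illustration only.
example_logits = [[2.0, -1.0], [0.0, 0.0], [-3.0, 3.0]]
example_probs = claim_probabilities(example_logits)
print(example_probs)
```

With the model loaded as above, the input could be obtained with `model(**inputs).logits.detach().tolist()`.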

## Intended usage
The model takes a statement and classifies it as Claim (1) or Null (0).
Here are some examples:

|                                                         Statement                                            |    Label   | 
|:------------------------------------------------------------------------------------------------------------:|:----------:|
|We consistently found that participants selectively chose to learn that bad (good) things happened to <br>bad (good) people (Studies 1 to 7) that is, they selectively exposed themselves to deserved outcomes.| 1 (Claim) |
|Members of higher status groups generalize characteristics of their ingroup to superordinate categories<br> that serve as a frame of reference for comparisons with outgroups (ingroup projection).| 0 (Null)  |
|Motivational Interviewing helped the goal progress of those participants who, at pre-screening, reported<br> engaging in many individual pro-environmental behaviors, but the more directive approach <br> worked better for those participants who were less ready to change.| 1 (Claim) |


## Training procedure

### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- n_epochs: 6
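
For readers who want to reproduce a similar setup, the hyperparameters above can be written as keyword arguments. This is only a sketch: it assumes training used the `transformers` `Trainer` API on a single device, which this card does not state, and it maps the listed names onto `TrainingArguments` parameter names.

```py
# Hyperparameters from this card, keyed by `transformers.TrainingArguments`
# parameter names. The mapping is an assumption: the card does not say which
# training API was used, or whether the batch sizes are per device.
training_kwargs = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 6,
}

for name, value in training_kwargs.items():
    print(f"{name}: {value}")
```

These could then be passed as `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left for the user to choose.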

### Training results

| Training Loss | Step | Validation Loss | Accuracy |    F1    | Precision |  Recall  |
|:-------------:|:----:|:---------------:|:--------:|:--------:|:---------:|:--------:|
| 0.038000      | 3996 | 0.007086        | 0.997964 | 0.993499 | 0.995656  | 0.991350 |

### Framework versions
- transformers 4.28.0
- sentence-transformers 2.2.2
- accelerate 0.19.0
- datasets 2.12.0
- spacy 3.5.3

See the `gradio` application in the `biodatlab` space for more.