---
license: mit
language:
- en
metrics:
- f1
- accuracy
pipeline_tag: text-classification
tags:
- social science
- covid
widget:
- text: We consistently found that participants selectively chose to learn that bad (good) things happened to bad (good) people (Studies 1 to 7) that is, they selectively exposed themselves to deserved outcomes.
---
# SCORE Claim Identification
This model card describes a model for detecting claims in the abstracts of social science publications.
The model takes an abstract, splits it into sentences, and predicts a claim probability for each sentence.
It was trained on a dataset from the [SCORE](https://www.cos.io/score) project.
It achieves the following results on the test set:
- Accuracy: 0.931597
- Precision: 0.764563
- Recall: 0.722477
- F1: 0.742925
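As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two numbers above. This is a small illustrative helper (`f1_score` is our own name, not part of any library used here):

```py
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set precision and recall as reported in this card
print(round(f1_score(0.764563, 0.722477), 6))  # prints 0.742924
```

The last-digit difference from the reported 0.742925 is presumably because the card's F1 was computed from unrounded precision and recall.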
## Model Usage
You can access the model with Hugging Face's `transformers` library, using spaCy for sentence splitting, as follows:
```py
# Requires the spaCy model: python -m spacy download en_core_web_lg
import spacy
import torch
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

nlp = spacy.load("en_core_web_lg")
model_name = "biodatlab/score-claim-identification"
tokenizer_name = "allenai/scibert_scivocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)


def inference(abstract: str) -> str:
    """
    Split an abstract into sentences and perform claim identification.
    """
    if abstract.strip() == "":
        return "Please provide an abstract as an input."
    sents = [sent.text for sent in nlp(abstract).sents]  # a list of sentences
    inputs = tokenizer(
        sents,
        return_tensors="pt",
        truncation=True,
        padding="longest",
    )
    with torch.no_grad():  # gradients are not needed for inference
        logits = model(**inputs).logits
    preds = logits.argmax(dim=1).tolist()  # 1 = Claim, 0 = Null
    claims = [sent for sent, pred in zip(sents, preds) if pred == 1]
    if len(claims) > 0:
        # spaCy sentences keep their final punctuation, so join with "\n" only
        return "\n".join(claims)
    else:
        return "No claims found in the given abstract."


abstract = (
    "We consistently found that participants selectively chose to learn that bad (good) "
    "things happened to bad (good) people (Studies 1 to 7) that is, they selectively "
    "exposed themselves to deserved outcomes."
)
claims = inference(abstract)  # claim sentences joined with "\n"
print(claims)
```
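The usage code keeps a sentence when `argmax` picks class 1. To get the per-sentence claim probability mentioned in the introduction, or to apply a stricter cut-off, you can take a softmax over the two logits. The pure-Python sketch below assumes index 1 is the Claim class (as the `pred == 1` check implies) and that it receives `model(**inputs).logits.tolist()`; `claim_probabilities`, `select_claims`, and `threshold` are hypothetical names of our own.

```py
import math


def claim_probabilities(logits):
    """Convert per-sentence [null_logit, claim_logit] pairs into P(claim) via softmax."""
    probs = []
    for null_logit, claim_logit in logits:
        # subtract the larger logit before exponentiating, for numerical stability
        m = max(null_logit, claim_logit)
        e_null = math.exp(null_logit - m)
        e_claim = math.exp(claim_logit - m)
        probs.append(e_claim / (e_null + e_claim))
    return probs


def select_claims(sents, logits, threshold=0.5):
    """Keep sentences whose claim probability is strictly above `threshold`."""
    return [s for s, p in zip(sents, claim_probabilities(logits)) if p > threshold]
```

With the default threshold of 0.5 this keeps the same sentences as the `argmax` rule; raising the threshold trades recall for precision.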
## Intended usage
The model takes in a single statement (sentence) and classifies it as Claim (1) or Null (0).
Here are some examples:
| Statement | Label |
|:------------------------------------------------------------------------------------------------------------:|:----------:|
|We consistently found that participants selectively chose to learn that bad (good) things happened to <br>bad (good) people (Studies 1 to 7) that is, they selectively exposed themselves to deserved outcomes.| 1 (Claim) |
|Members of higher status groups generalize characteristics of their ingroup to superordinate categories<br> that serve as a frame of reference for comparisons with outgroups (ingroup projection).| 0 (Null) |
|Motivational Interviewing helped the goal progress of those participants who, at pre-screening, reported<br> engaging in many individual pro-environmental behaviors, but the more directive approach <br> worked better for those participants who were less ready to change.| 1 (Claim) |
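To present raw predictions in the same form as the table above, a small lookup is enough. The `ID2LABEL` mapping here is our own assumption mirroring the table; the model's `config.id2label` may use different names (for example the `transformers` defaults `LABEL_0`/`LABEL_1`), and `label_statements` is a hypothetical helper.

```py
# Hypothetical mapping mirroring the table; not read from the model config.
ID2LABEL = {0: "Null", 1: "Claim"}


def label_statements(statements, pred_ids):
    """Pair each statement with a label string such as '1 (Claim)'."""
    if len(statements) != len(pred_ids):
        raise ValueError("statements and pred_ids must have the same length")
    return [(s, f"{p} ({ID2LABEL[p]})") for s, p in zip(statements, pred_ids)]
```

For example, `label_statements(["A", "B"], [1, 0])` returns `[("A", "1 (Claim)"), ("B", "0 (Null)")]`.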
## Training procedure
### Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- n_epochs: 6
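For readers who want to set up a comparable run, the hyperparameters above map onto `transformers.TrainingArguments` parameters roughly as follows. This is a sketch, not the authors' training script: it assumes single-device training (so the per-device batch size equals the listed batch size) and no gradient accumulation, and `HPARAMS` and `total_training_steps` are hypothetical names.

```py
import math

# Listed hyperparameters, keyed by transformers.TrainingArguments parameter names.
# Assumption: one device, so per-device batch size == reported batch size.
HPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 6,
}


def total_training_steps(num_train_examples: int, hparams: dict = HPARAMS) -> int:
    """Optimizer steps for a full run, assuming no gradient accumulation
    and that the last partial batch of each epoch is kept."""
    steps_per_epoch = math.ceil(num_train_examples / hparams["per_device_train_batch_size"])
    return steps_per_epoch * int(hparams["num_train_epochs"])
```

With `transformers` installed, the dictionary can be passed as `TrainingArguments(output_dir="./checkpoints", **HPARAMS)`, where the output directory is an arbitrary placeholder.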
### Training results
| Training Loss | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|:-------------:|:----:|:---------------:|:--------:|:--------:|:---------:|:--------:|
| 0.038000 | 3996 | 0.007086 | 0.997964 | 0.993499 | 0.995656 | 0.991350 |
### Framework versions
- transformers 4.28.0
- sentence-transformers 2.2.2
- accelerate 0.19.0
- datasets 2.12.0
- spacy 3.5.3
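To check whether a local environment matches the versions listed above, a standard-library helper can compare them against what is installed. This is our own sketch (`EXPECTED` and `compare_versions` are hypothetical names); it compares version strings exactly, so a compatible patch release such as `4.28.1` is still reported as a mismatch.

```py
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card
EXPECTED = {
    "transformers": "4.28.0",
    "sentence-transformers": "2.2.2",
    "accelerate": "0.19.0",
    "datasets": "2.12.0",
    "spacy": "3.5.3",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for packages that differ."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:  # package is not installed
            have = None
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, have) in compare_versions().items():
        print(f"{pkg}: card lists {want}, installed {have or 'not installed'}")
```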
See the `gradio` application in the `biodatlab` Hugging Face Space for more.