---
license: mit
language:
- en
metrics:
- f1
- accuracy
pipeline_tag: text-classification
tags:
- social science
- covid
---

# SCORE Claim Identification

This is a model card for detecting claims in abstracts of social science publications. The model takes an abstract, performs sentence tokenization, and predicts a claim probability for each sentence. The model was trained on the [SCORE](https://www.cos.io/score) dataset.

It achieves the following results on the test set:

- Accuracy: 0.931597
- Precision: 0.764563
- Recall: 0.722477
- F1: 0.742925

## Model Usage

You can access the model with Hugging Face's `transformers` as follows:

```py
import spacy
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

nlp = spacy.load("en_core_web_lg")
model_name = "biodatlab/score-claim-identification"
tokenizer_name = "allenai/scibert_scivocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)


def inference(abstract: str):
    """
    Split an abstract into sentences and perform claim identification.
    """
    if abstract.strip() == "":
        return "Please provide an abstract as an input."
    sents = [sent.text for sent in nlp(abstract).sents]  # a list of sentences
    inputs = tokenizer(
        sents,
        return_tensors="pt",
        truncation=True,
        padding="longest",
    )
    logits = model(**inputs).logits
    preds = logits.argmax(dim=1)  # convert logits to predicted labels
    claims = [sent for sent, pred in zip(sents, preds) if pred == 1]
    if len(claims) > 0:
        return ".\n".join(claims)
    return "No claims found in the given abstract."


claims = inference(abstract)  # `abstract` is your input text; claims are joined with "\n"
```

## Intended usage

The model takes in a statement and classifies it as Claim (1) or Null (0). Here are some examples:

| Statement | Label |
|:----------|:-----:|
| We consistently found that participants selectively chose to learn that bad (good) things happened to bad (good) people (Studies 1 to 7), that is, they selectively exposed themselves to deserved outcomes. | 1 (Claim) |
| Members of higher status groups generalize characteristics of their ingroup to superordinate categories that serve as a frame of reference for comparisons with outgroups (ingroup projection). | 0 (Null) |
| Motivational Interviewing helped the goal progress of those participants who, at pre-screening, reported engaging in many individual pro-environmental behaviors, but the more directive approach worked better for those participants who were less ready to change. | 1 (Claim) |

## Training procedure

### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- n_epochs: 6

### Training results

| Training Loss | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|:-------------:|:----:|:---------------:|:--------:|:--------:|:---------:|:--------:|
| 0.038000 | 3996 | 0.007086 | 0.997964 | 0.993499 | 0.995656 | 0.991350 |

### Framework versions

- transformers 4.28.0
- sentence-transformers 2.2.2
- accelerate 0.19.0
- datasets 2.12.0
- spacy 3.5.3

See the `gradio` demo application in the `biodatlab` Hugging Face space.
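The per-sentence filtering step inside `inference` can be illustrated in isolation. The sentences and logits below are made-up stand-ins (not real model output); only the argmax-and-filter logic mirrors the usage code above:

```py
# Minimal sketch of the claim-filtering step, with hypothetical logits.
sents = ["Background sentence.", "We found a significant effect.", "Methods sentence."]
# One (null_logit, claim_logit) pair per sentence -- invented values.
logits = [(2.1, -1.0), (-0.5, 1.8), (1.2, 0.3)]

# argmax over the two logits gives the predicted label (0 = Null, 1 = Claim).
preds = [0 if null > claim else 1 for null, claim in logits]
claims = [s for s, p in zip(sents, preds) if p == 1]

print(claims)  # ['We found a significant effect.']
```

In the real pipeline the pairs come from `model(**inputs).logits` and the argmax is computed with `logits.argmax(dim=1)`; the filtering by `pred == 1` is the same.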
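The card lists hyperparameters but no training code. A minimal fine-tuning configuration sketch consistent with those values might look like the following; the base checkpoint, `num_labels`, and the `Trainer` wiring are assumptions, not the authors' actual training script:

```py
# Hypothetical fine-tuning configuration matching the listed hyperparameters.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased",
    num_labels=2,  # 0 = Null, 1 = Claim
)

training_args = TrainingArguments(
    output_dir="score-claim-identification",
    learning_rate=3e-5,          # learning_rate from the card
    per_device_train_batch_size=32,  # train_batch_size
    per_device_eval_batch_size=32,   # eval_batch_size
    num_train_epochs=6,              # n_epochs
)

# A Trainer would then be constructed with tokenized sentence/label pairs
# (dataset preparation not shown) and run with trainer.train().
```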