metadata
language:
  - en
datasets:
  - pubmed
metrics:
  - f1
tags:
  - text-classification
  - document sections
  - sentence classification
  - document classification
  - medical
  - health
  - biomedical
pipeline_tag: text-classification
widget:
  - text: >-
      many pathogenic processes and diseases are the result of an erroneous
      activation of the complement cascade and a number of inhibitors of
      complement have thus been examined for anti-inflammatory actions.
    example_title: background example
  - text: a total of 192 mi patients and 140 control persons were included.
    example_title: methods example
  - text: >-
      mi patients had 18 % higher plasma levels of map44 (iqr 11-25 %) as
      compared to the healthy control group (p < 0. 001.)
    example_title: results example
  - text: >-
      the finding that a brief cb group intervention delivered by real-world
      providers significantly reduced mdd onset relative to both brochure
      control and bibliotherapy is very encouraging, although effects on
      continuous outcome measures were small or nonsignificant and approximately
      half the magnitude of those found in efficacy research, potentially
      because the present sample reported lower initial depression.
    example_title: conclusions example
  - text: >-
      in order to understand and update the prevalence of myopia in taiwan, a
      nationwide survey was performed in 1995.
    example_title: objective example

BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext_pub_section

  • original model file name: textclassifer_BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext_pubmed_20k
  • This is a fine-tuned checkpoint of microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext for classifying text by document section.
  • Possible document section classes are: BACKGROUND, CONCLUSIONS, METHODS, OBJECTIVE, RESULTS

usage in Python

Install `transformers` as needed:

```bash
pip install -U transformers
```

Run the following, changing the example text to fit your use case:

```python
from transformers import pipeline

model_tag = "ml4pubmed/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext_pub_section"
classifier = pipeline(
    "text-classification",
    model=model_tag,
)

prompt = """
Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
"""

# classify the sentence; the pipeline returns a list with one {"label": ..., "score": ...} dict
result = classifier(prompt)
print(result)
```
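Because the model labels one sentence at a time, one way to apply it to a whole abstract is to split the abstract into sentences, classify them as a batch, and group them by predicted section. The sketch below is illustrative only: `split_sentences` and `group_by_section` are hypothetical helpers (not part of this model or of `transformers`), the regex splitter is deliberately naive (it will, for example, break "p < 0. 001"), and it assumes the pipeline's labels are the section names listed above. The classifier is passed in as any callable that maps a list of strings to a list of `{"label": ..., "score": ...}` dicts, which is what the `classifier` pipeline returns for list input, so the grouping logic can be tried without downloading the model.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Hypothetical, naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def group_by_section(
    text: str,
    classify: Callable[[List[str]], List[dict]],
) -> Dict[str, List[str]]:
    """Classify each sentence and collect sentences under their predicted label."""
    sentences = split_sentences(text)
    predictions = classify(sentences)  # one {"label", "score"} dict per sentence
    sections: Dict[str, List[str]] = defaultdict(list)
    for sentence, pred in zip(sentences, predictions):
        sections[pred["label"]].append(sentence)
    return dict(sections)
```

With the pipeline defined above, `group_by_section(abstract_text, classifier)` would return a dict such as `{"METHODS": [...], "RESULTS": [...]}`, keyed by whatever labels the model predicts. For real abstracts, a proper sentence tokenizer is likely a better choice than the regex used here.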

metadata

training_metrics

  • val_accuracy: 0.8678670525550842

  • val_matthewscorrcoef: 0.8222037553787231

  • val_f1score: 0.866841197013855

  • val_cross_entropy: 0.3674609065055847

  • epoch: 8.0

  • train_accuracy_step: 0.83984375

  • train_matthewscorrcoef_step: 0.7790813446044922

  • train_f1score_step: 0.837363600730896

  • train_cross_entropy_step: 0.39843088388442993

  • train_accuracy_epoch: 0.8538406491279602

  • train_matthewscorrcoef_epoch: 0.8031334280967712

  • train_f1score_epoch: 0.8521654605865479

  • train_cross_entropy_epoch: 0.4116102457046509

  • test_accuracy: 0.8578397035598755

  • test_matthewscorrcoef: 0.8091378808021545

  • test_f1score: 0.8566917181015015

  • test_cross_entropy: 0.3963385224342346

  • date_run: Apr-22-2022_t-19

  • huggingface_tag: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext
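The metrics above are reported as logged, and the card does not say how the F1 score was averaged across the five section classes. To make clear what accuracy, F1 and the Matthews correlation coefficient (MCC) measure, here is a minimal pure-Python sketch that computes them from gold and predicted labels. It assumes macro-averaged F1 (the training run may have used a different averaging, such as weighted), and it uses the standard multiclass generalization of MCC; `classification_metrics` is a hypothetical helper, not code from the training run.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy, macro F1 (assumed averaging) and multiclass MCC."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # t_k: gold count per class
    pred_counts = Counter(y_pred)  # p_k: predicted count per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # per-class true positives

    # Macro F1: unweighted mean of per-class F1 scores.
    f1s = []
    for k in labels:
        precision = tp[k] / pred_counts[k] if pred_counts[k] else 0.0
        recall = tp[k] / true_counts[k] if true_counts[k] else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)

    # Multiclass MCC: (c*s - sum p_k t_k) / sqrt((s^2 - sum p_k^2) * (s^2 - sum t_k^2))
    numerator = correct * n - sum(pred_counts[k] * true_counts[k] for k in labels)
    pred_term = n * n - sum(pred_counts[k] ** 2 for k in labels)
    true_term = n * n - sum(true_counts[k] ** 2 for k in labels)
    denom = math.sqrt(pred_term * true_term)
    mcc = numerator / denom if denom else 0.0

    return {"accuracy": correct / n, "f1_macro": sum(f1s) / len(f1s), "mcc": mcc}
```

Unlike accuracy, MCC uses every cell of the confusion matrix and stays low when a model does well only on the most frequent classes, which is why it is reported alongside accuracy and F1.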