Emmytheo committed
Commit 94248d8 · Parent: 55a20b8

Update README.md

Files changed (1): README.md (+26 -9)

README.md CHANGED

@@ -6,29 +6,46 @@ tags:
  - clinical
  thumbnail: https://core.app.datexis.com/static/paper.png
  pipeline_tag: text-classification
+ widget:
+ - text: "Patient with hypertension presents to ICU."
  ---
 
- # CORe Model - BioBERT + Clinical Outcome Pre-Training
+ # CORe Model - Clinical Diagnosis Prediction

  ## Model description

  The CORe (_Clinical Outcome Representations_) model is introduced in the paper [Clinical Outcome Predictions from Admission Notes using Self-Supervised Knowledge Integration](https://www.aclweb.org/anthology/2021.eacl-main.75.pdf).
  It is based on BioBERT and further pre-trained on clinical notes, disease descriptions and medical articles with a specialised _Clinical Outcome Pre-Training_ objective.

- #### How to use CORe
+ This model checkpoint is **fine-tuned on the task of diagnosis prediction**.
+ The model expects patient admission notes as input and outputs multi-label ICD9-code predictions.
+
+ #### Model Predictions
+ The model makes predictions over a total of 9237 labels: 3- and 4-digit ICD9 codes and textual descriptions of these codes. The 4-digit codes and textual descriptions incorporate additional topical and hierarchical information into the model during training (see Section 4.2, _ICD+: Incorporation of ICD Hierarchy_, in our paper). We recommend using only the **3-digit code predictions at inference time** (see the sketch below), because only those have been evaluated in our work.
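As a rough illustration of that recommendation, here is a minimal sketch of restricting a label set to 3-digit codes. The helper and the example labels are hypothetical; they assume code labels are short alphanumeric strings, which only loosely covers ICD9 V/E codes.

```
def is_three_digit_icd9(label: str) -> bool:
    # Assumption: 3-digit ICD9 code labels are short alphanumeric strings
    # such as "401" or "V10"; 4-digit codes and textual descriptions are
    # longer, so they fall out of the filter.
    return len(label) == 3 and label.isalnum()

labels = ["401", "4019", "hypertension nos", "038"]  # hypothetical mix
print([l for l in labels if is_three_digit_icd9(l)])  # ['401', '038']
```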
 
+ #### How to use CORe Diagnosis Prediction

  You can load the model via the transformers library:
  ```
- from transformers import AutoTokenizer, AutoModel
- tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-outcome-biobert-v1")
- model = AutoModel.from_pretrained("bvanaken/CORe-clinical-outcome-biobert-v1")
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
+ model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
  ```
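If you prefer a single call, the same checkpoint should also work with the `pipeline` API, which combines the loading step above with the inference example further below. A minimal sketch, assuming a recent transformers version in which `top_k=None` returns scores for all labels:

```
from transformers import pipeline

# Wrap the fine-tuned checkpoint in a text-classification pipeline;
# function_to_apply="sigmoid" matches the multi-label setup of this model.
classifier = pipeline(
    "text-classification",
    model="bvanaken/CORe-clinical-diagnosis-prediction",
    top_k=None,
    function_to_apply="sigmoid",
)

results = classifier(["Patient with hypertension presents to ICU."])
# results[0] holds one {"label": ..., "score": ...} dict per label
print([r["label"] for r in results[0] if r["score"] > 0.3])
```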
- From there, you can fine-tune it on clinical tasks that benefit from patient outcome knowledge.
-
- ### Pre-Training Data
-
- The model is based on [BioBERT](https://huggingface.co/dmis-lab/biobert-v1.1) pre-trained on PubMed data.
- The _Clinical Outcome Pre-Training_ included discharge summaries from the MIMIC III training set (specified [here](https://github.com/bvanaken/clinical-outcome-prediction/blob/master/tasks/mimic_train.csv)), medical transcriptions from [MTSamples](https://mtsamples.com/) and clinical notes from the i2b2 challenges 2006-2012. It further includes ~10k case reports from PubMed Central (PMC), disease articles from Wikipedia and article sections from the [MedQuAd](https://github.com/abachaa/MedQuAD) dataset extracted from NIH websites.

+ The following code shows an inference example:
+
+ ```
+ import torch
+
+ admission_note = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."
+
+ # Tokenize the admission note and run a forward pass
+ tokenized_input = tokenizer(admission_note, return_tensors="pt")
+ output = model(**tokenized_input)
+
+ # Multi-label prediction: a sigmoid per label, keeping every label above the threshold
+ predictions = torch.sigmoid(output.logits)
+ predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]
+ ```
+ Note: For the best performance, we recommend determining the thresholds (0.3 in this example) individually per label.
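To make that note concrete, here is a small continuation of the example above that swaps the scalar threshold for a per-label vector. The uniform 0.3 values are placeholders, not tuned thresholds:

```
import torch

# Placeholder: one threshold per label, all 0.3 here; in practice, determine
# each value on validation data.
thresholds = torch.full((model.config.num_labels,), 0.3)

predictions = torch.sigmoid(output.logits)  # shape: (1, num_labels)
keep = (predictions[0] > thresholds).nonzero().flatten().tolist()
predicted_labels = [model.config.id2label[i] for i in keep]
```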
  ### More Information