Update README.md

tags:
- clinical
thumbnail: https://core.app.datexis.com/static/paper.png
pipeline_tag: text-classification
widget:
- text: "Patient with hypertension presents to ICU."
---

# CORe Model - Clinical Diagnosis Prediction

## Model description

The CORe (_Clinical Outcome Representations_) model is introduced in the paper [Clinical Outcome Predictions from Admission Notes using Self-Supervised Knowledge Integration](https://www.aclweb.org/anthology/2021.eacl-main.75.pdf).
It is based on BioBERT and further pre-trained on clinical notes, disease descriptions and medical articles with a specialised _Clinical Outcome Pre-Training_ objective.

This model checkpoint is **fine-tuned on the task of diagnosis prediction**.
The model expects patient admission notes as input and outputs multi-label ICD9-code predictions.

#### Model Predictions

The model makes predictions on a total of 9237 labels. These comprise 3- and 4-digit ICD9 codes as well as textual descriptions of these codes. The 4-digit codes and textual descriptions help to incorporate further topical and hierarchical information into the model during training (see Section 4.2, _ICD+: Incorporation of ICD Hierarchy_, in our paper). We recommend using only the **3-digit code predictions** at inference time, because only those have been evaluated in our work.
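
As an illustration of this recommendation, the following minimal sketch filters a list of predicted label strings down to 3-digit codes. It assumes that 3-digit codes appear as bare code strings such as "401" (with a leading V or E for supplementary and external-cause codes); this label format is an assumption made for illustration, not a documented property of the label set.

```
import re

# Sketch: keep only labels that look like bare 3-digit ICD9 codes
# (e.g. "401" or "V10"); 4-digit codes and textual descriptions are
# dropped. The assumed label format is illustrative, not documented.
def is_three_digit_code(label):
    return re.fullmatch(r"\d{3}|V\d{2}|E\d{3}", label) is not None

print([l for l in ["401", "4011", "hypertension"] if is_three_digit_code(l)])
# -> ['401']
```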

#### How to use CORe Diagnosis Prediction

You can load the model via the transformers library:
```
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
```

The following code shows an inference example:

```
import torch

input = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

tokenized_input = tokenizer(input, return_tensors="pt")
output = model(**tokenized_input)

# multi-label task: apply a sigmoid to each logit independently
predictions = torch.sigmoid(output.logits)
# collect all labels whose score exceeds the threshold (here 0.3)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]
```
Note: For the best performance, we recommend determining the thresholds (0.3 in this example) individually per label.
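
As a minimal sketch of what per-label thresholds could look like, the following replaces the global 0.3 cutoff with a per-label lookup. The threshold values shown are hypothetical placeholders, to be tuned per label on a validation set rather than taken from our work.

```
# Hypothetical per-label thresholds; tune these on a validation set.
default_threshold = 0.5
label_thresholds = {"401": 0.3, "428": 0.25}  # placeholder values

scores = predictions[0].tolist()
predicted_labels = [
    model.config.id2label[i]
    for i, score in enumerate(scores)
    if score > label_thresholds.get(model.config.id2label[i], default_threshold)
]
```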

### More Information