|
--- |
|
language: en |
|
tags: |
|
- veterinary |
|
- pets |
|
- classification |
|
- vetbert |
|
- BERT |
|
|
|
widget: |
|
- text: "Hx: 7 yo canine with history of vomiting intermittently since yesterday. No other concerns. Still eating and drinking normally. cPL negative." |
|
example_title: "Enteropathy" |
|
--- |
|
|
|
|
|
# VetBERT Disease Syndrome Classifier |
|
|
|
This is a finetuned version of the [VetBERT](https://huggingface.co/havocy28/VetBERT) model, designed to classify the disease syndrome described in a veterinary clinical note.
|
|
|
|
The underlying VetBERT model is a pretrained model for NLP tasks over veterinary clinical notes. It was introduced in [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17) (Hur et al., BioNLP 2020): a BERT model initialized with ClinicalBERT (Bio + Clinical BERT) weights and further pretrained on the [VetCompass Australia](https://www.vetcompass.com.au/) corpus for tasks specific to veterinary medicine.
|
|
|
## Pretraining Data |
|
|
|
The VetBERT model was initialized from [Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT), which in turn was initialized from BERT. VetBERT was then pretrained on over 15 million veterinary clinical records comprising 1.3 billion tokens.
|
|
|
## Pretraining Hyperparameters |
|
|
|
During the pretraining phase for VetBERT, we used a batch size of 32, a maximum sequence length of 512, and a learning rate of 5e-5. The dup factor, which duplicates the input data with different masks, was set to 5. All other parameters were kept at their defaults (masked language model probability = 0.15, max predictions per sequence = 20).
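
As a rough illustration, the sketch below shows how a comparable masked-language-model pretraining run could be configured with the Hugging Face `Trainer` API. This is a minimal sketch, not the original pretraining code: the corpus file `vet_notes.txt` is hypothetical (the VetCompass Australia data is not publicly distributed), and dynamic per-epoch masking here stands in for the static dup factor of 5.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from Bio_ClinicalBERT, as VetBERT did
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForMaskedLM.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# Hypothetical corpus file, one clinical note per line
dataset = load_dataset("text", data_files={"train": "vet_notes.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Mask 15% of tokens; re-masking each epoch plays the role of the
# static dup factor used in the original pretraining
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="vetbert-mlm",
    per_device_train_batch_size=32,
    learning_rate=5e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)
# trainer.train()
```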
|
|
|
## VetBERT Finetuning |
|
|
|
VetBERT was further finetuned on a set of 5,002 annotated clinical notes to classify the disease syndrome associated with each note, as outlined in [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17).
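
For reference, a fine-tuning run of this shape could look like the minimal sketch below. It assumes the annotated notes are available as a hypothetical `annotated_notes.csv` with `text` and `label` columns; the actual data and training details are those described in the paper.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV of annotated notes with "text" and "label" columns
dataset = load_dataset("csv", data_files={"train": "annotated_notes.csv"})["train"]

# Build the label mapping from the annotations
labels = sorted(set(dataset["label"]))
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "havocy28/VetBERT",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)

def preprocess(batch):
    encoded = tokenizer(batch["text"], truncation=True, max_length=512)
    encoded["labels"] = [label2id[label] for label in batch["label"]]
    return encoded

dataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="vetbertdx-finetune", per_device_train_batch_size=32),
    train_dataset=dataset,
    tokenizer=tokenizer,  # enables dynamic padding of each batch
)
# trainer.train()
```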
|
|
|
## How to use the model |
|
|
|
Load the model via the transformers library: |
|
|
|
```python
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
import torch |
|
|
|
# Load the tokenizer and model from the Hugging Face Hub |
|
model_name = 'havocy28/VetBERTDx' |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForSequenceClassification.from_pretrained(model_name) |
|
|
|
# Example text to classify |
|
text = "Hx: 7 yo canine with history of vomiting intermittently since yesterday. No other concerns. Still eating and drinking normally. cPL negative." |
|
|
|
# Encode the text and prepare inputs for the model |
|
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512) |
|
|
|
# Predict and compute softmax to get probabilities |
|
with torch.no_grad(): |
|
logits = model(**inputs).logits |
|
probabilities = torch.softmax(logits, dim=-1) |
|
|
|
# Retrieve label mapping from model's configuration |
|
label_map = model.config.id2label |
|
|
|
# Combine labels and probabilities, and sort by probability in descending order |
|
sorted_probs = sorted(
    ((prob.item(), label_map[idx]) for idx, prob in enumerate(probabilities[0])),
    key=lambda x: x[0],
    reverse=True,
)
|
|
|
# Display sorted probabilities and labels |
|
for prob, label in sorted_probs: |
|
print(f"{label}: {prob:.4f}") |
|
``` |
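
Alternatively, the `text-classification` pipeline performs the same steps (tokenization, softmax, and label mapping) in fewer lines:

```python
from transformers import pipeline

# Load the fine-tuned classifier
classifier = pipeline("text-classification", model="havocy28/VetBERTDx")

text = "Hx: 7 yo canine with history of vomiting intermittently since yesterday. No other concerns. Still eating and drinking normally. cPL negative."

# top_k=None returns a {"label": ..., "score": ...} dict for every label,
# sorted by score in descending order
for pred in classifier(text, top_k=None):
    print(f"{pred['label']}: {pred['score']:.4f}")
```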
|
|
|
## Citation |
|
|
|
Please cite this article: Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, and James Gilkerson. 2020. [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17). In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 156–166, Online. Association for Computational Linguistics. |
|
|
|
|