---
license: mit
datasets:
- bigbio/chemdner
- ncbi_disease
- jnlpba
- bigbio/n2c2_2018_track2
- bigbio/bc5cdr
widget:
- text: Drug<SEP>He was given aspirin and paracetamol.
language:
- en
metrics:
- precision
- recall
- f1
pipeline_tag: token-classification
tags:
- token-classification
- biology
- medical
- zero-shot
- few-shot
library_name: transformers
---

# Zero- and few-shot NER for biomedical texts

## Model description

The model takes two strings as input. String1 is the NER label: a word or phrase that names the entity class to search for. String2 is a short text in which String1 is searched for semantically.

The model outputs a list of zeros and ones, one value per token of String2 (tokens as produced by the transformer tokenizer), marking where the named entity occurs.

## Example of usage

```python
import torch
from transformers import AutoTokenizer
from transformers import BertForTokenClassification

modelname = 'ProdicusII/ZeroShotBioNER'  # model path on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(modelname)  # load the tokenizer of that model
string1 = 'Drug'  # NER label (first segment)
string2 = 'No recent antibiotics or other nephrotoxins, and no symptoms of UTI with benign UA.'
# Encode the label and the text together as a sentence pair
encodings = tokenizer(string1, string2, is_split_into_words=False,
                      padding=True, truncation=True, add_special_tokens=True,
                      return_offsets_mapping=False, max_length=512, return_tensors='pt')

model = BertForTokenClassification.from_pretrained(modelname, num_labels=2)
model.eval()  # inference mode (disables dropout)
with torch.no_grad():
    outputs = model(**encodings)
prediction_logits = outputs.logits  # shape: (1, sequence_length, 2)
print(prediction_logits)
```
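
The example above stops at raw logits, while the model description speaks of zeros and ones per token. The following is a minimal sketch of that last step, not the authors' post-processing: it assumes that class index 1 means "part of the entity", that the tokenizer is WordPiece-style (continuation pieces start with `##`), and that only tokens of the second segment are of interest. The function name `decode_entities` and the logit values are made up for illustration.

```python
def decode_entities(tokens, logits, segment_ids):
    """Return (labels, entities): a 0/1 label per token and merged entity strings.

    Assumes class index 1 marks an entity token (an assumption, see lead-in).
    """
    special = {"[CLS]", "[SEP]", "[PAD]"}
    labels, entities, current = [], [], []
    for token, (score_0, score_1), segment in zip(tokens, logits, segment_ids):
        label = int(score_1 > score_0)  # argmax over the two classes
        labels.append(label)
        if label == 1 and segment == 1 and token not in special:
            if token.startswith("##"):
                piece = token[2:]
                if current:
                    current[-1] += piece  # glue the continuation piece to the previous word
                else:
                    current.append(piece)
            else:
                current.append(token)
        elif current:
            # A non-entity token closes the entity collected so far
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return labels, entities


# Hand-written tokens and made-up logits, so the sketch runs without downloading the model
tokens = ["[CLS]", "drug", "[SEP]", "he", "was", "given", "as", "##pirin",
          "and", "para", "##ce", "##tam", "##ol", ".", "[SEP]"]
segment_ids = [0, 0, 0] + [1] * 12
made_up_labels = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0]
logits = [[-1.5, 1.5] if flag else [2.0, -2.0] for flag in made_up_labels]

labels, entities = decode_entities(tokens, logits, segment_ids)
print(labels)
print(entities)
```

With the real model, the three inputs could be obtained as `tokenizer.convert_ids_to_tokens(encodings['input_ids'][0])`, `model(**encodings).logits[0].tolist()` and `encodings['token_type_ids'][0].tolist()`.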

## Available classes

The following datasets and entities were used for training, so these entities can be used as labels in the first segment (as the first string). Note that multiword strings have been merged.

* NCBI
  * Specific Disease
  * Composite Mention
  * Modifier
  * Disease Class
* BIORED
  * Sequence Variant
  * Gene Or Gene Product
  * Disease Or Phenotypic Feature
  * Chemical Entity
  * Cell Line
  * Organism Taxon
* CDR
  * Disease
  * Chemical
* CHEMDNER
  * Chemical
  * Chemical Family
* JNLPBA
  * Protein
  * DNA
  * Cell Type
  * Cell Line
  * RNA
* n2c2
  * Drug
  * Frequency
  * Strength
  * Dosage
  * Form
  * Reason
  * Route
  * ADE
  * Duration
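
Because each input pairs one label with one text, tagging several entity classes in the same text means building one pair per label. The sketch below shows one way to prepare such pairs. The dictionary `TRAINED_LABELS` and the helper `build_label_text_pairs` are illustrative names, not part of the model's code, and the grouping reads CDR as a dataset with the classes Disease and Chemical.

```python
# Entity classes from the list above, grouped by source dataset
TRAINED_LABELS = {
    "NCBI": ["Specific Disease", "Composite Mention", "Modifier", "Disease Class"],
    "BIORED": ["Sequence Variant", "Gene Or Gene Product", "Disease Or Phenotypic Feature",
               "Chemical Entity", "Cell Line", "Organism Taxon"],
    "CDR": ["Disease", "Chemical"],
    "CHEMDNER": ["Chemical", "Chemical Family"],
    "JNLPBA": ["Protein", "DNA", "Cell Type", "Cell Line", "RNA"],
    "n2c2": ["Drug", "Frequency", "Strength", "Dosage", "Form", "Reason",
             "Route", "ADE", "Duration"],
}


def build_label_text_pairs(text, labels):
    """Return two parallel lists: the labels (first segments) and the repeated text (second segments)."""
    first_segments = list(dict.fromkeys(labels))  # drop duplicate labels, keep order
    second_segments = [text] * len(first_segments)
    return first_segments, second_segments


text = "He was given aspirin and paracetamol."
labels = TRAINED_LABELS["n2c2"][:3] + TRAINED_LABELS["CDR"]
first, second = build_label_text_pairs(text, labels)
print(list(zip(first, second)))
```

The two lists can then be encoded in one batch, for example with `tokenizer(first, second, padding=True, truncation=True, max_length=512, return_tensors='pt')`, which yields one row of token predictions per label.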

On top of this, the model can be used in a zero-shot regime with other classes, and it can also be fine-tuned with a few examples of other classes.
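
For few-shot fine-tuning, each example needs token-level targets in the same zero/one format the model outputs. This card does not specify how the training labels are prepared, so the following is only a sketch under stated assumptions: entity annotations are character spans in the text, tokens of the second segment that overlap a span get label 1, and special tokens and the label segment get -100, the value PyTorch's cross-entropy loss ignores by default. The helper name `spans_to_token_labels` is hypothetical; the linked training code remains the reference for the actual procedure.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def spans_to_token_labels(offsets, segment_ids, entity_spans):
    """Convert character spans in the text into one label per token.

    offsets: (start, end) character offsets per token; special tokens have (0, 0)
    segment_ids: 0 for the label segment, 1 for the text segment
    entity_spans: (start, end) character spans of annotated entities in the text
    """
    labels = []
    for (start, end), segment in zip(offsets, segment_ids):
        if segment != 1 or start == end:
            # Label segment or special token: excluded from the loss (an assumption)
            labels.append(IGNORE_INDEX)
            continue
        inside = any(start < span_end and end > span_start
                     for span_start, span_end in entity_spans)
        labels.append(int(inside))
    return labels


# Hand-written offsets for: label "Drug", text "He was given aspirin."
# tokens: [CLS] drug [SEP] he was given as ##pirin . [SEP]
offsets = [(0, 0), (0, 4), (0, 0), (0, 2), (3, 6), (7, 12),
           (13, 15), (15, 20), (20, 21), (0, 0)]
segment_ids = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
entity_spans = [(13, 20)]  # "aspirin"

token_labels = spans_to_token_labels(offsets, segment_ids, entity_spans)
print(token_labels)
```

With a fast tokenizer, the offsets and segment ids can be taken from `tokenizer(label, text, return_offsets_mapping=True)` as `offset_mapping` and `token_type_ids`.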

## Code availability

Code used for training and testing the model is available at https://github.com/br-ai-ns-institute/Zero-ShotNER

## Citation