---
license: agpl-3.0
language:
- de
base_model:
- deepset/gbert-base
pipeline_tag: token-classification
---

# MEDNER.DE: Medicinal Product Entity Recognition in German-Specific Contexts

Released in December 2024, this is a German BERT language model further pretrained from `deepset/gbert-base` on a corpus of pharmacovigilance-related case summaries. The model has been fine-tuned for Named Entity Recognition (NER) on an automatically annotated dataset to recognize medicinal products such as medications and vaccines.

In our paper, we outline the steps taken to train this model and demonstrate its superior performance compared to previous approaches.

---

## Overview

- **Paper**: [https://...
- **Architecture**: MLM-based BERT Base
- **Language**: German
- **Supported Labels**: Medicinal Product

**Model Name**: MEDNER.DE

---

## How to Use

### Use a pipeline as a high-level helper

```python
from transformers import pipeline

# Load the NER pipeline
model = pipeline("ner", model="pei-germany/MEDNER-de-fp-gbert", aggregation_strategy="none")

# Input text
text = "Der Patient wurde mit AstraZeneca geimpft und nahm anschließend Ibuprofen, um das Fieber zu senken."

# Get raw predictions and merge subword tokens ("##...") back into full words
merged_predictions = []
current = None

for pred in model(text):
    if pred['word'].startswith("##"):
        # Continuation of the previous word: extend its text and span,
        # and keep a running average of the confidence score
        if current:
            current['word'] += pred['word'][2:]
            current['end'] = pred['end']
            current['score'] = (current['score'] + pred['score']) / 2
    else:
        # New word: flush the previous entry and start a fresh one
        if current:
            merged_predictions.append(current)
        current = pred.copy()

if current:
    merged_predictions.append(current)

# Filter by confidence threshold and print
threshold = 0.5
filtered_predictions = [p for p in merged_predictions if p['score'] >= threshold]
for p in filtered_predictions:
    print(f"Entity: {p['entity']}, Word: {p['word']}, Score: {p['score']:.2f}, Start: {p['start']}, End: {p['end']}")
```
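
If you prefer not to merge subwords by hand, the pipeline can aggregate them for you. A minimal sketch, assuming the built-in `aggregation_strategy="simple"` option of `transformers` (aggregated entities are returned under the `entity_group` key instead of the per-token `entity`):

```python
from transformers import pipeline

# Let the pipeline merge subword tokens into whole entities itself
ner = pipeline("ner", model="pei-germany/MEDNER-de-fp-gbert", aggregation_strategy="simple")

text = "Der Patient wurde mit AstraZeneca geimpft und nahm anschließend Ibuprofen, um das Fieber zu senken."

for entity in ner(text):
    # Aggregated predictions expose 'entity_group' rather than 'entity'
    print(f"Entity: {entity['entity_group']}, Word: {entity['word']}, Score: {entity['score']:.2f}")
```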

### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("pei-germany/MEDNER-de-fp-gbert")
model = AutoModelForTokenClassification.from_pretrained("pei-germany/MEDNER-de-fp-gbert")

text = "Der Patient bekam den COVID-Impfstoff und nahm danach Aspirin."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map each non-special token to its predicted label
predictions = [
    (token, model.config.id2label[label.item()])
    for token, label in zip(
        tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
        torch.argmax(outputs.logits, dim=-1)[0]
    )
    if token not in tokenizer.all_special_tokens
]

print(predictions)
```
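
To read entities directly out of the input string, the token-level predictions can be mapped back to character offsets. A minimal sketch that reuses `text`, `tokenizer`, and `model` from the snippet above; it assumes a fast tokenizer (required for `return_offsets_mapping=True`) and the usual `O` label for non-entity tokens, with the exact label names listed in `model.config.id2label`:

```python
# Re-encode with character offsets so each prediction can be located in the text
enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0]

with torch.no_grad():
    logits = model(**enc).logits
labels = torch.argmax(logits, dim=-1)[0]

for (start, end), label_id in zip(offsets.tolist(), labels.tolist()):
    label = model.config.id2label[label_id]
    # Special tokens have empty (0, 0) offsets; "O" (assumed) marks non-entity tokens
    if start == end or label == "O":
        continue
    print(f"{label}: {text[start:end]} ({start}-{end})")
```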

---

## Authors

Farnaz Zeidi, Manuela Messelhäußer, Roman Christof, Xing David Wang, Ulf Leser, Dirk Mentzer, Renate König, Liam Childs.

---

## License

This model is shared under the [GNU Affero General Public License v3.0](https://choosealicense.com/licenses/agpl-3.0/).