---
language: tr
datasets:
- SUNLP-NER-Twitter
---

# berturk-sunlp-ner-turkish

## Introduction

berturk-sunlp-ner-turkish is a named entity recognition (NER) model fine-tuned from the BERTurk-cased model on the SUNLP-NER-Twitter dataset.

## Training data

The model was trained on the SUNLP-NER-Twitter dataset (5,000 tweets), which is available at https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset.

The dataset covers seven named entity types: Person, Location, Organization, Time, Money, Product and TV-Show.

## How to use berturk-sunlp-ner-turkish with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Download the tokenizer and the fine-tuned token-classification model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
model = AutoModelForTokenClassification.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
```
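
The snippet above only loads the model; at inference time it assigns one label per token. Assuming the labels follow the IOB2 scheme (`B-`/`I-` prefixes plus `O`) and have already been aligned to whole words, the minimal sketch below groups a label sequence into entities. The label strings such as `B-PERSON` and the example sentence are hypothetical; the checkpoint's actual label names are listed in `model.config.id2label`.

```python
from typing import List, Tuple


def iob2_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2 tags into (entity_type, text) pairs.

    Assumes labels look like "B-TYPE", "I-TYPE" or "O" (hypothetical label
    names; check model.config.id2label for the real ones). An "I-" tag that
    does not continue an entity of the same type starts a new entity.
    """
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity collected so far, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or ent_type != current_type:
            flush()
            current_type = ent_type
            current_tokens = [token]
        else:
            current_tokens.append(token)
    flush()
    return entities


# Hypothetical word-level input, not real model output.
tokens = ["Ayşe", "Yılmaz", "yarın", "İstanbul'a", "gidiyor"]
tags = ["B-PERSON", "I-PERSON", "B-TIME", "B-LOCATION", "O"]
print(iob2_to_entities(tokens, tags))
# [('PERSON', 'Ayşe Yılmaz'), ('TIME', 'yarın'), ('LOCATION', "İstanbul'a")]
```

In practice, the Transformers `pipeline("ner", ...)` helper can perform a similar grouping itself through its `aggregation_strategy` argument; the sketch only makes the logic explicit.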

## Model performance on the SUNLP-NER-Twitter test set (metric: seqeval)

Precision (%)|Recall (%)|F1 (%)
---|---|---
82.96|82.42|82.69
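
seqeval scores predictions at the entity level rather than per token. As a simplified, self-contained illustration (not seqeval's own code), the sketch below assumes entities have already been extracted from the gold and predicted tag sequences as hypothetical `(sentence index, type, start, end)` tuples, and counts a prediction as correct only when all four fields match a gold entity exactly. The example entities are made up.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (sentence index, type, start token, end token), end exclusive.
Entity = Tuple[int, str, int, int]


def entity_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1, in percent.

    A predicted entity is a true positive only if an identical gold entity
    (same sentence, type, start and end) exists; partial overlaps count as errors.
    """
    tp = len(gold & pred)
    precision = 100.0 * tp / len(pred) if pred else 0.0
    recall = 100.0 * tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: one entity gets the wrong type, one prediction is spurious.
gold = {(0, "PERSON", 0, 2), (0, "TIME", 2, 3), (1, "ORGANIZATION", 4, 5)}
pred = {(0, "PERSON", 0, 2), (0, "TIME", 2, 3), (1, "PRODUCT", 4, 5), (1, "LOCATION", 7, 8)}
p, r, f = entity_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=50.00 R=66.67 F1=57.14
```

Note that predicting PRODUCT where the gold entity is ORGANIZATION counts as both a false positive and a false negative, which is why entity-level scores are stricter than token accuracy.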

Per-entity classification report (values are fractions between 0 and 1):

Entity|Precision|Recall|F1
---|---|---|---
LOCATION|0.70|0.80|0.74
MONEY|0.80|0.71|0.75
ORGANIZATION|0.78|0.86|0.78
PERSON|0.90|0.91|0.91
PRODUCT|0.44|0.47|0.45
TIME|0.94|0.85|0.89
TVSHOW|0.61|0.35|0.45
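
Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the reported figures can be cross-checked against each other. The sketch below recomputes F1 from the rounded precision and recall values in the overall and per-entity tables; since the inputs are rounded to two decimals, differences of about 0.01 are expected.

```python
# Reported (precision, recall, F1) values, copied from the tables above.
overall = (82.96, 82.42, 82.69)  # percentages
per_entity = {  # fractions
    "LOCATION": (0.70, 0.80, 0.74),
    "MONEY": (0.80, 0.71, 0.75),
    "ORGANIZATION": (0.78, 0.86, 0.78),
    "PERSON": (0.90, 0.91, 0.91),
    "PRODUCT": (0.44, 0.47, 0.45),
    "TIME": (0.94, 0.85, 0.89),
    "TVSHOW": (0.61, 0.35, 0.45),
}


def harmonic_f1(precision: float, recall: float) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(f"overall       reported {overall[2]:.2f}  recomputed {harmonic_f1(*overall[:2]):.2f}")
for name, (p, r, f1) in per_entity.items():
    print(f"{name:<13} reported {f1:.2f}  recomputed {harmonic_f1(p, r):.2f}")
```

For the overall scores, 82.96 and 82.42 give 82.69, matching the table. Every per-entity row matches to within about 0.01 except ORGANIZATION, where 0.78 and 0.86 give roughly 0.82 rather than the reported 0.78, so one of the values in that row may contain a typo.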

## Citation

If you use this model, please cite the following [paper](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.484.pdf):

```bibtex
@InProceedings{ark-yeniterzi:2022:LREC,
  author    = {\c{C}ar\i k, Buse and Yeniterzi, Reyyan},
  title     = {A Twitter Corpus for Named Entity Recognition in Turkish},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {4546--4551},
  url       = {https://aclanthology.org/2022.lrec-1.484}
}
```