---
language: tr
datasets:
- SUNLP-NER-Twitter
---
# berturk-sunlp-ner-turkish
## Introduction
berturk-sunlp-ner-turkish is a named entity recognition (NER) model fine-tuned from the BERTurk-cased model on the SUNLP-NER-Twitter dataset.
## Training data
The model was trained on the SUNLP-NER-Twitter dataset (5,000 tweets), which is available at https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset
Named entity types are as follows:
Person, Location, Organization, Time, Money, Product, TV-Show
## How to use berturk-sunlp-ner-turkish with HuggingFace
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
model = AutoModelForTokenClassification.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
```
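The snippet above only loads the tokenizer and model. For inference, one option is the `pipeline` API from `transformers`. The following is a minimal sketch, not the authors' own usage: the helper names `load_ner_pipeline` and `summarize_entities`, the `min_score` threshold of 0.5, and the example sentence are illustrative assumptions. It relies on `aggregation_strategy="simple"`, which makes the pipeline return one dictionary per merged entity span with the keys `entity_group`, `word`, and `score`.

```python
from typing import Dict, List, Tuple


def load_ner_pipeline(model_id: str = "busecarik/berturk-sunlp-ner-turkish"):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so summarize_entities can be used without loading transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize_entities(
    entities: List[Dict], min_score: float = 0.5
) -> List[Tuple[str, str]]:
    """Keep (label, text) pairs whose confidence is at least min_score.

    min_score is a hypothetical threshold chosen for illustration.
    """
    return [
        (ent["entity_group"], ent["word"])
        for ent in entities
        if float(ent["score"]) >= min_score
    ]


# Example usage (needs network access to fetch the model; the sentence is made up):
# ner = load_ner_pipeline()
# print(summarize_entities(ner("Ahmet yarın İstanbul'a gidiyor.")))
```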
## Model performance on the SUNLP-NER-Twitter test set (metric: seqeval)
Precision|Recall|F1
-|-|-
82.96|82.42|82.69

Classification Report

Entity|Precision|Recall|F1
-|-|-|-
LOCATION|0.70|0.80|0.74
MONEY|0.80|0.71|0.75
ORGANIZATION|0.78|0.86|0.78
PERSON|0.90|0.91|0.91
PRODUCT|0.44|0.47|0.45
TIME|0.94|0.85|0.89
TVSHOW|0.61|0.35|0.45
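
The F1 values reported here are the harmonic mean of precision and recall. As a small sanity check, the sketch below recomputes the overall F1 from the overall precision and recall listed above. The function name `harmonic_f1` is an illustrative assumption and is not part of seqeval. Because the per-entity rows are rounded to two decimals, recomputing their F1 from the rounded values may differ slightly from the table.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall scores from the table above, given in percent.
overall_f1 = harmonic_f1(82.96, 82.42)
print(round(overall_f1, 2))  # 82.69
```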
If you use this model, please cite the following [paper](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.484.pdf):
```bibtex
@InProceedings{ark-yeniterzi:2022:LREC,
author = {\c{C}ar\i k, Buse and Yeniterzi, Reyyan},
title = {A Twitter Corpus for Named Entity Recognition in Turkish},
booktitle = {Proceedings of the Language Resources and Evaluation Conference},
month = {June},
year = {2022},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {4546--4551},
url = {https://aclanthology.org/2022.lrec-1.484}
}
```