berturk-sunlp-ner-turkish
Introduction
berturk-sunlp-ner-turkish is a named entity recognition (NER) model fine-tuned from the BERTurk-cased model on the SUNLP-NER-Twitter dataset.
Training data
The model was trained on the SUNLP-NER-Twitter dataset (5000 tweets), available at https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset. The named entity types are Person, Location, Organization, Time, Money, Product, and TV-Show.
How to use berturk-sunlp-ner-turkish with HuggingFace
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
model = AutoModelForTokenClassification.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
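The lines above only load the tokenizer and the model. A minimal sketch of running inference follows, assuming the standard `transformers` token-classification pipeline with `aggregation_strategy="simple"`. The helper names `tag_text` and `group_bio_tags` are hypothetical and not part of this model card. `group_bio_tags` is a pure-Python illustration of how word-level tags can be merged into entity spans; it assumes a BIO tagging scheme with labels such as `B-PERSON`, which should be checked against the model's `config.id2label`.

```python
from typing import List, Optional, Tuple

MODEL_ID = "busecarik/berturk-sunlp-ner-turkish"


def tag_text(text: str):
    """Run the model on one text and return the aggregated entity spans."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline

    # The first call downloads the model weights from the Hugging Face Hub.
    ner = pipeline(
        "token-classification",
        model=MODEL_ID,
        tokenizer=MODEL_ID,
        aggregation_strategy="simple",
    )
    return ner(text)


def group_bio_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO tags into (entity_text, entity_type) pairs.

    Assumption: tags look like "B-PERSON", "I-PERSON", or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new entity starts here; close the one in progress, if any.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [token], label
        elif prefix == "I":
            # Continuation of the current entity.
            current_words.append(token)
        else:
            # "O" tag: close the entity in progress, if any.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None

    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities
```

Calling `tag_text("...")` on a tweet returns a list of dictionaries with keys such as `entity_group`, `word`, and `score`. `group_bio_tags` is only needed when working with raw per-word predictions rather than the pipeline's aggregated output.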
Model performance on the SUNLP-NER-Twitter test set (metric: seqeval)
Precision | Recall | F1 |
---|---|---|
82.96 | 82.42 | 82.69 |
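As a quick sanity check on the overall scores, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the function name `f1_score` is illustrative and is not taken from seqeval.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the table above (in percent).
overall_f1 = f1_score(82.96, 82.42)
print(round(overall_f1, 2))  # 82.69
```

The recomputed value matches the reported F1. The same check is only approximate for per-entity rows, because their precision and recall are already rounded to two decimals.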
Classification Report
Entity | Precision | Recall | F1 |
---|---|---|---|
LOCATION | 0.70 | 0.80 | 0.74 |
MONEY | 0.80 | 0.71 | 0.75 |
ORGANIZATION | 0.78 | 0.86 | 0.78 |
PERSON | 0.90 | 0.91 | 0.91 |
PRODUCT | 0.44 | 0.47 | 0.45 |
TIME | 0.94 | 0.85 | 0.89 |
TVSHOW | 0.61 | 0.35 | 0.45 |
If you use this model, please cite the following paper:
@InProceedings{ark-yeniterzi:2022:LREC,
author = {\c{C}ar\i k, Buse and Yeniterzi, Reyyan},
title = {A Twitter Corpus for Named Entity Recognition in Turkish},
booktitle = {Proceedings of the Language Resources and Evaluation Conference},
month = {June},
year = {2022},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {4546--4551},
url = {https://aclanthology.org/2022.lrec-1.484}
}