---
language: tr
datasets:
- SUNLP-NER-Twitter
---

# berturk-sunlp-ner-turkish

## Introduction

berturk-sunlp-ner-turkish is a named entity recognition (NER) model fine-tuned from the BERTurk-cased model on the SUNLP-NER-Twitter dataset.

## Training data

The model was trained on the SUNLP-NER-Twitter dataset (5000 tweets). The dataset is available at https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset

The named entity types are: Person, Location, Organization, Time, Money, Product, TV-Show

## How to use berturk-sunlp-ner-turkish with HuggingFace

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
model = AutoModelForTokenClassification.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
```

## Model performances on SUNLP-NER-Twitter test set (metric: seqeval)

Precision (%)|Recall (%)|F1 (%)
-|-|-
82.96|82.42|82.69

Classification Report

Entity|Precision|Recall|F1
-|-|-|-
LOCATION|0.70|0.80|0.74
MONEY|0.80|0.71|0.75
ORGANIZATION|0.78|0.86|0.78
PERSON|0.90|0.91|0.91
PRODUCT|0.44|0.47|0.45
TIME|0.94|0.85|0.89
TVSHOW|0.61|0.35|0.45

## Citation

If you use this model or the dataset, you can cite the following [paper](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.484.pdf):

```bibtex
@InProceedings{ark-yeniterzi:2022:LREC,
  author    = {\c{C}ar\i k, Buse and Yeniterzi, Reyyan},
  title     = {A Twitter Corpus for Named Entity Recognition in Turkish},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {4546--4551},
  url       = {https://aclanthology.org/2022.lrec-1.484}
}
```
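
The loading snippet in the usage section above stops after instantiating the tokenizer and model. As a minimal sketch of how inference might look, the following uses the `pipeline` helper from `transformers` with `aggregation_strategy="simple"` to merge word pieces into entity spans. The helper `summarize_entities`, its `min_score` threshold, and the example sentence are hypothetical and only for illustration; the label strings returned depend on the model's configuration and are not confirmed by this card.

```python
MODEL_ID = "busecarik/berturk-sunlp-ner-turkish"  # model ID taken from this card


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups word-piece predictions into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize_entities(predictions, min_score: float = 0.0):
    """Reduce aggregated pipeline output to (text, label) pairs.

    `min_score` is a hypothetical confidence threshold, not a tuned value.
    """
    return [
        (p["word"], p["entity_group"])
        for p in predictions
        if p["score"] >= min_score
    ]


def main():
    ner = build_ner_pipeline()
    # Hypothetical example sentence ("Ahmet is going to Istanbul tomorrow.").
    text = "Ahmet yarın İstanbul'a gidiyor."
    for word, label in summarize_entities(ner(text)):
        print(f"{label}\t{word}")
```

Calling `main()` downloads the model on first use and prints one label and text pair per detected entity. Raising `min_score` filters out low-confidence spans, which may be worth trying for the weaker classes in the classification report such as PRODUCT and TVSHOW.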