---
language: tr
datasets:
- SUNLP-NER-Twitter
---

# berturk-sunlp-ner-turkish

## Introduction
`berturk-sunlp-ner-turkish` is a named entity recognition (NER) model for Turkish, fine-tuned from the BERTurk-cased model on the SUNLP-NER-Twitter dataset.

## Training data
The model was trained on the SUNLP-NER-Twitter dataset (5,000 tweets), which is available at https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset

The named entity types are:
Person, Location, Organization, Time, Money, Product, TV-Show


## How to use berturk-sunlp-ner-turkish with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Download (or load from the local cache) the tokenizer and the fine-tuned NER model.
tokenizer = AutoTokenizer.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
model = AutoModelForTokenClassification.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
```
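
The snippet above only loads the tokenizer and the model. To tag text, one option is the `transformers` token-classification `pipeline`. The following is a minimal sketch rather than the authors' own inference code: the helper names `load_ner_pipeline` and `group_by_type` and the `min_score` threshold are illustrative assumptions, and the label strings returned depend on the model's `id2label` configuration.

```python
from collections import defaultdict

MODEL_ID = "busecarik/berturk-sunlp-ner-turkish"


def load_ner_pipeline(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that group_by_type() works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans,
    # so each result carries "entity_group", "score", "word", "start" and "end".
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_by_type(entities, min_score=0.0):
    """Collect entity strings per entity type from aggregated pipeline output."""
    grouped = defaultdict(list)
    for ent in entities:
        # Hypothetical confidence filter; 0.0 keeps every prediction.
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)
```

With the model downloaded, `group_by_type(load_ner_pipeline()("..."))` returns a dictionary that maps each predicted entity type to the list of matching text spans.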

## Model performance on the SUNLP-NER-Twitter test set (metric: seqeval)

Precision (%)|Recall (%)|F1 (%)
-|-|-
82.96|82.42|82.69

Classification report:

Entity|Precision|Recall|F1
-|-|-|-
LOCATION|0.70|0.80|0.74
MONEY|0.80|0.71|0.75
ORGANIZATION|0.78|0.86|0.78
PERSON|0.90|0.91|0.91
PRODUCT|0.44|0.47|0.45
TIME|0.94|0.85|0.89
TVSHOW|0.61|0.35|0.45
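
F1 is the harmonic mean of precision and recall, so the overall row can be checked by hand. The sketch below does only that arithmetic; the function name `f1_from_pr` is illustrative, and it does not reproduce seqeval's entity-level matching of predicted and gold spans.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall on the test set, in percent.
overall_f1 = f1_from_pr(82.96, 82.42)
print(round(overall_f1, 2))  # 82.69
```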


If you use this model, please cite the following [paper](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.484.pdf):

```bibtex
@InProceedings{ark-yeniterzi:2022:LREC,
  author    = {\c{C}ar\i k, Buse and Yeniterzi, Reyyan},
  title     = {A Twitter Corpus for Named Entity Recognition in Turkish},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {4546--4551},
  url       = {https://aclanthology.org/2022.lrec-1.484}
}
```