---
language: ky
datasets:
- wikiann
examples:
widget:
- text: "Бириккен Улуттар Уюму"
example_title: "Sentence_1"
- text: "Жусуп Мамай"
example_title: "Sentence_2"
---
<h1>Kyrgyz Named Entity Recognition</h1>
bert-base-multilingual-cased fine-tuned on the WikiANN dataset for named entity recognition (NER) in Kyrgyz.

WARNING: this model is not usable in practice (see the metrics below) and was built only as a proof of concept.

I'll update the model after cleaning up the WikiANN dataset (the `ky` portion contains only 100 items in each of the train/validation/test splits) or building a completely new dataset.
## Label IDs and their corresponding label names
| Label ID | Label Name |
| -------- | ---------- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
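The table above corresponds to the `id2label`/`label2id` mappings that `transformers` token-classification models carry in their config. A minimal sketch of those mappings (the hosted model's actual config should contain the equivalent dictionaries):

```python
# IOB2 label mapping for the WikiANN NER tags, as listed in the table above.
id2label = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}

# Inverse mapping, e.g. for converting string tags back to class indices.
label2id = {label: idx for idx, label in id2label.items()}

print(label2id["B-ORG"])  # 3
```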
<h1>Results</h1>
| Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| ---- | -------- | ----- | ---- | ---- |
| Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
| Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
| Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |
<h1>Example</h1>
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("murat/kyrgyz_language_NER")
model = AutoModelForTokenClassification.from_pretrained("murat/kyrgyz_language_NER")

# Build a token-classification pipeline and run it on a Kyrgyz example
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Жусуп Мамай"
ner_results = nlp(example)
print(ner_results)
```