Tilmash
Tilmash was fine-tuned from Facebook's NLLB model for machine translation between four languages: Kazakh, Russian, English, and Turkish. The table below reports BLEU and chrF scores (in that order for each dataset) for Tilmash on the FLoRes and KazParC test sets.
Pair | FLoRes BLEU | FLoRes chrF | KazParC BLEU | KazParC chrF |
---|---|---|---|---|
EN→KK | 0.20 | 0.60 | 0.21 | 0.60 |
EN→RU | 0.28 | 0.60 | 0.38 | 0.68 |
EN→TR | 0.27 | 0.65 | 0.25 | 0.64 |
KK→EN | 0.32 | 0.63 | 0.32 | 0.62 |
KK→RU | 0.18 | 0.52 | 0.29 | 0.63 |
KK→TR | 0.14 | 0.54 | 0.16 | 0.55 |
RU→EN | 0.32 | 0.63 | 0.42 | 0.70 |
RU→KK | 0.13 | 0.54 | 0.22 | 0.62 |
RU→TR | 0.14 | 0.54 | 0.18 | 0.57 |
TR→EN | 0.36 | 0.66 | 0.38 | 0.66 |
TR→KK | 0.13 | 0.54 | 0.16 | 0.55 |
TR→RU | 0.19 | 0.53 | 0.24 | 0.57 |
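For readers unfamiliar with chrF, the idea behind the metric can be sketched in a few lines of Python. This is a simplified illustration, not the scoring code used to produce the table: the function names `char_ngrams` and `simple_chrf` and the defaults (character n-grams up to order 6, beta = 2, spaces ignored) are assumptions chosen for the example. An established implementation such as sacreBLEU is the better choice when comparable numbers are needed.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One of the strings is too short for this n-gram order; skip it.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


score = simple_chrf("Kazakhstan is a country in Central Asia.",
                    "Kazakhstan is a state located in Central Asia.")
print(round(score, 2))
```

Like the values in the table, the result lies between 0 and 1, with 1 meaning that the hypothesis and the reference share all of their character n-grams.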
Model Sources
- Repository: https://github.com/IS2AI/KazParC
- Paper: KazParC: Kazakh Parallel Corpus for Machine Translation
- Demo: Tilmash Demo
How to Get Started with the Model
You can use this model with the Transformers pipeline for translation.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline
# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")
# For src_lang and tgt_lang, choose from kaz_Cyrl (Kazakh), rus_Cyrl (Russian), eng_Latn (English), tur_Latn (Turkish).
tilmash = TranslationPipeline(model=model, tokenizer=tokenizer, src_lang="kaz_Cyrl", tgt_lang="eng_Latn", max_length=1000)
print(tilmash("Қазақстан — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет."))
# [{'translation_text': 'Kazakhstan is a country located in Eastern Europe and Central Asia.'}]
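To translate in other directions, only `src_lang` and `tgt_lang` need to change. The sketch below wraps that choice in two small helpers. The names `LANG_CODES`, `pipeline_kwargs`, and `build_translator`, as well as the short keys `kk`, `ru`, `en`, and `tr`, are hypothetical and introduced only for illustration. The language codes and the `max_length` value are the ones used in the example above.

```python
# Hypothetical short keys mapped to the language codes listed above.
LANG_CODES = {
    "kk": "kaz_Cyrl",  # Kazakh
    "ru": "rus_Cyrl",  # Russian
    "en": "eng_Latn",  # English
    "tr": "tur_Latn",  # Turkish
}


def pipeline_kwargs(src: str, tgt: str, max_length: int = 1000) -> dict:
    """Return the language-related keyword arguments for TranslationPipeline."""
    for key in (src, tgt):
        if key not in LANG_CODES:
            raise ValueError(f"Unsupported language key: {key!r}")
    return {
        "src_lang": LANG_CODES[src],
        "tgt_lang": LANG_CODES[tgt],
        "max_length": max_length,
    }


def build_translator(src: str, tgt: str):
    """Build a pipeline for one direction (downloads the model on first use)."""
    # Imported here so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

    model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
    tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")
    return TranslationPipeline(model=model, tokenizer=tokenizer,
                               **pipeline_kwargs(src, tgt))


print(pipeline_kwargs("ru", "kk"))
```

Calling `build_translator("ru", "kk")` would then return a pipeline that can be used in the same way as `tilmash` above, assuming the model can be downloaded.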