---
language:
- kk
- tr
- ru
- en
language_details: eng_Latn, kaz_Cyrl, rus_Cyrl, tur_Latn
metrics:
- bleu
- chrf
pipeline_tag: translation
inference: false
datasets:
- facebook/flores
- issai/kazparc
---

# Tilmash
Tilmash is a machine translation model fine-tuned from Facebook's NLLB model to translate between four languages: Kazakh, Russian, English, and Turkish. The table below reports BLEU and chrF scores (on a 0 to 1 scale) for Tilmash on the FLoRes and KazParC test sets.
Pair | FLoRes BLEU | FLoRes chrF | KazParC BLEU | KazParC chrF |
---|---|---|---|---|
EN→KK | 0.20 | 0.60 | 0.21 | 0.60 |
EN→RU | 0.28 | 0.60 | 0.38 | 0.68 |
EN→TR | 0.27 | 0.65 | 0.25 | 0.64 |
KK→EN | 0.32 | 0.63 | 0.32 | 0.62 |
KK→RU | 0.18 | 0.52 | 0.29 | 0.63 |
KK→TR | 0.14 | 0.54 | 0.16 | 0.55 |
RU→EN | 0.32 | 0.63 | 0.42 | 0.70 |
RU→KK | 0.13 | 0.54 | 0.22 | 0.62 |
RU→TR | 0.14 | 0.54 | 0.18 | 0.57 |
TR→EN | 0.36 | 0.66 | 0.38 | 0.66 |
TR→KK | 0.13 | 0.54 | 0.16 | 0.55 |
TR→RU | 0.19 | 0.53 | 0.24 | 0.57 |
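chrF is less familiar than BLEU, so the sketch below illustrates roughly what the chrF columns measure. It is a simplified, sentence-level version following the original chrF definition (Popović, 2015): character n-grams of orders 1 to 6 with whitespace removed, precision and recall averaged over orders, then combined into an F-score with beta = 2 so recall counts more than precision. The function names are hypothetical, and this is an assumption-laden illustration rather than the evaluation code used for the table. Standard tools such as sacreBLEU score at corpus level and report on a 0 to 100 scale, so this sketch will not reproduce the numbers above exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text`, ignoring whitespace (chrF's default)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale (hypothetical helper).

    Precision and recall are averaged over n-gram orders 1..max_order and then
    combined into an F-beta score; beta=2 weights recall more than precision.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders longer than one of the strings.
            continue
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


# Example: a translation that omits one word from the reference scores below 1.
print(chrf("Kazakhstan is a country in Central Asia.",
           "Kazakhstan is a country located in Central Asia."))
```

Identical strings score 1.0 and strings with no shared characters score 0.0; everything else falls in between.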
You can use this model with the Transformers pipeline for translation.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")

# for src_lang and tgt_lang choose from kaz_Cyrl (Kazakh), rus_Cyrl (Russian),
# eng_Latn (English), tur_Latn (Turkish)
tilmash = TranslationPipeline(
    model=model,
    tokenizer=tokenizer,
    src_lang="kaz_Cyrl",
    tgt_lang="eng_Latn",
    max_length=1000,
)

print(tilmash("Қазақстан — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет."))
# [{'translation_text': 'Kazakhstan is a country located in Eastern Europe and Central Asia.'}]
```
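The pipeline expects NLLB-style language codes, while the results table uses short codes such as EN and KK. One way to switch directions without typing the long codes is a small lookup like the hypothetical `resolve_pair` below, which maps the four short codes to the codes listed in the comment above and rejects unsupported or same-language pairs. `resolve_pair`, `make_translator`, and `LANG_CODES` are illustrative names, not part of Tilmash or Transformers; `make_translator` only wraps the same Transformers calls shown above.

```python
# Hypothetical helpers, not part of Tilmash or Transformers.
LANG_CODES = {
    "kk": "kaz_Cyrl",  # Kazakh
    "ru": "rus_Cyrl",  # Russian
    "en": "eng_Latn",  # English
    "tr": "tur_Latn",  # Turkish
}


def resolve_pair(src: str, tgt: str) -> tuple:
    """Return (src_lang, tgt_lang) NLLB codes for a supported translation direction."""
    src, tgt = src.lower(), tgt.lower()
    if src not in LANG_CODES or tgt not in LANG_CODES:
        raise ValueError(f"unsupported language; choose from {sorted(LANG_CODES)}")
    if src == tgt:
        raise ValueError("source and target languages must differ")
    return LANG_CODES[src], LANG_CODES[tgt]


def make_translator(src: str, tgt: str, max_length: int = 1000):
    """Build a TranslationPipeline for one direction (downloads issai/tilmash on first use)."""
    # Imported lazily so resolve_pair can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

    src_lang, tgt_lang = resolve_pair(src, tgt)
    model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
    tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")
    return TranslationPipeline(
        model=model,
        tokenizer=tokenizer,
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=max_length,
    )
```

For example, `resolve_pair("ru", "tr")` returns `("rus_Cyrl", "tur_Latn")`, and `make_translator("ru", "tr")` builds a Russian-to-Turkish pipeline that is called the same way as `tilmash` above. Four languages give 4 × 3 = 12 directions, matching the rows of the results table.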