---
pipeline_tag: translation
datasets:
- ascolda/ru_en_Crystallography_and_Spectroscopy
language:
- ru
- en
metrics:
- bleu
tags:
- chemistry
---

# nllb-200-distilled-600M_ru_en_finetuned_crystallography

This model is a fine-tuned version of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) trained on the [ascolda/ru_en_Crystallography_and_Spectroscopy](https://huggingface.co/datasets/ascolda/ru_en_Crystallography_and_Spectroscopy) dataset.

It achieves the following results on the evaluation set:

- Loss: 0.5602
- BLEU: 56.5855

## Model description

The fine-tuned model yields better performance than the base model on machine translation of domain-specific scientific articles in crystallography and spectroscopy.

## Metrics used to describe the fine-tuning effect

The table below compares translation quality metrics for the original NLLB model and my fine-tuned version. The evaluation focuses on (1) general translation quality, (2) translation quality of domain-specific terminology, and (3) uniformity of translation of domain-specific terms across different contexts.

(1) BLEU. General translation quality was evaluated using the BLEU metric.

(2) Term Success Rate. This metric compares machine-translated terms with their dictionary equivalents: a term counts as successfully translated if a regular expression match finds its reference translation in the model output.

(3) Term Consistency. This metric measures whether technical terms are translated uniformly across the entire evaluation corpus in different contexts. High consistency means that multiple different translations of the same term occur rarely within the evaluation dataset.

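
The Term Success Rate described in (2) can be sketched in a few lines of Python. This is only a minimal illustration, not the evaluation script used for this model: the function name `term_success_rate`, the input layout (one list of expected reference terms per output sentence), and the case-insensitive whole-word matching are all assumptions, and the sample sentences are made up.

```python
import re


def term_success_rate(outputs, expected_terms):
    """Share of reference terms found in the corresponding MT outputs.

    outputs        -- list of machine-translated sentences (hypothetical input layout)
    expected_terms -- list of lists; expected_terms[i] holds the dictionary
                      translations expected to appear in outputs[i]
    """
    hits = 0
    total = 0
    for text, terms in zip(outputs, expected_terms):
        for term in terms:
            total += 1
            # Assumption: a case-insensitive match that does not start or end
            # in the middle of a longer word counts as a success.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits += 1
    return hits / total if total else 0.0


# Made-up example: two of the three reference terms are present.
sample_outputs = [
    "The unit cell parameters were refined.",
    "The diffraction pattern shows a lattice.",
]
sample_terms = [
    ["unit cell"],
    ["diffraction pattern", "crystal lattice"],
]
sample_rate = term_success_rate(sample_outputs, sample_terms)
print(round(sample_rate, 3))
```

A stricter or looser matching rule (for example, allowing inflected forms) would change the score, so the pattern should be adapted to the term dictionary in use.
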

| Model                                                          | BLEU    | Term Success Rate   | Term Consistency |
|:--------------------------------------------------------------:|:-------:|:-------------------:|:----------------:|
| nllb-200-distilled-600M                                        | 38.19   | 0.246               | 0.199            |
| nllb-200-distilled-600M_ru_en_finetuned_crystallography        | 56.59   | 0.573               | 0.740            |
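
The exact formula behind Term Consistency is not given here, so the following is one possible reading of (3) rather than the computation that produced the numbers in the table. It assumes that term occurrences have already been extracted as `(source_term, observed_translation)` pairs, and it scores the share of source terms that received exactly one distinct translation. The function name `term_consistency`, the pair format, and the sample data are hypothetical.

```python
from collections import defaultdict


def term_consistency(term_translations):
    """Share of source terms that are always translated the same way.

    term_translations -- iterable of (source_term, observed_translation)
                         pairs collected over the evaluation set
                         (hypothetical input format)
    """
    variants = defaultdict(set)
    for source_term, translation in term_translations:
        # Assumption: differences in case and surrounding whitespace
        # do not count as different translations.
        variants[source_term].add(translation.strip().lower())
    if not variants:
        return 0.0
    consistent = sum(1 for seen in variants.values() if len(seen) == 1)
    return consistent / len(variants)


# Made-up example: the first term is translated uniformly, the second is not.
sample_pairs = [
    ("элементарная ячейка", "unit cell"),
    ("элементарная ячейка", "Unit cell"),
    ("решётка", "lattice"),
    ("решётка", "grid"),
]
sample_consistency = term_consistency(sample_pairs)
print(sample_consistency)
```

Other definitions are equally plausible, for instance the average share of the most frequent translation per term; they would yield different absolute values but reward the same behaviour.
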