|
--- |
|
language: |
|
- ru |
|
- zh |
|
- en |
|
tags: |
|
- translation |
|
license: apache-2.0 |
|
datasets: |
|
- ccmatrix |
|
base_model: |
|
- utrobinmv/t5_translate_en_ru_zh_large_1024 |
|
metrics: |
|
- sacrebleu |
|
widget: |
|
- example_title: translate zh-ru |
|
text: > |
|
translate to ru: 开发的目的是为用户提供个人同步翻译。 |
|
- example_title: translate ru-en |
|
text: > |
|
translate to en: Цель разработки — предоставить пользователям личного синхронного переводчика. |
|
- example_title: translate en-ru |
|
text: > |
|
translate to ru: The purpose of the development is to provide users with a personal synchronized interpreter. |
|
- example_title: translate en-zh |
|
text: > |
|
translate to zh: The purpose of the development is to provide users with a personal synchronized interpreter. |
|
- example_title: translate zh-en |
|
text: > |
|
translate to en: 开发的目的是为用户提供个人同步解释器。 |
|
- example_title: translate ru-zh |
|
text: > |
|
translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика. |
|
--- |
|
|
|
# T5 English, Russian and Chinese multilingual machine translation |
|
|
|
This model is a conventional T5 transformer used in multitask mode for translation into the required target language, fine-tuned specifically for machine translation of the pairs ru-zh, zh-ru, en-zh, zh-en, en-ru and ru-en.
|
|
|
The model can translate directly between any pair of the Russian, Chinese and English languages. The target language is selected with the prefix 'translate to <lang>:'. The source language does not need to be specified, and the source text may even be multilingual.
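

For illustration, the snippet below is a minimal sketch of this prefix convention: one loaded model serves several target languages, and only the prefix changes (the example sentence is taken from the widget samples above).

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
tokenizer = T5Tokenizer.from_pretrained(model_name)

text = '开发的目的是为用户提供个人同步翻译。'

# the same source sentence, translated into two different target languages
for lang in ('en', 'ru'):
    inputs = tokenizer(f'translate to {lang}: {text}', return_tensors='pt')
    generated = model.generate(**inputs, max_new_tokens=128)
    print(lang, tokenizer.batch_decode(generated, skip_special_tokens=True))
```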
|
|
|
|
|
|
|
Fine-tuned from the base model: utrobinmv/t5_translate_en_ru_zh_large_1024
|
|
|
Compared to the base model, this version was trained on noisier data with a noise-reduction function.
|
The model can additionally insert punctuation marks into a sentence when they are missing from the source text, which is convenient when translating text produced by ASR models.
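

As a rough illustration, the sketch below feeds the model a hypothetical unpunctuated, lowercase string of the kind an ASR system might produce; the punctuation is expected to be restored in the translation.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
tokenizer = T5Tokenizer.from_pretrained(model_name)

# hypothetical ASR-style output: no punctuation, no capitalization
asr_text = 'translate to en: цель разработки предоставить пользователям личного синхронного переводчика'

inputs = tokenizer(asr_text, return_tensors='pt')
generated = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```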
|
|
|
The model has also learned to translate small Markdown documents while preserving the markup and HTML tags.
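

The sketch below translates a small, hypothetical Markdown fragment; the markup and the HTML tag are expected to survive translation.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
tokenizer = T5Tokenizer.from_pretrained(model_name)

# hypothetical Markdown fragment with markup and an HTML tag
md_text = ('translate to zh: # Release notes\n\n'
           '- **Fixed** a crash on startup.\n'
           '- See the <a href="https://example.com">docs</a> for details.')

inputs = tokenizer(md_text, return_tensors='pt')
generated = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```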
|
|
|
|
|
|
|
Example: translating Russian to Chinese
|
|
|
```python |
|
from transformers import T5ForConditionalGeneration, T5Tokenizer |
|
|
|
device = 'cuda'  # or 'cpu' to run translation on CPU
|
|
|
model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2' |
|
model = T5ForConditionalGeneration.from_pretrained(model_name) |
|
model.eval() |
|
model.to(device) |
|
tokenizer = T5Tokenizer.from_pretrained(model_name) |
|
|
|
prefix = 'translate to zh: ' |
|
src_text = prefix + "Съешь ещё этих мягких французских булок." |
|
|
|
# translate Russian to Chinese |
|
input_ids = tokenizer(src_text, return_tensors="pt") |
|
|
|
generated_tokens = model.generate(**input_ids.to(device)) |
|
|
|
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True) |
|
print(result) |
|
# 再吃这些法国的甜蜜的面包。 |
|
``` |
|
|
|
|
|
|
|
And an example translating Chinese to Russian
|
|
|
```python |
|
from transformers import T5ForConditionalGeneration, T5Tokenizer |
|
|
|
device = 'cuda'  # or 'cpu' to run translation on CPU
|
|
|
model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2' |
|
model = T5ForConditionalGeneration.from_pretrained(model_name) |
|
model.eval() |
|
model.to(device) |
|
tokenizer = T5Tokenizer.from_pretrained(model_name) |
|
|
|
prefix = 'translate to ru: ' |
|
src_text = prefix + "再吃这些法国的甜蜜的面包。" |
|
|
|
# translate Chinese to Russian
|
input_ids = tokenizer(src_text, return_tensors="pt") |
|
|
|
generated_tokens = model.generate(**input_ids.to(device)) |
|
|
|
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True) |
|
print(result) |
|
# Съешьте этот сладкий хлеб из Франции. |
|
``` |
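

Batch translation works the same way: several prefixed sentences (here taken from the widget samples above) are padded to a common length and translated in a single generate() call. This is a minimal sketch, assuming the default padding token of the T5 tokenizer.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda'  # or 'cpu' to run translation on CPU

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# mixed target languages in one batch: each item carries its own prefix
batch = [
    'translate to en: 开发的目的是为用户提供个人同步翻译。',
    'translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика.',
]

inputs = tokenizer(batch, return_tensors='pt', padding=True).to(device)
generated = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```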
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Languages covered |
|
|
|
Russian (ru_RU), Chinese (zh_CN), English (en_US) |
|
|