|
--- |
|
language: |
|
- ru |
|
- zh |
|
- en |
|
tags: |
|
- translation |
|
license: apache-2.0 |
|
datasets: |
|
- ccmatrix |
|
base_model: |
|
- utrobinmv/t5_translate_en_ru_zh_large_1024 |
|
metrics: |
|
- sacrebleu |
|
widget: |
|
- example_title: translate zh-ru |
|
text: > |
|
translate to ru: 开发的目的是为用户提供个人同步翻译。 |
|
- example_title: translate ru-en |
|
text: > |
|
translate to en: Цель разработки — предоставить пользователям личного синхронного переводчика. |
|
- example_title: translate en-ru |
|
text: > |
|
translate to ru: The purpose of the development is to provide users with a personal synchronized interpreter. |
|
- example_title: translate en-zh |
|
text: > |
|
translate to zh: The purpose of the development is to provide users with a personal synchronized interpreter. |
|
- example_title: translate zh-en |
|
text: > |
|
translate to en: 开发的目的是为用户提供个人同步解释器。 |
|
- example_title: translate ru-zh |
|
text: > |
|
translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика. |
|
--- |
|
|
|
# T5 English, Russian and Chinese multilingual machine translation |
|
|
|
This model is a conventional T5 transformer used in multitask mode for translation into the required target language, fine-tuned specifically for machine translation of the pairs ru-zh, zh-ru, en-zh, zh-en, en-ru and ru-en.
|
|
|
The model can translate directly between any pair of the Russian, Chinese and English languages. The target language is selected with the prefix 'translate to <lang>:'. The source language does not need to be specified, and the source text may even be multilingual.
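

For illustration, the snippet below is a minimal sketch of this prefix convention: one loaded model serves several target languages, and only the prefix changes (the example sentence is taken from the widget samples above).

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
tokenizer = T5Tokenizer.from_pretrained(model_name)

text = '开发的目的是为用户提供个人同步翻译。'

# the same source sentence, translated into two different target languages
for lang in ('en', 'ru'):
    inputs = tokenizer(f'translate to {lang}: {text}', return_tensors='pt')
    generated = model.generate(**inputs, max_new_tokens=128)
    print(lang, tokenizer.batch_decode(generated, skip_special_tokens=True))
```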
|
|
|
|
|
|
|
Fine-tuned from the base model: utrobinmv/t5_translate_en_ru_zh_large_1024
|
|
|
Compared to the base model, this version was trained on noisier data with a noise-reduction function.
|
The model can additionally insert punctuation marks into a sentence when they are missing from the source text, which is convenient when translating text produced by ASR models.
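

As a rough illustration, the sketch below feeds the model a hypothetical unpunctuated, lowercase string of the kind an ASR system might produce; the punctuation is expected to be restored in the translation.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
tokenizer = T5Tokenizer.from_pretrained(model_name)

# hypothetical ASR-style output: no punctuation, no capitalization
asr_text = 'translate to en: цель разработки предоставить пользователям личного синхронного переводчика'

inputs = tokenizer(asr_text, return_tensors='pt')
generated = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```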
|
|
|
The model has also learned to translate small Markdown documents while preserving the markup and HTML tags.
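

The sketch below translates a small, hypothetical Markdown fragment; the markup and the HTML tag are expected to survive translation.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
tokenizer = T5Tokenizer.from_pretrained(model_name)

# hypothetical Markdown fragment with markup and an HTML tag
md_text = ('translate to zh: # Release notes\n\n'
           '- **Fixed** a crash on startup.\n'
           '- See the <a href="https://example.com">docs</a> for details.')

inputs = tokenizer(md_text, return_tensors='pt')
generated = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```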
|
|
|
|
|
|
|
Example: translating Russian to Chinese
|
|
|
```python |
|
from transformers import T5ForConditionalGeneration, T5Tokenizer |
|
|
|
device = 'cuda'  # or 'cpu' to run translation on CPU
|
|
|
model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2' |
|
model = T5ForConditionalGeneration.from_pretrained(model_name) |
|
model.eval() |
|
model.to(device) |
|
tokenizer = T5Tokenizer.from_pretrained(model_name) |
|
|
|
prefix = 'translate to zh: ' |
|
src_text = prefix + "Съешь ещё этих мягких французских булок." |
|
|
|
# translate Russian to Chinese |
|
input_ids = tokenizer(src_text, return_tensors="pt") |
|
|
|
generated_tokens = model.generate(**input_ids.to(device)) |
|
|
|
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True) |
|
print(result) |
|
# 再吃这些法国的甜蜜的面包。 |
|
``` |
|
|
|
|
|
|
|
And an example translating Chinese to Russian
|
|
|
```python |
|
from transformers import T5ForConditionalGeneration, T5Tokenizer |
|
|
|
device = 'cuda'  # or 'cpu' to run translation on CPU
|
|
|
model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2' |
|
model = T5ForConditionalGeneration.from_pretrained(model_name) |
|
model.eval() |
|
model.to(device) |
|
tokenizer = T5Tokenizer.from_pretrained(model_name) |
|
|
|
prefix = 'translate to ru: ' |
|
src_text = prefix + "再吃这些法国的甜蜜的面包。" |
|
|
|
# translate Chinese to Russian
|
input_ids = tokenizer(src_text, return_tensors="pt") |
|
|
|
generated_tokens = model.generate(**input_ids.to(device)) |
|
|
|
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True) |
|
print(result) |
|
# Съешьте этот сладкий хлеб из Франции. |
|
``` |
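

Batch translation works the same way: several prefixed sentences (here taken from the widget samples above) are padded to a common length and translated in a single generate() call. This is a minimal sketch, assuming the default padding token of the T5 tokenizer.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda'  # or 'cpu' to run translation on CPU

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024_v2'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# mixed target languages in one batch: each item carries its own prefix
batch = [
    'translate to en: 开发的目的是为用户提供个人同步翻译。',
    'translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика.',
]

inputs = tokenizer(batch, return_tensors='pt', padding=True).to(device)
generated = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```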
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Languages covered |
|
|
|
Russian (ru_RU), Chinese (zh_CN), English (en_US) |
|
|