---
license: mit
language:
- tr
- en
library_name: transformers
pipeline_tag: translation
---
# Model Card: nllb-3.3B-Turkish
## Version: Based on nllb-3.3B-Turkish: further pretrained on a large English-to-Turkish corpus.
The training dataset for nllb-3.3B-Turkish consists of approximately 490,000 translated text pairs, predominantly sourced from movie subtitles, which offer a diverse range of linguistic structures, idiomatic expressions, and cultural references. This coverage is intended to help the model handle a variety of translation tasks within the subtitle domain.
## Intended Use
nllb-3.3B-Turkish is designed for applications requiring English-to-Turkish translations, particularly in the context of subtitles. It is suitable for use in media localization, subtitling platforms, and language learning tools. The model can be utilized by developers, linguists, and content creators to facilitate seamless translation and enhance cross-cultural media accessibility.
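For the subtitle use case, a minimal sketch of translating an SRT file while leaving cue numbers and timestamps untouched might look like the following. The function name `translate_srt` and the `translate_fn` callable are hypothetical and not part of this model or of transformers; `translate_fn` is assumed to take a list of English strings and return the Turkish translations in the same order.

```python
import re
from typing import Callable, List

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate_fn: Callable[[List[str]], List[str]]) -> str:
    """Translate only the dialogue lines of an SRT document.

    translate_fn is a hypothetical callable (an assumption of this sketch):
    it receives a list of source strings and returns translations in order.
    """
    lines = srt_text.splitlines()
    # Dialogue lines: not blank, not a cue number, not a timing line.
    # Limitation: a dialogue line made only of digits is treated as a cue number.
    text_idx = [
        i for i, line in enumerate(lines)
        if line.strip()
        and not line.strip().isdigit()
        and not TIMING.match(line.strip())
    ]
    translated = translate_fn([lines[i] for i in text_idx])
    for i, new_line in zip(text_idx, translated):
        lines[i] = new_line
    return "\n".join(lines)
```

Taking the translator as a plain callable keeps the SRT handling independent of how the model is loaded, so any function that maps a list of strings to a list of strings can be plugged in.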
## Model Training
Details regarding the model's training procedure, architecture, and fine-tuning processes will be extensively covered in the upcoming paper.
## Example Outputs
```
Question: What is the meaning of life? That was all- a simple question; one that tended to close in on one with years, the great revelation had never come. The great revelation perhaps never did come. Instead, there were little daily miracles, illuminations, matches struck unexpectedly in the dark; here was one.
Answer: Hayatın anlamı nedir? Bu basit bir soruydu. Yıllar geçtikçe insanın içine kapanmaya eğilimli olan bir soruydu. Büyük vahiy hiç gelmemişti. Büyük vahiy belki de hiç gelmemişti. Bunun yerine, küçük günlük mucizeler, aydınlatmalar, karanlıkta beklenmedik şekilde ateşler açılırdı. İşte bir tanesi.
```
```python
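from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model_name = "<model-repo-id>"  # placeholder: set to this model's Hub repository ID
src_lang, tgt_lang = "eng_Latn", "tur_Latn"  # NLLB language codes for English and Turkish
question_prompt = "What is the meaning of life?"  # English text to translate
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang=src_lang, tgt_lang=tgt_lang)
translator = pipeline('translation', model=model, tokenizer=tokenizer, src_lang=src_lang, tgt_lang=tgt_lang, device_map="auto")
output = translator(question_prompt, max_length=512)[0]['translation_text']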
```
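The translation pipeline can also be called with a list of strings, which is convenient for many short subtitle lines. The sketch below assumes `translator` is the pipeline built above; the helper name `translate_lines` and the default batch size are illustrative choices, not values from the training setup.

```python
from typing import Callable, List


def translate_lines(
    translator: Callable,
    lines: List[str],
    batch_size: int = 8,    # illustrative default, tune for your hardware
    max_length: int = 512,  # same limit as the single-text example
) -> List[str]:
    """Translate many short texts and return plain strings in input order."""
    if not lines:
        return []
    # For a list of strings the pipeline is expected to return one dict per input.
    results = translator(lines, batch_size=batch_size, max_length=max_length)
    return [r["translation_text"] for r in results]
```

For example, `translate_lines(translator, ["Hello.", "See you tomorrow."])` should return a list of two Turkish strings.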