|
--- |
|
tags: |
|
- translation |
|
- japanese |
|
|
|
language: |
|
- ja |
|
- en |
|
|
|
license: mit |
|
|
|
widget: |
|
- text: "今日もご安全に" |
|
|
|
--- |
|
## mbart-ja-en |
|
This model is based on [facebook/mbart-large-cc25](https://huggingface.co/facebook/mbart-large-cc25) and fine-tuned on the [JESC dataset](https://nlp.stanford.edu/projects/jesc/index_ja.html).
|
|
|
## How to use |
|
```py
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("ken11/mbart-ja-en")
model = MBartForConditionalGeneration.from_pretrained("ken11/mbart-ja-en")

# Tokenize the Japanese source sentence
inputs = tokenizer("こんにちは", return_tensors="pt")

# Force the decoder to start with the English language token
translated_tokens = model.generate(
    **inputs,
    decoder_start_token_id=tokenizer.lang_code_to_id["en_XX"],
    early_stopping=True,
    max_length=48,
)
pred = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
print(pred)
```
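
The same API also accepts a batch of sentences. The snippet below is a minimal sketch of batch translation; the example sentences and the `num_beams` value are illustrative assumptions, not settings from this model card:

```py
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("ken11/mbart-ja-en")
model = MBartForConditionalGeneration.from_pretrained("ken11/mbart-ja-en")

# Pad the batch so sentences of different lengths fit in one tensor
sentences = ["こんにちは", "今日もご安全に"]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)

translated_tokens = model.generate(
    **inputs,
    decoder_start_token_id=tokenizer.lang_code_to_id["en_XX"],
    num_beams=4,  # illustrative beam size
    early_stopping=True,
    max_length=48,
)
for pred in tokenizer.batch_decode(translated_tokens, skip_special_tokens=True):
    print(pred)
```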
|
|
|
## Training Data |
|
I used the [JESC dataset](https://nlp.stanford.edu/projects/jesc/index_ja.html) for training. |
|
Thank you for publishing such a large dataset. |
|
|
|
## Tokenizer |
|
The tokenizer uses a [SentencePiece](https://github.com/google/sentencepiece) model trained on the JESC dataset.
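
As a rough illustration, a SentencePiece model can be trained on a raw text corpus (one sentence per line) like this; the input file name, model prefix, and vocabulary size below are assumptions, not the settings actually used for this model:

```py
import sentencepiece as spm

# Train a SentencePiece model on a plain-text corpus.
# "jesc_train.txt", "mbart-ja-en-sp", and vocab_size are hypothetical values.
spm.SentencePieceTrainer.train(
    input="jesc_train.txt",
    model_prefix="mbart-ja-en-sp",
    vocab_size=32000,
)
```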
|
|
|
## Note |
|
Evaluated on the [JEC Basic Sentence Data of Kyoto University](https://nlp.ist.i.kyoto-u.ac.jp/EN/?JEC+Basic+Sentence+Data#i0163896), the model scores a sacreBLEU of `18.18`.
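
A score like this can be computed with the [sacrebleu](https://github.com/mjpost/sacrebleu) package. The sketch below assumes you have already generated model translations for the test set; the example strings are purely illustrative:

```py
import sacrebleu

# hypotheses: model translations; references: gold English sentences
hypotheses = ["Hello.", "Stay safe today as well."]
references = ["Hello.", "Have a safe day today, too."]

# corpus_bleu takes a list of hypotheses and a list of reference lists
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)
```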
|
|
|
## License
|
[The MIT license](https://opensource.org/licenses/MIT) |
|
|