---
inference: false
language:
  - ja
  - en
  - de
  - is
  - zh
  - cs
---

webbigdata/ALMA-7B-Ja

The original ALMA model is ALMA-7B (26.95 GB).

ALMA-7B-Ja is a machine translation model that applies ALMA's training method to translation between Japanese and English (13.3 GB).
The original ALMA-7B supports translation between English and Russian (ru). This model supports Japanese (ja) and English instead of Russian.

Like the original model, this model has been verified to retain some translation ability between the following language pairs, but if you need translation for these languages, it is better to use the original ALMA-13B model:

- German (de) and English (en)
- Chinese (zh) and English (en)
- Icelandic (is) and English (en)
- Czech (cs) and English (en)

Translating from English (en→xx), BLEU/COMET

| Models | de | cs | is | zh | ru/ja | Avg. |
|---|---|---|---|---|---|---|
| NLLB-54B | 34.50/86.45 | 37.60/90.15 | 24.15/81.76 | 27.38/78.91 | 30.96/87.92 | 30.92/85.04 |
| GPT-3.5-D | 31.80/85.61 | 31.30/88.57 | 15.90/76.28 | 38.30/85.76 | 27.50/86.74 | 28.96/84.59 |
| ALMA-7B (Original) | 30.31/85.59 | 29.88/89.10 | 25.71/85.52 | 36.87/85.11 | 27.13/86.98 | 29.89/86.49 |
| ALMA-7B-Ja (Ours) | 23.70/82.04 | 18.58/81.36 | 12.20/71.59 | 29.06/82.45 | 14.82/85.40 | 19.67/80.57 |

Translating to English (xx→en), BLEU/COMET

| Models | de | cs | is | zh | ru/ja | Avg. |
|---|---|---|---|---|---|---|
| NLLB-54B | 26.89/78.94 | 39.11/80.13 | 23.09/71.66 | 16.56/70.70 | 39.11/81.88 | 28.95/76.66 |
| GPT-3.5-D | 30.90/84.79 | 44.50/86.16 | 31.90/82.13 | 25.00/81.62 | 38.50/84.80 | 34.16/83.90 |
| ALMA-7B (Original) | 30.26/84.00 | 43.91/85.86 | 35.97/86.03 | 23.75/79.85 | 39.37/84.58 | 34.55/84.02 |
| ALMA-7B-Ja (Ours) | 26.41/83.13 | 34.39/83.50 | 24.77/81.12 | 20.60/78.54 | 15.57/78.61 | 24.35/81.76 |

Sample Code For Free Colab
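The Colab notebook above is the supported way to try the model. As a rough, unofficial sketch, the model can also be run locally with Hugging Face `transformers`; the exact prompt format below is an assumption based on the ALMA paper's translation prompt, so check the notebook if results look off.

```python
# Minimal sketch (not the official Colab code): load webbigdata/ALMA-7B-Ja with
# transformers and translate one Japanese sentence into English.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "webbigdata/ALMA-7B-Ja"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# ALMA-style translation prompt (assumed): name the direction, give the source text,
# and let the model continue after "English:".
prompt = "Translate this from Japanese to English:\nJapanese: 今日はいい天気ですね。\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```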

There is also a GPTQ quantized version of the model that reduces model size (3.9 GB) and memory usage, although its performance is probably somewhat lower.
Its translation ability for languages other than Japanese and English has also deteriorated significantly.
webbigdata/ALMA-7B-Ja-GPTQ-Ja-En
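As a hedged sketch only: the GPTQ variant can usually be loaded through the same `transformers` API when a GPTQ backend (for example the `auto-gptq` and `optimum` packages) is installed, but the GPTQ model card is the authoritative source for the exact loading code.

```python
# Hypothetical sketch: loading the GPTQ-quantized variant with transformers.
# Requires a GPTQ backend (e.g. auto-gptq / optimum); see the
# webbigdata/ALMA-7B-Ja-GPTQ-Ja-En model card for the supported instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

gptq_id = "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En"
tokenizer = AutoTokenizer.from_pretrained(gptq_id)
model = AutoModelForCausalLM.from_pretrained(gptq_id, device_map="auto")

# Same assumed ALMA-style prompt, here for English to Japanese.
prompt = "Translate this from English to Japanese:\nEnglish: Good morning.\nJapanese:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```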

ALMA (Advanced Language Model-based trAnslator) is an LLM-based translation model, which adopts a new translation model paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance. Please find more details in their paper.

@misc{xu2023paradigm,
      title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models}, 
      author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},
      year={2023},
      eprint={2309.11674},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

about this work