---
license: apache-2.0
language:
  - en
  - zh
  - ja
  - fr
  - es
  - it
  - pt
tags:
  - generative translation
  - large language model
  - LLaMA
metrics:
  - bleu
pipeline_tag: text-generation
datasets:
  - PeacefulData/HypoTranslate
---

This repo releases the trained LLaMA-adapter weights from the paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators".

GitHub: https://github.com/YUCHEN005/GenTranslate

Data: https://huggingface.co/datasets/PeacefulData/HypoTranslate

Model: This repo

Filename format: `[data_source]_[src_language_code]_[tgt_language_code]_[task].pth`

e.g. `covost2_ar_en_st.pth`

Note:

- Language code look-up: Tables 15 and 17 in https://arxiv.org/pdf/2402.06894.pdf
- Source and target languages refer to the translation task, so the N-best hypotheses and the ground-truth transcription are both in the target language.
- For speech translation datasets (FLEURS, CoVoST-2, MuST-C), the task ID "mt" denotes the cascaded ASR+MT pipeline.
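
The naming scheme above can be split into its fields with a few lines of Python. This is a minimal sketch, not part of the released code: the helper `parse_checkpoint_name` and the `CheckpointName` type are hypothetical names, and the sketch assumes that language codes and task IDs contain no underscores (only the data source may).

```python
from pathlib import Path
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields encoded in a checkpoint filename (hypothetical helper type)."""
    data_source: str
    src_lang: str
    tgt_lang: str
    task: str


def parse_checkpoint_name(filename: str) -> CheckpointName:
    """Split '[data_source]_[src]_[tgt]_[task].pth' into its four fields.

    Assumption: language codes and the task ID contain no underscores,
    so the last three underscore-separated tokens are src, tgt, and task.
    """
    path = Path(filename)
    if path.suffix != ".pth":
        raise ValueError(f"expected a .pth file, got: {filename}")
    # Split from the right so an underscore inside data_source is preserved.
    parts = path.stem.rsplit("_", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"filename does not match the expected format: {filename}")
    return CheckpointName(*parts)


info = parse_checkpoint_name("covost2_ar_en_st.pth")
print(info.data_source, info.src_lang, info.tgt_lang, info.task)
```

Splitting from the right keeps the parse stable if a data source name ever contains an underscore; the language codes themselves should still be checked against the look-up tables cited in the note.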

If you find this work related or useful for your research, please consider citing the paper below. Thank you.

@article{hu2024gentranslate,
  title={GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators},
  author={Hu, Yuchen and Chen, Chen and Yang, Chao-Han Huck and Li, Ruizhe and Zhang, Dong and Chen, Zhehuai and Chng, Eng Siong},
  journal={arXiv preprint arXiv:2402.06894},
  year={2024}
}