metadata
language: vi
datasets:
- cc100
tags:
- summarization
- translation
- question-answering
license: mit
EnViT5-base
A state-of-the-art pretrained Transformer-based encoder-decoder model for Vietnamese and English, used in the MTet paper.
How to use
For more details, see our GitHub repo.
Fine-tuning examples can be found here.
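import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load the tokenizer and the pretrained weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("VietAI/envit5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("VietAI/envit5-base")
# Use the GPU when one is available; otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)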
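Once the tokenizer and model are loaded, text can be run through the model with the standard `transformers` generation API. The following is a minimal sketch rather than the authors' own pipeline: the helper name `generate_texts` and the `batch_size` and `max_new_tokens` defaults are hypothetical choices made for illustration. It also assumes that, since this is the pretrained checkpoint, the outputs are likely to be useful only after fine-tuning on a downstream task.

```python
def generate_texts(model, tokenizer, texts, batch_size=8, max_new_tokens=64):
    """Run seq2seq generation over `texts` in small batches.

    The function name and the default values are illustrative assumptions,
    not part of the original model card.
    """
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest item in the batch, then move the tensors
        # to whichever device the model currently lives on.
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Drop padding and end-of-sequence tokens from the decoded strings.
        results.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return results
```

With `tokenizer` and `model` loaded as above, a call might look like `generate_texts(model, tokenizer, ["Xin chào"])`. Batching keeps memory use bounded when the input list is long, and reading `model.device` lets the same helper work on either CPU or GPU.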
Citation
@misc{mtet,
doi = {10.48550/ARXIV.2210.05610},
url = {https://arxiv.org/abs/2210.05610},
author = {Ngo, Chinh and Trinh, Trieu H. and Phan, Long and Tran, Hieu and Dang, Tai and Nguyen, Hieu and Nguyen, Minh and Luong, Minh-Thang},
keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
title = {MTet: Multi-domain Translation for English and Vietnamese},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}