How to convert Marian model format to huggingface

by corner - opened Jun 22, 2022

Discussion

corner

Jun 22, 2022

I know that the original “Helsinki-NLP/opus-mt-en-zh” was trained with MarianNMT, but how do I use it in a Hugging Face project?

patrickvonplaten

Jun 25, 2022

Hey @corner ,

Sorry I don't fully understand the question. To use the model you can simply follow this code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

The docs are here: https://huggingface.co/docs/transformers/model_doc/marian
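For readers who want to go one step further than loading, here is a minimal sketch of running a translation with the tokenizer and model from the snippet above. The helper name `translate` is made up for illustration, and the example assumes `transformers` with a PyTorch backend is installed.

```python
def translate(texts, model_name="Helsinki-NLP/opus-mt-en-zh"):
    """Translate a list of sentences with a Marian checkpoint (illustrative helper)."""
    # Imported inside the function so this file can be loaded
    # before the heavy dependency is installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Tokenize the sentences as one padded batch of PyTorch tensors.
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    # Generate target-language token ids, then decode them back to text.
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["How are you today?"])` downloads the checkpoint on first use and returns a list with one translated string. `from_pretrained` also accepts a path to a local directory, so the same helper should work for a model you converted yourself.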

corner

Jun 26, 2022

Thank you for your reply, @patrickvonplaten:
I mean, if I train a model with Marian myself, how can I use it in a Hugging Face project the same way as “Helsinki-NLP/opus-mt-en-zh” in:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

bobosui

Jun 29, 2022

You may try the code below to convert the model to pytorch.

import argparse
from pathlib import Path
from transformers.models.marian.convert_marian_to_pytorch import convert

# --model-path: directory with the trained Marian model files
# --dest-path: directory where the converted PyTorch model is written
argparser = argparse.ArgumentParser('Convert Marian NMT models to PyTorch')
argparser.add_argument('--model-path', action="store", required=True)
argparser.add_argument('--dest-path', action="store", required=True)
args = argparser.parse_args()

# Create the destination directory if needed, then run the conversion.
Path(args.dest_path).mkdir(parents=True, exist_ok=True)
convert(Path(args.model_path), Path(args.dest_path))
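Before running the conversion, it may help to check that the Marian model directory contains what the converter looks for. The sketch below is only a pre-flight check, not part of `transformers`: the helper name is hypothetical, and the file patterns are an assumption based on how OPUS-MT model directories are laid out, so verify them against the converter source in your installed version.

```python
from pathlib import Path

# ASSUMPTION: patterns the converter is expected to need in the model directory
# (model weights, Marian vocab, decoder config, sentencepiece models).
REQUIRED_PATTERNS = ["*.npz", "*vocab.yml", "decoder.yml", "source.spm", "target.spm"]


def missing_marian_files(model_dir):
    """Return every pattern in REQUIRED_PATTERNS that has no match in model_dir."""
    model_dir = Path(model_dir)
    return [p for p in REQUIRED_PATTERNS if not any(model_dir.glob(p))]
```

An empty list means every expected file was found; otherwise the returned patterns tell you what to add before calling `convert`.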

corner

Jul 1, 2022

THANKS bobosui
I'll try it

corner

Jul 8, 2022

Hello @bobosui,
Do I have to use sentencepiece to train the model? How else can I get source.spm / target.spm?
Thx!

tiedeman

Language Technology Research Group at the University of Helsinki

Jul 14, 2022

Yes, you have to use sentencepiece to get the subword segmentation models. You can probably use other subword tokenizers as well but then you have to dig into the model conversion code to make the appropriate adjustments.
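As a rough illustration of the sentencepiece route, the sketch below trains one subword model with the `sentencepiece` Python package and stores it under the `source.spm` / `target.spm` names asked about above. The helper names and the vocabulary size of 32000 are assumptions chosen for illustration, and the segmentation models must be the same ones used when training the Marian model.

```python
from pathlib import Path


def spm_train_kwargs(corpus, prefix, vocab_size=32000):
    """Build keyword arguments for sentencepiece.SentencePieceTrainer.train."""
    return {
        "input": str(corpus),          # plain-text file, one sentence per line
        "model_prefix": str(prefix),   # output files: <prefix>.model, <prefix>.vocab
        "vocab_size": vocab_size,      # illustrative default, not a recommendation
    }


def train_spm(corpus, out_dir, side, vocab_size=32000):
    """Train one model and save it as <out_dir>/<side>.spm (side: 'source' or 'target')."""
    import sentencepiece as spm  # third-party package: pip install sentencepiece

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / side
    spm.SentencePieceTrainer.train(**spm_train_kwargs(corpus, prefix, vocab_size))
    # SentencePiece writes <prefix>.model; rename it to the .spm name.
    return prefix.with_suffix(".model").rename(prefix.with_suffix(".spm"))
```

For example, `train_spm("train.en", "marian_model", "source")` and `train_spm("train.zh", "marian_model", "target")` would produce the two files, where the corpus file names are placeholders.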

corner

Jul 15, 2022

Thanks to you all!
I trained with sentencepiece and used convert_marian_to_pytorch.py to convert the model to PyTorch format. The result is not bad!
