metadata
license: mit
datasets:
  - lunarlist/edited_common_voice
language:
  - th
library_name: nemo
pipeline_tag: text-to-speech

This is a Thai text-to-speech (TTS) model. It was trained on a voice from the Common Voice dataset that was modified so that it no longer sounds like the original speaker.

pip install "nemo_toolkit[tts]" soundfile

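from nemo.collections.tts.models import Tacotron2Model, UnivNetModel
import soundfile as sf
import torch

# Load the Thai Tacotron2 model and the UnivNet vocoder, both on CPU.
model = Tacotron2Model.from_pretrained("lunarlist/tts-thai-last-step").to("cpu")
vocoder_model = UnivNetModel.from_pretrained(model_name="tts_en_libritts_univnet").to("cpu")

# Thai input text (roughly: "Thai is quite easy").
text = "ภาษาไทย ง่าย นิด เดียว"

# Map each character to its index in the model's label list.
# A character that is not in the label list raises KeyError.
char_to_id = {ch: i for i, ch in enumerate(model.hparams["cfg"]["labels"])}

# 66 and 67 wrap the character ids; they appear to be this model's start and end marker ids.
tokens = torch.tensor([[66] + [char_to_id[ch] for ch in text] + [67]], dtype=torch.long)

# Text -> mel spectrogram (Tacotron2) -> waveform (UnivNet).
spectrogram = model.generate_spectrogram(tokens=tokens)
audio = vocoder_model.convert_spectrogram_to_audio(spec=spectrogram)

# Save the audio to disk in a file called speech.wav (22050 Hz sample rate).
sf.write("speech.wav", audio.to("cpu").detach().numpy()[0], 22050)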
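The snippet above turns text into token ids by looking up each character in the model's label list and wrapping the result with the ids 66 and 67. Below is a minimal sketch of that step as a standalone helper. The name `encode_text`, its `bos_id`/`eos_id` parameters, and the toy label list are hypothetical and not part of NeMo or this model. It also assumes that skipping characters missing from the label list is acceptable, whereas the snippet above would raise a `KeyError` instead.

```python
from typing import Dict, List, Sequence


def encode_text(
    text: str,
    labels: Sequence[str],
    bos_id: int,
    eos_id: int,
) -> List[int]:
    """Map each character of `text` to its index in `labels`.

    Characters missing from `labels` are skipped (an assumption of this
    sketch); the result is wrapped with `bos_id` and `eos_id`.
    """
    char_to_id: Dict[str, int] = {ch: i for i, ch in enumerate(labels)}
    ids = [char_to_id[ch] for ch in text if ch in char_to_id]
    return [bos_id] + ids + [eos_id]


# Toy label list for illustration only; with the real model it would be
# model.hparams["cfg"]["labels"], and the marker ids would be 66 and 67.
toy_labels = [" ", "ก", "ข", "ค"]
token_ids = encode_text(
    "กข ค!",
    toy_labels,
    bos_id=len(toy_labels),
    eos_id=len(toy_labels) + 1,
)
print(token_ids)  # [4, 1, 2, 0, 3, 5]  ("!" is not in toy_labels, so it is skipped)
```

With the real model, the returned list could be passed to `torch.tensor([...], dtype=torch.long)` to build the `tokens` argument of `generate_spectrogram`.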

Medium: Text-To-Speech ภาษาไทยด้วย Tacotron2 (Thai Text-To-Speech with Tacotron2)