---
language:
- ms
datasets:
- mesolitica/TTS
---

# Malay VITS Shafiqah Idayu

**This model is intended to be used only with [malaya-speech](https://github.com/mesolitica/malaya-speech). It is possible to skip the library, but make sure the character vocabulary is correct.**

## requirements

You need to install a specific malaya-speech version to get better generation:

```bash
pip3 install git+https://github.com/mesolitica/malaya-speech@1d5a33dd119f32e793d539ce782f1fe37818af75 malaya
```

## how to

```python
from huggingface_hub import snapshot_download
from malaya_speech.torch_model.vits.model_infer import SynthesizerTrn
from malaya_speech.torch_model.vits.commons import intersperse
from malaya_speech.utils.text import TTS_SYMBOLS
from malaya_speech.tts import load_text_ids
import torch
import os
import json

try:
    from malaya_boilerplate.hparams import HParams
except BaseException:
    from malaya_boilerplate.train.config import HParams

# download the checkpoint and config from the Hugging Face Hub
folder = snapshot_download(repo_id="mesolitica/VITS-shafiqah-idayu")

with open(os.path.join(folder, 'config.json')) as fopen:
    hps = HParams(**json.load(fopen))

# build the VITS synthesizer and load the pretrained weights on CPU
model = SynthesizerTrn(
    len(TTS_SYMBOLS),
    hps.data.filter_length // 2 + 1,
    hps.train.segment_size // hps.data.hop_length,
    n_speakers=hps.data.n_speakers,
    **hps.model,
).eval()
model.load_state_dict(torch.load(os.path.join(folder, 'model.pth'), map_location='cpu'))

# text normalizer converts raw text into character ids
normalizer = load_text_ids(pad_to = None, understand_punct = True, is_lower = False)

t, ids = normalizer.normalize('saya nak makan nasi ayam yang sedap, lagi lazat, dan hidup sangatlah susah kan.', add_fullstop = False)
if hps.data.add_blank:
    ids = intersperse(ids, 0)
ids = torch.LongTensor(ids)
ids_lengths = torch.LongTensor([ids.size(0)])
ids = ids.unsqueeze(0)

# deterministic inference: zero noise scales, default speaking rate
with torch.no_grad():
    audio = model.infer(
        ids,
        ids_lengths,
        noise_scale=0.0,
        noise_scale_w=0.0,
        length_scale=1.0,
    )
y_ = audio[0].numpy()
```
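
To listen to the result, you can write the waveform to a WAV file. This is a minimal sketch, not part of the original card: it assumes the `soundfile` package is installed (`pip3 install soundfile`) and that the sampling rate is stored under `hps.data.sampling_rate` in `config.json`, which is the usual VITS config layout; adjust if your config differs.

```python
import numpy as np
import soundfile as sf

# `y_` from the snippet above may carry batch/channel dimensions, so squeeze to 1-D
waveform = np.squeeze(y_)

# assumption: the standard VITS config layout, where the sampling rate lives at hps.data.sampling_rate
sf.write('shafiqah-idayu.wav', waveform, hps.data.sampling_rate)
```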