---
library_name: transformers
tags: []
---

# Malay Parler TTS Mini V1

Fine-tuned from https://huggingface.co/parler-tts/parler-tts-mini-v1 on the Malay TTS dataset https://huggingface.co/datasets/mesolitica/tts-combine-annotated.

Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/parler-tts

Training logs (Weights & Biases) at https://wandb.ai/huseinzol05/parler-speech?nw=nwuserhuseinzol05

## How-to

```python
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("mesolitica/malay-parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("mesolitica/malay-parler-tts-mini-v1")

# Speaker names to put in the description; each selects a different voice.
speakers = [
    'Yasmin',
    'Osman',
    'Bunga',
    'Ariff',
    'Ayu',
    'Kamarul',
    'Danial',
    'Elina',
]

# Malay text to be spoken.
prompt = 'Husein zolkepli sangat comel dan kacak suka makan cendol'
# The prompt is the same for every speaker, so tokenize it once outside the loop.
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

for s in speakers:
    # The description controls voice and speaking style.
    description = f"{s}'s voice, delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()
    # Use the model's own sampling rate instead of hard-coding 44100.
    # Write WAV: MP3 output from soundfile needs libsndfile >= 1.1.0.
    sf.write(f'{s}.wav', audio_arr, model.config.sampling_rate)
```
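Every description in the example follows one fixed template. Below is a minimal sketch of a hypothetical `build_description` helper (not part of `parler_tts` or this repository) that reproduces that template and lets you swap in other speed and pitch words. Its default output is exactly the description used above. Whether the model responds to wordings beyond the defaults depends on what appears in the training annotations, so treat any other values as an assumption to test by ear.

```python
# Speaker names from the example above.
SPEAKERS = ['Yasmin', 'Osman', 'Bunga', 'Ariff', 'Ayu', 'Kamarul', 'Danial', 'Elina']


def build_description(
    speaker: str,
    expressiveness: str = "slightly expressive and animated",
    speed: str = "moderate",
    pitch: str = "moderate",
) -> str:
    """Fill the description template with a speaker name and style words.

    Hypothetical helper: only the default values are known to match the
    wording used in the model card example.
    """
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}; expected one of {SPEAKERS}")
    # Pick "a"/"an" so that e.g. "expressive" reads as "an expressive".
    article = "an" if expressiveness[:1].lower() in "aeiou" else "a"
    # Match the original phrasing "a moderate speed and pitch" when both are equal.
    if speed == pitch:
        pace = f"a {speed} speed and pitch"
    else:
        pace = f"a {speed} speed and {pitch} pitch"
    return (
        f"{speaker}'s voice, delivers {article} {expressiveness} speech with {pace}. "
        "The recording is of very high quality, with the speaker's voice "
        "sounding clear and very close up."
    )


print(build_description('Ayu', speed='slow', pitch='low'))
```

The returned string can be passed to `tokenizer(...)` in place of the inline f-string in the loop above.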