tobiccino committed
Commit 3e9af72
1 parent: 12da6cc

fix readme

Files changed (1): README.md (+0, −116)

README.md (removed; former content below)
A Vietnamese TTS
================

Duration model + acoustic model + HiFiGAN vocoder for a Vietnamese text-to-speech application.

Online demo at https://huggingface.co/spaces/ntt123/vietTTS.

A synthesized audio clip: [clip.wav](assets/infore/clip.wav). A Colab notebook: [notebook](https://colab.research.google.com/drive/1oczrWOQOr1Y_qLdgis1twSlNZlfPVXoY?usp=sharing).


🔔 Check out the experimental `multi-speaker` branch (`git checkout multi-speaker`) for multi-speaker support. 🔔

Install
-------

```sh
git clone https://github.com/NTT123/vietTTS.git
cd vietTTS
pip3 install -e .
```
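
To confirm the editable install worked, a quick import check (a minimal sketch; the `vietTTS` package name follows the module paths used in the commands below):

```sh
# sanity check: the package should import without errors
python3 -c "import vietTTS; print('vietTTS installed')"
```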

Quick start using pretrained models
-----------------------------------

```sh
bash ./scripts/quick_start.sh
```


Download the InfoRe dataset
---------------------------

```sh
python ./scripts/download_aligned_infore_dataset.py
```

**Note**: this is a denoised and aligned version of the original dataset, donated by the InfoRe Technology company (see [here](https://www.facebook.com/groups/j2team.community/permalink/1010834009248719/)). You can download the original dataset (**InfoRe Technology 1**) [here](https://github.com/TensorSpeech/TensorFlowASR/blob/main/README.md#vietnamese).

See `notebooks/denoise_infore_dataset.ipynb` for instructions on how to denoise the dataset. We use the Montreal Forced Aligner (MFA) to align transcripts and speech, producing TextGrid files. See `notebooks/align_text_audio_infore_mfa.ipynb` for instructions on how to create the TextGrid files.
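
A quick sanity check after the download (a hedged sketch, assuming the script unpacks wav/TextGrid pairs into `train_data`, the directory used by the training steps below):

```sh
# each audio clip should have a matching alignment file
ls train_data/*.wav | wc -l        # number of audio clips
ls train_data/*.TextGrid | wc -l   # should print the same count
```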

Train the duration model
------------------------

```sh
python -m vietTTS.nat.duration_trainer
```


Train the acoustic model
------------------------

```sh
python -m vietTTS.nat.acoustic_trainer
```


Train the HiFiGAN vocoder
-------------------------

We use the original implementation from the HiFiGAN authors at https://github.com/jik876/hifi-gan. Use the config file at `assets/hifigan/config.json` to train your model.

```sh
git clone https://github.com/jik876/hifi-gan.git

# create a dataset in hifi-gan format
ln -sf `pwd`/train_data hifi-gan/data
cd hifi-gan/data
ls -1 *.TextGrid | sed -e 's/\.TextGrid$//' > files.txt
cd ..
head -n 100 data/files.txt > val_files.txt
tail -n +101 data/files.txt > train_files.txt
rm data/files.txt

# training
python train.py \
  --config ../assets/hifigan/config.json \
  --input_wavs_dir=data \
  --input_training_file=train_files.txt \
  --input_validation_file=val_files.txt
```
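
To watch training progress, the upstream hifi-gan trainer writes TensorBoard summaries under its checkpoint directory; a minimal sketch, assuming the upstream default checkpoint path `cp_hifigan` is unchanged:

```sh
# point TensorBoard at hifi-gan's log directory (assumed default location)
tensorboard --logdir cp_hifigan/logs
```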

Fine-tune on ground-truth aligned (GTA) melspectrograms:

```sh
cd /path/to/vietTTS                                         # go to the vietTTS directory
python -m vietTTS.nat.zero_silence_segments -o train_data   # zero out all [sil, sp, spn] segments
python -m vietTTS.nat.gta -o /path/to/hifi-gan/ft_dataset   # create GTA melspectrograms in the hifi-gan/ft_dataset directory

# turn on fine-tuning
cd /path/to/hifi-gan
python train.py \
  --fine_tuning True \
  --config ../assets/hifigan/config.json \
  --input_wavs_dir=data \
  --input_training_file=train_files.txt \
  --input_validation_file=val_files.txt
```

Then, use the following command to convert the PyTorch model to Haiku format:

```sh
cd ..
python -m vietTTS.hifigan.convert_torch_model_to_haiku \
  --config-file=assets/hifigan/config.json \
  --checkpoint-file=hifi-gan/cp_hifigan/g_[latest_checkpoint]
```
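
Here `g_[latest_checkpoint]` is a placeholder for the newest generator checkpoint; one way to locate it (a sketch, assuming checkpoints accumulate in `hifi-gan/cp_hifigan/`):

```sh
# list generator checkpoints by modification time, newest first
ls -t hifi-gan/cp_hifigan/g_* | head -n 1
```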

Synthesize speech
-----------------

The example below synthesizes the sentence "hôm qua em tới trường" ("yesterday I went to school") to `clip.wav`:

```sh
python -m vietTTS.synthesizer \
  --lexicon-file=train_data/lexicon.txt \
  --text="hôm qua em tới trường" \
  --output=clip.wav
```
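
For batch synthesis, the same command can be looped over a file of sentences (a sketch; `sentences.txt`, one sentence per line, is a hypothetical input file):

```sh
# synthesize clip_0.wav, clip_1.wav, ... from sentences.txt (hypothetical file)
i=0
while IFS= read -r line; do
  python -m vietTTS.synthesizer \
    --lexicon-file=train_data/lexicon.txt \
    --text="$line" \
    --output="clip_$i.wav"
  i=$((i+1))
done < sentences.txt
```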