Update README.md
README.md CHANGED
@@ -1,15 +1,20 @@
 ---
-license:
+license: cc-by-nc-4.0
 datasets:
 - projecte-aina/festcat_trimmed_denoised
 - projecte-aina/openslr-slr69-ca-trimmed-denoised
+tags:
+- vocoder
+- vocos
+- tts
 ---
 
-#
+# alVoCat
 
 <!-- Provide a quick summary of what the model is/does. -->
-
-
+alVoCat is a vocoder for Catalan TTS based on the Vocos architecture. It is fast, produces high-quality audio,
+works together with [🍵 Matxa](https://huggingface.co/BSC-LT/matcha-tts-cat-multiaccent),
+and you can find a demo [here](https://huggingface.co/spaces/BSC-LT/matchatts-vocos-onnx-ca).
 
 ## Model Details
 
@@ -75,7 +80,7 @@ y_hat = vocos(y)
 
 ### Onnx
 
-We also release
+We also release an ONNX version of the model; you can try it in Colab:
 
 <a target="_blank" href="https://colab.research.google.com/github/langtech-bsc/vocos/blob/matcha/notebooks/vocos_22khz_onnx_inference.ipynb">
   <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
@@ -93,14 +98,14 @@ The model was trained on 3 Catalan speech datasets
 |---------------------|----------|---------|
 | Festcat             | ca       | 22      |
 | OpenSLR69           | ca       | 5       |
-
+| LaFrescat           | ca       | 3.5     |
 
 
 
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-The model was trained for 1.5M steps and 1.3k epochs with a batch size of 16 for stability. We used a Cosine scheduler with
+The model was trained for 1.5M steps (1.3k epochs) with a batch size of 16 for stability. We used a cosine scheduler with an initial learning rate of 5e-4.
 We also modified the mel-spectrogram loss to use 128 bins and an fmax of 11025 Hz, rather than matching the input mel-spectrogram configuration.
 
 
@@ -156,8 +161,14 @@ For further information, please send an email to <[email protected]>.
 Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.
 
 ### License
-[
+[Creative Commons Attribution Non-commercial 4.0](https://www.creativecommons.org/licenses/by-nc/4.0/)
+
+These models are free to use for non-commercial and research purposes. Commercial use is only possible through licensing by
+the voice artists. For further information, contact <[email protected]> and <[email protected]>.
 
 ### Funding
 
 This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
+
+Part of the training of the model was possible thanks to the compute time provided by the Galician Supercomputing Center CESGA
+([Centro de Supercomputación de Galicia](https://www.cesga.es/)).