Commit afa14bb (verified) by Baybars, 1 parent: efa5d56

Update README.md

Files changed (1): README.md (+19 -8)
README.md CHANGED
@@ -1,15 +1,20 @@
 ---
-license: mit
+license: cc-by-nc-4.0
 datasets:
 - projecte-aina/festcat_trimmed_denoised
 - projecte-aina/openslr-slr69-ca-trimmed-denoised
+tags:
+- vocoder
+- vocos
+- tts
 ---
 
-# Vocos-mel-22khz-cat
+# alVoCat
 
 <!-- Provide a quick summary of what the model is/does. -->
-
-
+alVoCat is a vocoder for Catalan TTS, based on Vocos architecture. It is highly performant and
+high quality, works together with [Matxa](https://huggingface.co/BSC-LT/matcha-tts-cat-multiaccent)
+and you can find a demo [here](https://huggingface.co/spaces/BSC-LT/matchatts-vocos-onnx-ca).
 
 ## Model Details
 
@@ -75,7 +80,7 @@ y_hat = vocos(y)
 
 ### Onnx
 
-We also release a onnx version of the model, you can check in colab:
+We also release an onnx version of the model, you can check in colab:
 
 <a target="_blank" href="https://colab.research.google.com/github/langtech-bsc/vocos/blob/matcha/notebooks/vocos_22khz_onnx_inference.ipynb">
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
@@ -93,14 +98,14 @@ The model was trained on 3 Catalan speech datasets
 |---------------------|----------|---------|
 | Festcat | ca | 22 |
 | OpenSLR69 | ca | 5 |
-| lafresca | ca | 3.5 |
+| LaFrescat | ca | 3.5 |
 
 
 
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-The model was trained for 1.5M steps and 1.3k epochs with a batch size of 16 for stability. We used a Cosine scheduler with a initial learning rate of 5e-4.
+The model was trained for 1.5M steps and 1.3k epochs with a batch size of 16 for stability. We used a Cosine scheduler with an initial learning rate of 5e-4.
 We also modified the mel spectrogram loss to use 128 bins and fmax of 11025 instead of the same input mel spectrogram.
 
 
@@ -156,8 +161,14 @@ For further information, please send an email to <[email protected]>.
 Copyright(c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.
 
 ### License
-[MIT](https://opensource.org/license/mit)
+[Creative Commons Attribution Non-commercial 4.0](https://www.creativecommons.org/licenses/by-nc/4.0/)
+
+These models are free to use for non-commercial and research purposes. Commercial use is only possible through licensing by
+the voice artists. For further information, contact <[email protected]> and <[email protected]>.
 
 ### Funding
 
 This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
+
+Part of the training of the model was possible thanks to the compute time given by Galician Supercomputing Center CESGA
+([Centro de Supercomputación de Galicia](https://www.cesga.es/))
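The training-procedure hunk states a cosine scheduler starting from a learning rate of 5e-4 over 1.5M steps. A minimal sketch of that schedule in plain Python, assuming a decay horizon equal to the total step count and a final learning rate of 0 (the card does not state either):

```python
import math

def cosine_lr(step, total_steps=1_500_000, lr_init=5e-4, lr_min=0.0):
    """Cosine-annealed learning rate, as in torch's CosineAnnealingLR.

    lr_min=0 and total_steps as the decay horizon are assumptions; the
    model card only states a cosine scheduler starting at 5e-4.
    """
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0))           # 0.0005 at the start
print(cosine_lr(750_000))     # 0.00025, half the initial rate at the midpoint
print(cosine_lr(1_500_000))   # 0.0 at the end
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max` set to the step horizon.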
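The same hunk says the mel-spectrogram loss was changed to 128 bins with an fmax of 11025 Hz, which is the Nyquist frequency of 22,050 Hz audio. An illustrative sketch of the band edges such a filterbank would cover, using the HTK mel formula (an assumption; the actual Vocos loss code may use the Slaney scale):

```python
import math

def hz_to_mel(f):
    """HTK mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# 128 mel bins spanning 0 Hz .. 11025 Hz (Nyquist of 22,050 Hz audio)
n_mels, fmin, fmax = 128, 0.0, 11025.0
mel_step = (hz_to_mel(fmax) - hz_to_mel(fmin)) / (n_mels + 1)
band_edges_hz = [mel_to_hz(hz_to_mel(fmin) + i * mel_step)
                 for i in range(n_mels + 2)]

print(len(band_edges_hz))        # 130 edges define 128 triangular filters
print(round(band_edges_hz[-1]))  # 11025
```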