Update README.md
README.md CHANGED
@@ -1,15 +1,20 @@
 ---
-license:
+license: cc-by-nc-4.0
 datasets:
 - projecte-aina/festcat_trimmed_denoised
 - projecte-aina/openslr-slr69-ca-trimmed-denoised
+tags:
+- vocoder
+- vocos
+- tts
 ---
 
-#
+# alVoCat
 
 <!-- Provide a quick summary of what the model is/does. -->
-
-
+alVoCat is a vocoder for Catalan TTS based on the Vocos architecture. It is fast, produces high-quality audio,
+works together with [🍵 Matxa](https://huggingface.co/BSC-LT/matcha-tts-cat-multiaccent),
+and you can find a demo [here](https://huggingface.co/spaces/BSC-LT/matchatts-vocos-onnx-ca).
 
 ## Model Details
 
@@ -75,7 +80,7 @@ y_hat = vocos(y)
 
 ### Onnx
 
-We also release
+We also release an ONNX version of the model; you can try it in Colab:
 
 <a target="_blank" href="https://colab.research.google.com/github/langtech-bsc/vocos/blob/matcha/notebooks/vocos_22khz_onnx_inference.ipynb">
   <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
@@ -93,14 +98,14 @@ The model was trained on 3 Catalan speech datasets
 |---------------------|----------|---------|
 | Festcat             | ca       | 22      |
 | OpenSLR69           | ca       | 5       |
-
+| LaFrescat           | ca       | 3.5     |
 
 
 
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-The model was trained for 1.5M steps and 1.3k epochs with a batch size of 16 for stability. We used a Cosine scheduler with
+The model was trained for 1.5M steps (1.3k epochs) with a batch size of 16 for stability. We used a cosine scheduler with an initial learning rate of 5e-4.
 We also modified the mel-spectrogram loss to use 128 bins and an fmax of 11025 Hz, rather than matching the input mel-spectrogram configuration.
 
 
@@ -156,8 +161,14 @@ For further information, please send an email to <[email protected]>.
 Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.
 
 ### License
-[
+[Creative Commons Attribution Non-commercial 4.0](https://www.creativecommons.org/licenses/by-nc/4.0/)
+
+These models are free to use for non-commercial and research purposes. Commercial use is only possible through licensing by
+the voice artists. For further information, contact <[email protected]> and <[email protected]>.
 
 ### Funding
 
 This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
+
+Part of the training of the model was possible thanks to the compute time provided by the Galician Supercomputing Center CESGA
+([Centro de Supercomputación de Galicia](https://www.cesga.es/)).