Update README.md
README.md
---
license: cc0-1.0
language:
- es
base_model:
- SWivid/F5-TTS
---

# F5-TTS Spanish Language Model

## Overview

This model is a fine-tune of F5-TTS for Spanish speech synthesis, with an emphasis on several Latin American regional dialects. It was trained on a variety of Latin American Spanish datasets with the aim of delivering high-quality, regionally diverse synthetic speech for Spanish speakers.

## License

This model is released under the CC0-1.0 license, a public domain dedication that allows anyone to use, modify, and distribute it without asking permission.

## Datasets

The following datasets were used for training:

- [VoxPopuli](https://huggingface.co/datasets/facebook/voxpopuli)
- Crowdsourced high-quality Spanish speech data sets from OpenSLR:
  - [Chilean Spanish](https://www.openslr.org/71/)
  - [Colombian Spanish](https://www.openslr.org/72/)
  - [Peruvian Spanish](https://www.openslr.org/73/)
  - [Puerto Rican Spanish](https://www.openslr.org/74/)
  - [Venezuelan Spanish](https://www.openslr.org/75/)

## Model Information

**Base Model:** SWivid/F5-TTS

**Training Data:** 218 hours of audio

**Training Configuration:**

- Batch Size: 3200
- Max Samples: 64
- Training Steps: 1,200,000

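If the batch size above is counted in mel-spectrogram frames rather than clips (F5-TTS supports frame-based batching, in which Max Samples caps how many clips fit in one batch), it can be converted into seconds of audio. The sketch below is only a worked conversion under assumptions not stated in this README: frame-based batching, plus the base model's 24 kHz sample rate and 256-sample hop length. The constant names are hypothetical.

```python
# Assumptions (not stated in this README): "Batch Size" counts mel-spectrogram
# frames, and the features use a 24 kHz sample rate with a 256-sample hop.
SAMPLE_RATE_HZ = 24_000
HOP_LENGTH = 256
BATCH_SIZE_FRAMES = 3200
MAX_SAMPLES = 64


def frames_to_seconds(frames: int) -> float:
    """Seconds of audio spanned by `frames` mel frames (one frame per hop)."""
    return frames * HOP_LENGTH / SAMPLE_RATE_HZ


batch_seconds = frames_to_seconds(BATCH_SIZE_FRAMES)
print(f"~{batch_seconds:.1f} s of audio per batch, split across at most {MAX_SAMPLES} clips")
```

Under these assumptions, each batch holds roughly 34 seconds of audio.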
## Usage Instructions

1. **Run the F5-TTS application** and monitor the terminal output. The path to the model file will be displayed, similar to the following:

   ```
   model : C:\Users\thega\.cache\huggingface\hub\models--SWivid--F5-TTS\snapshots\995ff41929c08ff968786b448a384330438b5cb6\F5TTS_Base\model_1200000.safetensors
   ```

2. **Replace the model file:**
   - Navigate to the specified location.
   - Rename the existing file to `model_1200000.safetensors.bak`.
   - Download the `model_1200000.safetensors` file from this repository and place it in the same location.

3. **Rerun the application** to load the updated model.

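The backup-and-replace steps above can also be scripted. This is a minimal sketch that uses only the Python standard library. The helpers `swap_checkpoint` and `restore_checkpoint` are hypothetical and are not part of F5-TTS. The paths you pass in are assumed to be the one printed in step 1 and wherever you saved the downloaded file.

```python
import shutil
from pathlib import Path


def swap_checkpoint(target, replacement):
    """Rename `target` to `<name>.bak` and copy `replacement` into its place.

    Returns the backup path. Refuses to run if a backup already exists, so a
    second call cannot overwrite the original base checkpoint.
    """
    target, replacement = Path(target), Path(replacement)
    backup = target.with_name(target.name + ".bak")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    if not replacement.is_file():
        raise FileNotFoundError(f"replacement checkpoint not found: {replacement}")
    target.rename(backup)  # keep the original under the .bak name
    shutil.copy2(replacement, target)  # the new file takes the original name
    return backup


def restore_checkpoint(target):
    """Undo swap_checkpoint by moving the `.bak` file back to its original name."""
    target = Path(target)
    backup = target.with_name(target.name + ".bak")
    if not backup.exists():
        raise FileNotFoundError(f"no backup to restore: {backup}")
    target.unlink(missing_ok=True)  # drop the fine-tuned copy, if present
    backup.rename(target)
```

You can also get the base checkpoint path without reading the terminal. `huggingface_hub.hf_hub_download(repo_id="SWivid/F5-TTS", filename="F5TTS_Base/model_1200000.safetensors")` returns the local cached path, which is the same path shown in step 1. Calling it with this repository's id returns the local copy of the Spanish checkpoint, assuming the file sits at the repository root.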
## Contributions and Recommendations

This model may benefit from further fine-tuning to improve its performance across different Spanish dialects, and contributions from the community are encouraged. For the best output quality, preprocess the reference audio: remove background noise, balance its loudness, and make the speech as clear as possible.
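Of these preprocessing steps, level balancing is the easiest to automate. The sketch below shows one simple approach and is not a procedure from this model's authors. It scales a 16-bit PCM WAV so that its RMS level reaches a target in dBFS (decibels relative to full scale) and lowers the gain if needed to avoid clipping. The -20 dBFS default and the function names are assumptions chosen for illustration. The sketch does not remove background noise; that needs a dedicated denoising tool.

```python
import array
import math
import sys
import wave


def normalize_rms(samples, target_dbfs=-20.0, peak_ceiling=0.99):
    """Scale float samples in [-1, 1] so their RMS level reaches `target_dbfs`.

    If that gain would push any sample above `peak_ceiling`, the gain is
    lowered so the loudest sample lands on the ceiling instead.
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** (target_dbfs / 20.0) / rms
    peak = max(abs(s) for s in samples)
    if peak * gain > peak_ceiling:
        gain = peak_ceiling / peak
    return [s * gain for s in samples]


def normalize_wav(in_path, out_path, target_dbfs=-20.0):
    """RMS-normalize a 16-bit PCM WAV file (any channel count) into `out_path`."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        pcm = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Channels stay interleaved; the RMS is measured over all of them together.
    scaled = normalize_rms([x / 32768.0 for x in pcm], target_dbfs)
    out = array.array(
        "h", (max(-32768, min(32767, round(x * 32768.0))) for x in scaled)
    )
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(out.tobytes())
```

For example, `normalize_wav("reference.wav", "reference_normalized.wav")` writes a level-balanced copy of a reference clip. Both file names are placeholders. RMS gives a rough loudness estimate. Perceptual loudness standards such as LUFS are more accurate but need more code.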