Commit e9f4e06 by jpgallegoar (verified) · 1 parent: ce68299

Update README.md

Files changed (1)
  1. README.md +56 -3
README.md CHANGED
@@ -1,3 +1,56 @@
- ---
- license: cc0-1.0
- ---
+ ---
+ license: cc0-1.0
+ language:
+ - es
+ base_model:
+ - SWivid/F5-TTS
+ ---
+
+ # F5-TTS Spanish Language Model
+
+ ## Overview
+ This model is F5-TTS fine-tuned for Spanish speech synthesis, with an emphasis on several regional dialects from Latin America. The project aims to provide high-quality, regionally diverse speech synthesis for Spanish speakers by training on a variety of Latin American Spanish datasets.
+
+ ## License
+ This model is released under the CC0-1.0 license, which permits free use, modification, and distribution.
+
+ ## Datasets
+ The following datasets were used for training:
+
+ - [VoxPopuli dataset](https://huggingface.co/datasets/facebook/voxpopuli)
+ - Crowdsourced high-quality Spanish speech data:
+   - Chilean Spanish
+   - Colombian Spanish
+   - Peruvian Spanish
+   - Puerto Rican Spanish
+   - Venezuelan Spanish
+
+ Sources for the crowdsourced data (OpenSLR):
+ - [Crowdsourced high-quality Chilean Spanish speech data set](https://www.openslr.org/71/)
+ - [Crowdsourced high-quality Colombian Spanish speech data set](https://www.openslr.org/72/)
+ - [Crowdsourced high-quality Peruvian Spanish speech data set](https://www.openslr.org/73/)
+ - [Crowdsourced high-quality Puerto Rico Spanish speech data set](https://www.openslr.org/74/)
+ - [Crowdsourced high-quality Venezuelan Spanish speech data set](https://www.openslr.org/75/)
+
+ ## Model Information
+ **Base Model:** SWivid/F5-TTS
+ **Total Training Data:** 218 hours of audio
+ **Training Configuration:**
+ - Batch Size: 3200
+ - Max Samples: 64
+ - Training Steps: 1,200,000
+
+ ## Usage Instructions
+
+ 1. **Run the F5-TTS application** and watch the terminal output. It prints the path to the model file, similar to the following:
+ ```
+ model : C:\Users\thega\.cache\huggingface\hub\models--SWivid--F5-TTS\snapshots\995ff41929c08ff968786b448a384330438b5cb6\F5TTS_Base\model_1200000.safetensors
+ ```
+ 2. **Replace the model file:**
+   - Navigate to that location.
+   - Rename the existing file to `model_1200000.safetensors.bak` to keep a backup.
+   - Download the `model_1200000.safetensors` file from this repository and place it in the same location.
+ 3. **Rerun the application** to load the updated model.
+
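The file replacement in step 2 can also be scripted. The following is a minimal sketch, not part of the F5-TTS tooling: the function name `swap_model_file` and both path arguments are hypothetical, and it assumes the Spanish checkpoint has already been downloaded to a local path.

```python
import shutil
from pathlib import Path


def swap_model_file(cached_model, downloaded_model):
    """Back up the cached checkpoint, then copy the downloaded one into its place.

    Both arguments are illustrative: `cached_model` is the path printed by the
    application, `downloaded_model` is wherever the new file was saved.
    Returns the path of the backup file.
    """
    cached_model = Path(cached_model)
    downloaded_model = Path(downloaded_model)
    if not downloaded_model.is_file():
        raise FileNotFoundError(f"downloaded model not found: {downloaded_model}")

    backup = cached_model.with_name(cached_model.name + ".bak")
    if cached_model.exists():
        if backup.exists():
            # A backup from an earlier run already holds the original file,
            # so just discard the current copy instead of overwriting the backup.
            cached_model.unlink()
        else:
            # First run: keep the original checkpoint so the swap can be undone.
            cached_model.rename(backup)

    shutil.copy2(downloaded_model, cached_model)
    return backup
```

Calling it with the path from the terminal output and the path of the downloaded file leaves a `.bak` copy next to the new checkpoint. Restoring the original is then a matter of deleting the new file and renaming the backup.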
+ ## Contributions and Recommendations
+ This model may benefit from further fine-tuning to improve its performance across Spanish dialects. Community contributions are encouraged. For the best output quality, preprocess the reference audio by removing background noise, balancing audio levels, and enhancing clarity.
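
As one possible reading of "balancing audio levels", the sketch below applies peak normalization to reference audio that has already been decoded into floating-point samples. The function name `peak_normalize` and the `target_peak` default are assumptions for illustration; the README does not specify a method, and noise removal is not covered here.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale float samples in [-1.0, 1.0] so the loudest one reaches target_peak.

    `target_peak` is an illustrative default, not a value from the README.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty input: there is nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Peak normalization only applies a constant gain, so it does not change the relative dynamics of the recording. A loudness-based method may suit some reference clips better.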