---
license: cc-by-nc-4.0
library_name: f5-tts
language:
- es
base_model:
- SWivid/F5-TTS
---

# [GitHub](https://github.com/jpgallegoar/Spanish-F5)

# F5-TTS Spanish Language Model

## Overview
This model is a fine-tune of F5-TTS for Spanish speech synthesis. The project aims to deliver high-quality speech synthesis that covers regional varieties of Spanish.

## License
This model is released under the CC BY-NC 4.0 license, as declared in the model card metadata. The license permits sharing and adaptation for non-commercial purposes, provided attribution is given.

## Datasets
The following datasets were used for training:

- [Voxpopuli Dataset](https://huggingface.co/datasets/facebook/voxpopuli), with mainly Peninsular Spain accents
- Crowdsourced high-quality Spanish speech data:
  - Argentinian Spanish
  - Chilean Spanish
  - Colombian Spanish
  - Peruvian Spanish
  - Puerto Rican Spanish
  - Venezuelan Spanish
- TEDx Spanish Corpus


Additional sources:
- [Crowdsourced high-quality Argentinian Spanish speech data set](https://www.openslr.org/61/)
- [Crowdsourced high-quality Chilean Spanish speech data set](https://www.openslr.org/71/)
- [Crowdsourced high-quality Colombian Spanish speech data set](https://www.openslr.org/72/)
- [Crowdsourced high-quality Peruvian Spanish speech data set](https://www.openslr.org/73/)
- [Crowdsourced high-quality Puerto Rico Spanish speech data set](https://www.openslr.org/74/)
- [Crowdsourced high-quality Venezuelan Spanish speech data set](https://www.openslr.org/75/)
- [TEDx Spanish Corpus](https://www.openslr.org/67/)


## Model Information
**Base Model:** SWivid/F5-TTS  
**Total Training Data:** 218 hours of audio  
**Training Configuration:**
- Batch Size: 3200
- Max Samples: 64
- Training Steps: 1,200,000

## Usage Instructions

### Method 0: [Hugging Face Space](https://huggingface.co/spaces/jpgallegoar/Spanish-F5)

### Method 1: Manual Model Replacement

1. **Run the F5-TTS Application:** Start the F5-TTS application and observe the terminal for output indicating the model file path. It should appear similar to:
   ```
   model : C:\Users\thega\.cache\huggingface\hub\models--SWivid--F5-TTS\snapshots\995ff41929c08ff968786b448a384330438b5cb6\F5TTS_Base\model_1200000.safetensors
   ```
2. **Replace the Model File:**
   - Navigate to the displayed file location.
   - Rename the existing model file to `model_1200000.safetensors.bak`.
   - Download `model_1200000.safetensors` from this repository and save it to the same location.

3. **Restart the Application:** Relaunch the F5-TTS application to load the updated model.
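
The backup-and-replace step above can also be scripted. The following is a minimal sketch, not part of the F5-TTS tooling: the function name `replace_model_file` and both of its arguments are hypothetical, and it assumes you have already downloaded `model_1200000.safetensors` from this repository to a local folder.

```python
import shutil
from pathlib import Path


def replace_model_file(cached_model, spanish_model):
    """Back up the cached base checkpoint and copy the Spanish one in its place.

    cached_model:  path printed by F5-TTS in the terminal (hypothetical argument).
    spanish_model: path of the file downloaded from this repository.
    Returns the path of the `.bak` backup.
    """
    cached_model = Path(cached_model)
    spanish_model = Path(spanish_model)
    if not spanish_model.is_file():
        raise FileNotFoundError(f"Downloaded model not found: {spanish_model}")

    backup = cached_model.with_name(cached_model.name + ".bak")
    if not backup.exists():
        # First run: rename the original so it can be restored later.
        cached_model.rename(backup)
    elif cached_model.exists() or cached_model.is_symlink():
        # A backup already exists: drop the current file instead of
        # overwriting the backup (or writing through a cache symlink).
        cached_model.unlink()

    shutil.copy2(spanish_model, cached_model)
    return backup
```

Call it with the path shown in your own terminal as the first argument and the downloaded file as the second. To restore the base model, delete the copied file and rename the `.bak` file back to `model_1200000.safetensors`.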

### Alternative Methods

- **GitHub Repository:** Clone the [Spanish-F5 repository](https://github.com/jpgallegoar/Spanish-F5/) and follow the provided installation instructions.
- **Google Colab:** Use the model via [Google Colab](https://colab.research.google.com/drive/1mm4NAlZVZq2_oL6ftijY64-PeEYwnqG1?usp=sharing).
  - Runtime -> Change Runtime Type -> T4 GPU
  - Runtime -> Run all
  - Click on the link shown in "Running on public URL: https://link.gradio.live" when it loads
- **Jupyter Notebook:** Run the model through the `Spanish_F5.ipynb` notebook.

## Contributions and Recommendations
This model may benefit from further fine-tuning to enhance its performance across different Spanish dialects. Contributions from the community are encouraged. For optimal output quality, preprocess the reference audio by removing background noise, balancing audio levels, and enhancing clarity.
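
As an illustration of the "balancing audio levels" recommendation, here is a minimal peak-normalization sketch using only the Python standard library. It is an assumption-laden example rather than a recommended pipeline: the function names, the `target_peak` of 0.9, and the restriction to 16-bit PCM WAV files are all choices made for this sketch, and it does not perform noise removal.

```python
import sys
import wave
from array import array


def normalize_peak(samples, target_peak=0.9, full_scale=32767):
    """Scale 16-bit samples so the loudest one sits at target_peak of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak * full_scale / peak
    # Clamp to the valid 16-bit range after rounding.
    return [max(-full_scale - 1, min(full_scale, round(s * gain))) for s in samples]


def normalize_wav(src, dst, target_peak=0.9):
    """Read a 16-bit PCM WAV file, peak-normalize it, and write the result to dst."""
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        pcm = array("h")
        pcm.frombytes(reader.readframes(params.nframes))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    out = array("h", normalize_peak(pcm, target_peak))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)
        writer.writeframes(out.tobytes())
```

For example, `normalize_wav("reference.wav", "reference_normalized.wav")` writes a level-adjusted copy of a hypothetical `reference.wav` that can then be used as the reference audio.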