---
language: en
tags:
- text-to-speech
- StyleTTS2
- speech-synthesis
license: mit
pipeline_tag: text-to-speech
---
|
# StyleTTS2 Fine-tuned Model

This repository contains a fine-tuned StyleTTS2 model together with all components needed for inference.
|
## Model Details
- **Base Model:** StyleTTS2-LibriTTS
- **Architecture:** StyleTTS2
- **Task:** Text-to-Speech
- **Last Checkpoint:** epoch_2nd_00004.pth
|
## Training Details
- **Total Epochs:** 5
- **Completed Epochs:** 4
- **Total Iterations:** 2010
- **Batch Size:** 2
- **Max Length:** 650
- **Learning Rate:** 0.0001
- **Final Validation Loss:** 0.444865
|
## Model Components
The repository includes all necessary components for inference:
|
### Main Model Components:
- bert.pth
- bert_encoder.pth
- predictor.pth
- decoder.pth
- text_encoder.pth
- predictor_encoder.pth
- style_encoder.pth
- diffusion.pth
- text_aligner.pth
- pitch_extractor.pth
- mpd.pth
- msd.pth
- wd.pth
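
Before loading anything, it can help to confirm that every checkpoint in the list above is present. The following is a minimal sketch and not part of the repository: the helper name `find_component_files` is hypothetical, and it assumes the `.pth` files live in the `model_components/` directory shown under Directory Structure, which this card does not state explicitly.

```python
from pathlib import Path

# Component names copied from the "Main Model Components" list in this card.
MAIN_COMPONENTS = [
    "bert", "bert_encoder", "predictor", "decoder", "text_encoder",
    "predictor_encoder", "style_encoder", "diffusion", "text_aligner",
    "pitch_extractor", "mpd", "msd", "wd",
]


def find_component_files(component_dir):
    """Map each component name to its .pth path; raise if any file is missing.

    `component_dir` is assumed to be a local copy of model_components/.
    """
    component_dir = Path(component_dir)
    paths = {name: component_dir / f"{name}.pth" for name in MAIN_COMPONENTS}
    missing = sorted(name for name, path in paths.items() if not path.is_file())
    if missing:
        raise FileNotFoundError(f"Missing component checkpoints: {missing}")
    return paths
```

On a complete download, `find_component_files("model_components")` would return a dictionary with one path per component, which could then be handed to a checkpoint loader such as `torch.load`.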
|
### Utility Components:
- ASR (Automatic Speech Recognition)
  - epoch_00080.pth
  - config.yml
  - models.py
  - layers.py
- JDC (F0 Prediction)
  - bst.t7
  - model.py
- PLBERT
  - step_1000000.t7
  - config.yml
  - util.py
|
### Additional Files:
- text_utils.py: Text preprocessing utilities
- models.py: Model architecture definitions
- utils.py: Utility functions
- config.yml: Model configuration
- config.json: Detailed configuration and training metrics
|
## Training Metrics
A visualization of the training metrics is available in training_metrics.png.
|
## Directory Structure

    ├── Utils/
    │   ├── ASR/
    │   ├── JDC/
    │   └── PLBERT/
    ├── model_components/
    └── configs/
|
## Usage Instructions
1. Load the model using the provided config.yml
2. Ensure all utility components (ASR, JDC, PLBERT) are in their respective directories
3. Use text_utils.py for text preprocessing
4. Follow the inference example in the StyleTTS2 documentation
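
Step 2 can be checked programmatically before attempting inference. The sketch below is an illustration rather than part of the repository: `missing_utility_files` is a hypothetical helper, and the expected layout is assembled from the Utility Components list and the Directory Structure section of this card, assuming each group of files sits directly inside its `Utils/` subdirectory.

```python
from pathlib import Path

# Expected files per utility directory, taken from the Utility Components list.
EXPECTED_UTILS = {
    "ASR": ["epoch_00080.pth", "config.yml", "models.py", "layers.py"],
    "JDC": ["bst.t7", "model.py"],
    "PLBERT": ["step_1000000.t7", "config.yml", "util.py"],
}


def missing_utility_files(repo_root):
    """Return relative paths of expected utility files that are absent."""
    repo_root = Path(repo_root)
    missing = []
    for subdir, filenames in EXPECTED_UTILS.items():
        for filename in filenames:
            relative = Path("Utils") / subdir / filename
            if not (repo_root / relative).is_file():
                missing.append(relative.as_posix())
    return missing
```

Calling `missing_utility_files(".")` from the repository root returns an empty list when every utility file is in place; any path it does return points at a file to download or move before continuing with steps 3 and 4.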
|