---
language: en
tags:
- text-to-speech
- StyleTTS2
- speech-synthesis
license: mit
pipeline_tag: text-to-speech
---

# StyleTTS2 Fine-tuned Model

This model is a fine-tuned version of StyleTTS2. The repository contains all components needed for inference.

## Model Details

- **Base Model:** StyleTTS2-LibriTTS
- **Architecture:** StyleTTS2
- **Task:** Text-to-Speech
- **Last Checkpoint:** epoch_2nd_00004.pth

## Training Details

- **Total Epochs:** 5
- **Completed Epochs:** 4
- **Total Iterations:** 2010
- **Batch Size:** 2
- **Max Length:** 650
- **Learning Rate:** 0.0001
- **Final Validation Loss:** 0.444865

## Model Components

The repository includes all necessary components for inference:

### Main Model Components:

- bert.pth
- bert_encoder.pth
- predictor.pth
- decoder.pth
- text_encoder.pth
- predictor_encoder.pth
- style_encoder.pth
- diffusion.pth
- text_aligner.pth
- pitch_extractor.pth
- mpd.pth
- msd.pth
- wd.pth

### Utility Components:

- ASR (Automatic Speech Recognition)
  - epoch_00080.pth
  - config.yml
  - models.py
  - layers.py
- JDC (F0 Prediction)
  - bst.t7
  - model.py
- PLBERT
  - step_1000000.t7
  - config.yml
  - util.py

### Additional Files:

- text_utils.py: Text preprocessing utilities
- models.py: Model architecture definitions
- utils.py: Utility functions
- config.yml: Model configuration
- config.json: Detailed configuration and training metrics

## Training Metrics

A visualization of the training metrics is available in training_metrics.png.

## Directory Structure

    ├── Utils/
    │   ├── ASR/
    │   ├── JDC/
    │   └── PLBERT/
    ├── model_components/
    └── configs/

## Usage Instructions

1. Load the model using the provided config.yml
2. Ensure all utility components (ASR, JDC, PLBERT) are in their respective directories
3. Use text_utils.py for text preprocessing
4. Follow the inference example in the StyleTTS2 documentation
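
Before running inference, it can help to confirm that the files listed above are actually present (step 2 of the usage instructions). The following is a minimal sketch using only the Python standard library. The file names come from the component lists in this README; the helper names `expected_paths` and `missing_files` are hypothetical, and the assumption that the main `.pth` components live under `model_components/` is inferred from the directory structure rather than stated explicitly.

```python
from pathlib import Path

# Checkpoints listed under "Main Model Components".
MAIN_COMPONENTS = [
    "bert.pth", "bert_encoder.pth", "predictor.pth", "decoder.pth",
    "text_encoder.pth", "predictor_encoder.pth", "style_encoder.pth",
    "diffusion.pth", "text_aligner.pth", "pitch_extractor.pth",
    "mpd.pth", "msd.pth", "wd.pth",
]

# Files listed under "Utility Components", keyed by subdirectory of Utils/.
UTILITY_FILES = {
    "ASR": ["epoch_00080.pth", "config.yml", "models.py", "layers.py"],
    "JDC": ["bst.t7", "model.py"],
    "PLBERT": ["step_1000000.t7", "config.yml", "util.py"],
}


def expected_paths(root):
    """Return every file path the README lists, relative to `root`."""
    root = Path(root)
    # Assumption: main components are stored in model_components/.
    paths = [root / "model_components" / name for name in MAIN_COMPONENTS]
    for subdir, names in UTILITY_FILES.items():
        paths.extend(root / "Utils" / subdir / name for name in names)
    return paths


def missing_files(root):
    """Return the expected paths that do not exist as files under `root`."""
    return [path for path in expected_paths(root) if not path.is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints one line per missing file.
    for path in missing_files("."):
        print(f"missing: {path}")
```

If the script prints nothing, every listed component was found at the assumed location. Adjust the `model_components` path if the checkpoints are stored elsewhere in your copy of the repository.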