StyleTTS2 Fine-tuned Model

This model is a fine-tuned version of StyleTTS2, containing all necessary components for inference.

Model Details

  • Base Model: StyleTTS2-LibriTTS
  • Architecture: StyleTTS2
  • Task: Text-to-Speech
  • Last Checkpoint: epoch_2nd_00004.pth

Training Details

  • Total Epochs: 5
  • Completed Epochs: 4
  • Total Iterations: 1695
  • Batch Size: 2
  • Max Length: 650
  • Learning Rate: 0.0001
  • Final Validation Loss: 0.378118

Model Components

The repository includes all necessary components for inference:

Main Model Components:

  • bert.pth
  • bert_encoder.pth
  • predictor.pth
  • decoder.pth
  • text_encoder.pth
  • predictor_encoder.pth
  • style_encoder.pth
  • diffusion.pth
  • text_aligner.pth
  • pitch_extractor.pth
  • mpd.pth
  • msd.pth
  • wd.pth
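
The component files above could be located and loaded one at a time, roughly as sketched below. This is a minimal sketch, not the repository's own loader: it assumes the .pth files sit in the model_components/ directory shown under Directory Structure and that each one can be read with torch.load. The helper names component_paths and load_component are hypothetical.

```python
from pathlib import Path

# Component names taken from the list in this card.
MAIN_COMPONENTS = [
    "bert", "bert_encoder", "predictor", "decoder", "text_encoder",
    "predictor_encoder", "style_encoder", "diffusion", "text_aligner",
    "pitch_extractor", "mpd", "msd", "wd",
]


def component_paths(root, subdir="model_components"):
    """Map each component name to its expected .pth path.

    Assumption: the files live under <root>/model_components/.
    """
    base = Path(root) / subdir
    return {name: base / f"{name}.pth" for name in MAIN_COMPONENTS}


def load_component(path):
    """Load one checkpoint file onto the CPU.

    Assumption: the file is readable with torch.load. Whether it holds
    a state dict or a full module is not stated in this card.
    """
    import torch  # imported lazily so the path helper works without PyTorch

    return torch.load(path, map_location="cpu")
```

Only load checkpoint files from sources you trust, since torch.load may unpickle arbitrary objects.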

Utility Components:

  • ASR (Automatic Speech Recognition)
    • epoch_00080.pth
    • config.yml
    • models.py
    • layers.py
  • JDC (F0 Prediction)
    • bst.t7
    • model.py
  • PLBERT
    • step_1000000.t7
    • config.yml
    • util.py

Additional Files:

  • text_utils.py: Text preprocessing utilities
  • models.py: Model architecture definitions
  • utils.py: Utility functions
  • config.yml: Model configuration
  • config.json: Detailed configuration and training metrics

Training Metrics

A plot of the training metrics is available in training_metrics.png.

Directory Structure

├── Utils/
│   ├── ASR/
│   ├── JDC/
│   └── PLBERT/
├── model_components/
└── configs/

Usage Instructions

  1. Load the model using the provided config.yml
  2. Ensure all utility components (ASR, JDC, PLBERT) are in their respective directories
  3. Use text_utils.py for text preprocessing
  4. Follow the inference example in the StyleTTS2 documentation
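
Step 2 can be checked before attempting inference. The sketch below is a minimal example, assuming the utility files listed under Utility Components sit in Utils/ASR, Utils/JDC and Utils/PLBERT relative to the repository root. The function name missing_utility_files is hypothetical and is not part of StyleTTS2.

```python
from pathlib import Path

# Expected layout, assumed from the file lists and directory tree in this card.
UTILITY_FILES = {
    "Utils/ASR": ["epoch_00080.pth", "config.yml", "models.py", "layers.py"],
    "Utils/JDC": ["bst.t7", "model.py"],
    "Utils/PLBERT": ["step_1000000.t7", "config.yml", "util.py"],
}


def missing_utility_files(root):
    """Return the relative paths of utility files not found under root."""
    root = Path(root)
    missing = []
    for directory, names in UTILITY_FILES.items():
        for name in names:
            rel = Path(directory) / name
            if not (root / rel).is_file():
                missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    # "." stands for the local copy of this repository.
    for path in missing_utility_files("."):
        print(f"missing: {path}")
```

An empty result means every listed utility file is where this layout expects it. It does not verify that the files themselves are valid.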