StyleTTS2 Fine-tuned Model

This model is a fine-tuned version of StyleTTS2, containing all necessary components for inference.

Model Details

  • Base Model: StyleTTS2-LibriTTS
  • Architecture: StyleTTS2
  • Task: Text-to-Speech
  • Last Checkpoint: epoch_2nd_00003.pth

Training Details

  • Total Epochs: 4
  • Completed Epochs: 3
  • Total Iterations: 3928
  • Batch Size: 2
  • Max Length: 630
  • Learning Rate: 0.0001
  • Final Validation Loss: 0.458012

Model Components

The repository includes all necessary components for inference:

Main Model Components:

  • bert.pth
  • bert_encoder.pth
  • predictor.pth
  • decoder.pth
  • text_encoder.pth
  • predictor_encoder.pth
  • style_encoder.pth
  • diffusion.pth
  • text_aligner.pth
  • pitch_extractor.pth
  • mpd.pth
  • msd.pth
  • wd.pth

Utility Components:

  • ASR (Automatic Speech Recognition)
    • epoch_00080.pth
    • config.yml
    • models.py
    • layers.py
  • JDC (F0 Prediction)
    • bst.t7
    • model.py
  • PLBERT
    • step_1000000.t7
    • config.yml
    • util.py
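
Before running inference, it can help to confirm that the utility files listed above are actually in place. The following is a minimal sketch, not part of the repository: it assumes the files sit under Utils/ASR, Utils/JDC, and Utils/PLBERT as suggested by the Directory Structure section, and the function name missing_utility_files is hypothetical.

```python
from pathlib import Path

# Utility files as listed in this model card. Placing them under Utils/<name>/
# follows the Directory Structure section and is an assumption about the layout.
UTILITY_FILES = {
    "ASR": ["epoch_00080.pth", "config.yml", "models.py", "layers.py"],
    "JDC": ["bst.t7", "model.py"],
    "PLBERT": ["step_1000000.t7", "config.yml", "util.py"],
}


def missing_utility_files(repo_root):
    """Return the relative paths of expected utility files that do not exist."""
    root = Path(repo_root)
    missing = []
    for subdir, names in UTILITY_FILES.items():
        for name in names:
            relative = Path("Utils") / subdir / name
            if not (root / relative).is_file():
                missing.append(relative.as_posix())
    return missing


if __name__ == "__main__":
    # "." is a placeholder for the local copy of this repository.
    for path in missing_utility_files("."):
        print(f"missing: {path}")
```

An empty result means every listed utility file was found at the assumed location.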

Additional Files:

  • text_utils.py: Text preprocessing utilities
  • models.py: Model architecture definitions
  • utils.py: Utility functions
  • config.yml: Model configuration
  • config.json: Detailed configuration and training metrics

Training Metrics

A visualization of the training metrics is available in training_metrics.png.

Directory Structure

├── Utils/
│   ├── ASR/
│   ├── JDC/
│   └── PLBERT/
├── model_components/
└── configs/

Usage Instructions

  1. Load the model using the provided config.yml
  2. Ensure all utility components (ASR, JDC, PLBERT) are in their respective directories
  3. Use text_utils.py for text preprocessing
  4. Follow the inference example in the StyleTTS2 documentation
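
The first step above could be sketched as follows. This is an illustration under stated assumptions, not the author's loading code: it assumes the component checkpoints live in model_components/, that each .pth file can be read with torch.load, and that the path to config.yml is passed in by the caller. The helper names component_paths, load_config, and load_checkpoints are hypothetical.

```python
from pathlib import Path

# Component names taken from the "Main Model Components" list.
COMPONENT_NAMES = [
    "bert", "bert_encoder", "predictor", "decoder", "text_encoder",
    "predictor_encoder", "style_encoder", "diffusion", "text_aligner",
    "pitch_extractor", "mpd", "msd", "wd",
]


def component_paths(repo_root, component_dir="model_components"):
    """Map each component name to the .pth file expected for it.

    The model_components/ location is an assumption based on the
    Directory Structure section.
    """
    base = Path(repo_root) / component_dir
    return {name: base / f"{name}.pth" for name in COMPONENT_NAMES}


def load_config(config_path):
    """Parse the YAML configuration file (requires PyYAML)."""
    import yaml  # imported lazily so component_paths works without PyYAML

    with open(config_path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


def load_checkpoints(repo_root, device="cpu"):
    """Load every component checkpoint onto the given device (requires PyTorch).

    Whether each file holds a plain state dict is an assumption.
    """
    import torch  # imported lazily so component_paths works without PyTorch

    return {
        name: torch.load(path, map_location=device)
        for name, path in component_paths(repo_root).items()
    }
```

The loaded objects still have to be applied to modules built from models.py; that part follows the inference example in the StyleTTS2 documentation.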