This is a DVAE fine-tune for xttsv2, based on the scripts presented here: https://github.com/daswer123/xtts-finetune-tests/tree/main/dvae-finetune
It was trained on 100 hours of high-quality Russian speech and may improve the fine-tuning quality of the GPT-2 and Perceiver models.
You can try using it in xtts-finetune-webui as a custom DVAE.
wandb: Run summary:
wandb: commit_loss 0.04019
wandb: cur_step 2571
wandb: epoch 19
wandb: loss 0.10499
wandb: recon_loss 0.06481
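
As a quick sanity check on the summary above, the reported `loss` looks like the plain sum of `recon_loss` and `commit_loss`. The sketch below only re-adds the logged numbers. The unweighted sum is an assumption about the training script, not something the log confirms, and the `summary` dict and `loss_gap` helper are hypothetical names used for illustration.

```python
# Values copied verbatim from the wandb run summary above.
summary = {
    "commit_loss": 0.04019,
    "recon_loss": 0.06481,
    "loss": 0.10499,
    "cur_step": 2571,
    "epoch": 19,
}


def loss_gap(s):
    """Absolute difference between the logged total loss and
    recon_loss + commit_loss.

    Assumption: the total is an unweighted sum of the two terms.
    """
    return abs(s["loss"] - (s["recon_loss"] + s["commit_loss"]))


gap = loss_gap(summary)
# wandb prints five decimals, so a gap around 1e-5 is just display rounding.
print(f"gap between logged loss and sum of terms: {gap:.1e}")
```

If the gap were much larger than the rounding step of the log, that would suggest the script weights one of the terms, so the comparison is only a rough consistency check.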