# Umamusume DeBERTA-VITS2 TTS
**Currently, ONLY Japanese is supported.**
**Based on [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2), this work closely follows [Akito/umamusume_bert_vits2](https://huggingface.co/spaces/AkitoP/umamusume_bert_vits2), which provides the Japanese text preprocessor.**
**Please do NOT enter very long sentences in a single row. The model treats each row as one sentence for inference, so splitting your input into multiple rows makes each row be inferenced separately and reduces inference time.**
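The per-row behavior described above can be sketched as follows. This is a minimal illustration, not the Space's actual code; `synthesize` stands in for a hypothetical single-sentence TTS callable.

```python
def infer_per_row(text, synthesize):
    """Run a single-sentence TTS callable on each non-empty row of the input.

    `synthesize` is a hypothetical callable taking one sentence; each row is
    inferenced independently, mirroring the warning above.
    """
    rows = [row.strip() for row in text.splitlines() if row.strip()]
    return [synthesize(row) for row in rows]
```

Splitting at row boundaries keeps each inference call short, which is why multiple short rows finish faster than one very long sentence.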
## Training Details - For those who may be interested
**This work switches the text encoder from [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) to [ku-nlp/deberta-v2-base-japanese](https://huggingface.co/ku-nlp/deberta-v2-base-japanese), expecting potentially better performance (and just for fun).**
Thanks to the **SUSTech Center for Computational Science and Engineering**. This model is trained on 2x A100 (40GB) GPUs with a total **batch size of 32**.
This model has currently been trained for **1 cycle, 90K steps (= 60 epochs)**.
This work uses a linear-with-warmup LR scheduler (warmup over the first 7.5% of total steps) with `max_lr=1e-4`.
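For reference, the schedule can be written as a plain function of the step count. This is a sketch under the assumption that the LR decays linearly to zero after warmup (the README only specifies the warmup fraction and `max_lr`):

```python
def linear_warmup_lr(step, total_steps, max_lr=1e-4, warmup_frac=0.075):
    """LR at `step`: linear warmup to max_lr over the first 7.5% of steps,
    then linear decay to 0 (the decay shape is an assumption)."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    return max_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

In PyTorch this multiplier shape is typically attached to an optimizer via `torch.optim.lr_scheduler.LambdaLR`.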
This work clips gradient values to a maximum magnitude of 10.
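Value clipping clamps each gradient element independently, as in PyTorch's `torch.nn.utils.clip_grad_value_`. A pure-Python sketch of the operation:

```python
def clip_grad_value(grads, clip_value=10.0):
    # Element-wise clamp of each gradient into [-clip_value, clip_value],
    # mirroring torch.nn.utils.clip_grad_value_ (a sketch, not the real call).
    return [max(-clip_value, min(clip_value, g)) for g in grads]
```

Unlike norm-based clipping, this does not preserve gradient direction; it simply caps each component.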
Fine-tuning the model on **single-speaker datasets separately** will reach noticeably better results than training on one huge dataset comprising many speakers; sharing a single model leads to unexpected mixing of the speakers' voice lines.
### TODO:
Train one more cycle using the text preprocessor provided by [AkitoP](https://huggingface.co/AkitoP), which has better long-tone processing capacity.