Umamusume DeBERTA-VITS2 TTS
Currently, ONLY Japanese is supported.
Based on Bert-VITS2, this work closely follows Akito/umamusume_bert_vits2, from which the Japanese text preprocessor is taken.
Please do NOT enter a very long sentence, or several sentences, in a single row. The model treats each row as one sentence; splitting your input into multiple rows makes each row inferred separately and reduces inference time.
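For illustration, the row-by-row behavior can be sketched in Python (`split_rows` is a hypothetical helper, not the Space's actual code):

```python
def split_rows(text: str) -> list[str]:
    # Each non-empty row of the input box is treated as one independent
    # sentence and sent through TTS inference on its own.
    return [row.strip() for row in text.splitlines() if row.strip()]

# Two rows -> two separate inference calls, instead of one long utterance.
rows = split_rows("ウマ娘はいいぞ。\n今日もいい天気ですね。\n")
```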
Training Details - For those who may be interested
This work replaces cl-tohoku/bert-base-japanese-v3 with ku-nlp/deberta-v2-base-japanese, expecting potentially better performance, and just for fun.
Thanks to the SUSTech Center for Computational Science and Engineering. This model is trained on 2x A100 (40GB) GPUs with a total batch size of 32.
This model has currently been trained for one cycle of 90K steps (60 epochs).
This work uses a linear LR scheduler with warmup over 7.5% of total steps and max_lr=1e-4.
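In plain Python, that schedule works out as follows (a sketch from the stated hyperparameters; the actual training code is assumed to use a standard linear-warmup scheduler):

```python
def lr_at_step(step: int, total_steps: int,
               max_lr: float = 1e-4, warmup_frac: float = 0.075) -> float:
    """Linear warmup to max_lr, then linear decay back to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Warmup phase: LR rises linearly from 0 to max_lr.
        return max_lr * step / max(1, warmup_steps)
    # Decay phase: LR falls linearly from max_lr to 0 at total_steps.
    return max_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

# With 90K total steps, warmup covers the first 6750 steps.
```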
This work clips gradient values to 10.
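Value clipping (unlike norm clipping) caps each gradient component independently; in PyTorch this is `torch.nn.utils.clip_grad_value_(model.parameters(), 10)`. The element-wise operation reduces to:

```python
def clip_grad_value(grads: list[float], clip_value: float = 10.0) -> list[float]:
    # Clamp every gradient component into [-clip_value, clip_value].
    return [max(-clip_value, min(clip_value, g)) for g in grads]
```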
Finetuning the model separately on single-speaker datasets will generally achieve better results than training on one huge dataset comprising many speakers; sharing a single model leads to unexpected mixing of the speakers' voice lines.
TODO:
Train one more cycle using the text preprocessor provided by AkitoP, which handles long tones better.