# Umamusume DeBERTA-VITS2 TTS
👌 **Currently, ONLY Japanese is supported.** 👌

💪 **Based on [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2), this work closely follows [Akito/umamusume_bert_vits2](https://huggingface.co/spaces/AkitoP/umamusume_bert_vits2), from which the Japanese text preprocessor is taken.** ❤
✋ **Please do NOT enter a very LONG sentence (or several sentences) in a single row. The model treats each row as one sentence, so splitting your input across multiple rows causes each row to be inferred separately and reduces inference time.** ✋
## Training Details - For those who may be interested
🎈 **This work replaces [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) with [ku-nlp/deberta-v2-base-japanese](https://huggingface.co/ku-nlp/deberta-v2-base-japanese), in the hope of better performance (and, admittedly, just for fun).** 🥰
❤ Thanks to **SUSTech Center for Computational Science and Engineering**. ❤ This model is trained on 2× A100 (40 GB) GPUs with a **total batch size of 32**.

💪 This model has been trained for **1 cycle of 90K steps (= 60 epochs)** so far. 💪
📕 This work uses a linear LR scheduler with warmup (7.5% of total steps) and `max_lr=1e-4`. 📕
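Assuming the common "linear warmup, then linear decay" shape, the schedule can be written as a plain function. The numbers (90K total steps, 7.5% warmup, `max_lr=1e-4`) come from the text above; the function name and the decay-to-zero endpoint are assumptions for illustration:

```python
def linear_warmup_lr(step: int,
                     total_steps: int = 90_000,
                     warmup_frac: float = 0.075,
                     max_lr: float = 1e-4) -> float:
    """LR at a given step: linear ramp from 0 to max_lr over the
    warmup phase (7.5% of total steps = 6750 here), then linear
    decay back to 0 at the final step. Illustrative sketch only."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    return max_lr * (total_steps - step) / (total_steps - warmup_steps)
```

The same shape is what `transformers.get_linear_schedule_with_warmup` produces when wrapped around a PyTorch optimizer.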
✂ This work clips gradient values to 10. ✂
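Value clipping (as opposed to norm clipping) bounds each gradient element to [-10, 10] independently; in PyTorch this is `torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=10.0)`. A minimal dependency-free sketch of the element-wise operation, with an illustrative function name:

```python
def clip_grad_values(grads: list[float], clip_value: float = 10.0) -> list[float]:
    """Clamp each gradient element to [-clip_value, clip_value],
    mirroring what torch.nn.utils.clip_grad_value_ does in place."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]


# Large positive/negative elements are clamped; in-range ones pass through.
clip_grad_values([123.4, -0.5, -99.0])  # -> [10.0, -0.5, -10.0]
```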
⚠ Finetuning the model on **single-speaker datasets separately** gives noticeably better results than training one model on a huge dataset comprising many speakers. Sharing a single model across speakers leads to unexpected mixing of the speakers' voice lines. ⚠
### TODO:
📅 Train one more cycle using the text preprocessor provided by [AkitoP](https://huggingface.co/AkitoP), which handles long tones better. 📝