Question on language coverage
I am wondering why TTS covers a less widely spoken language (like Hakka Chinese) but not much more widely spoken ones (like Mandarin or Cantonese).
This seems counterintuitive to me, since training data is much harder to get for less widely spoken languages.
Thanks!
Is it related to the tokenization of the language? I can see that for hak, the vocab.txt is very minimal.
Hi @ydshieh , thanks for the reply.
If you look at https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html
And search for "hak", you can see there is TTS support for Hakka Chinese.
But if you search for "mandarin" or "yue", you can see they have no TTS support.
If you check the list of most spoken languages, you can see that yue and Mandarin have far more speakers than hak:
https://en.m.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
Thanks a lot! Indeed!
@vineelpratap Do you know why? I see you are the author of many commits in this repository, so I think you would know best.