[STATUS] Jan 12 Forecast

#36
by hexgrad - opened

Jan 12: My intent is to supersede v0.19 with a better Kokoro model that dominates in every respect. To do this, I plan to continue training the unreleased v0.23 checkpoint on a richer data mix.

  • If successful, you should expect the next-gen Kokoro model to ship with more voices and languages, also under an Apache 2.0 license, with a similar 82M parameter architecture.
  • If unsuccessful, it would most likely be because the model does not converge, i.e. loss does not go down. That could be because of data quality issues, architecture limitations, overfitting on old data, underfitting on new data, etc. Rollbacks and model collapse are not unheard of in ML, but fingers crossed that they do not happen here, or that, if they do, I can address them.

Behind the scenes, slabs of data have been (and still are) coming in thanks to the outstanding community response to #21, and I am incredibly grateful for it. Some of these slabs are languages new to the model, which is exciting. Note that #21 is first come, first served, and at some point I will not be able to airdrop your data into a GPU in the middle of a training run.

Most of my focus is now on organizing these slabs such that they can be dispatched to GPUs later. Training has not started yet, since data is still flowing in and much processing work remains. In the meantime, I may not be able to get to some of your questions, but please understand that is not without reason.

That's it for now, thanks everyone!


hexgrad pinned discussion

Keep up the amazing work! Kokoro is a godsend to those who have been waiting for a license-permissive high quality TTS model for so long.

This is inspiring work! You are singlehandedly changing the game. God bless and always follow your vision 🙏

Great to hear. Question: is there any chance that Koko will eventually be able to handle things like breath sounds, coughs, and those... interruptions... that normal speech has?

Not only is Apache 2.0 a good call, but I'm a huge fan of the 82M param size! Amazing work!!! ❤️❤️❤️

PS: Could we have the updated Discord server link? It says it is no longer working.

@shub1 The Discord server link works. Someone else had this issue earlier and said "Its a firefox problem refusing to open Discord", so maybe try another browser or switch to mobile.

Do you plan to include an emotion option? I.e., have the AI voice talk angrily, happily, etc.
