How can I get the timing of each word?
#141 by alissonlauffer - opened
I found this model, and it's nice what you've done with only 82M parameters.
I'd really like to get the timing of each word so that I can create real-time subtitles on a word-by-word basis. I tried using split_pattern=" ", but the resulting audio was a bit off because the sentence context was lost.
Timestamps were merged in https://github.com/hexgrad/kokoro/pull/46, and if you access the tokens in the result, there should be a start_ts and end_ts per MToken: https://github.com/hexgrad/misaki/blob/d4289a30d992ce7ca9da93524ff6cefd3d62adb8/misaki/en.py#L27-L28
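For anyone wiring this into subtitles, here is a minimal sketch of turning such tokens into (word, start, end) tuples. It only assumes that each token exposes text, start_ts and end_ts, with timestamps in seconds. The helper name word_timings and its offset parameter are made up for illustration and are not part of the library. Stand-in objects replace real tokens so the snippet runs on its own.

```python
from types import SimpleNamespace


def word_timings(tokens, offset=0.0):
    """Return (text, start, end) tuples for tokens that carry timestamps.

    `tokens` is any iterable of objects with `text`, `start_ts` and `end_ts`
    attributes; passing None yields an empty list.
    `offset` (hypothetical) is added to every timestamp, which helps if
    timestamps restart for each generated chunk (an assumption to verify).
    """
    timings = []
    for tok in tokens or []:
        start = getattr(tok, "start_ts", None)
        end = getattr(tok, "end_ts", None)
        if start is None or end is None:
            continue  # skip tokens that have no timing information
        timings.append((tok.text, offset + start, offset + end))
    return timings


# Stand-in tokens so the sketch runs without the library installed.
demo = [
    SimpleNamespace(text="Hello", start_ts=0.0, end_ts=0.5),
    SimpleNamespace(text=",", start_ts=None, end_ts=None),
    SimpleNamespace(text="world", start_ts=0.5, end_ts=1.0),
]

for text, start, end in word_timings(demo):
    print(f"{start:.2f}-{end:.2f}  {text}")
```

With the real pipeline, the list would presumably come from the tokens attribute of each result. Because a missing tokens value is treated as an empty list, the helper returns nothing rather than raising when no tokens are available.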
hexgrad changed discussion status to closed
I just tried it here, and it seems that tokens is just None for languages other than English :(