How can I get the timing of each word?

#141

by alissonlauffer - opened 2 days ago

2 days ago

I found this model, and it's nice what you've done with only 82M parameters.

I'd really like to get the timing of each word so that I can create real-time subtitles on a word-by-word basis. I tried using split_pattern=" ", but the resulting audio was a bit off because the sentence context was lost.

hexgrad

Owner 2 days ago

Timestamps were merged in https://github.com/hexgrad/kokoro/pull/46. If you access the tokens in the result, each MToken should have a start_ts and an end_ts: https://github.com/hexgrad/misaki/blob/d4289a30d992ce7ca9da93524ff6cefd3d62adb8/misaki/en.py#L27-L28
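
For anyone landing here later, a minimal sketch of turning those per-token timestamps into word-level subtitle cues. The Token class and the word_timings helper are hypothetical stand-ins written for this example (they are not part of kokoro or misaki); the sketch only assumes that each token exposes text, start_ts and end_ts, that either timestamp may be None, and that the token list itself may be None.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Token:
    # Hypothetical stand-in for an MToken; only the fields used below.
    text: str
    start_ts: Optional[float] = None
    end_ts: Optional[float] = None


def word_timings(
    tokens: Optional[Sequence[Token]], offset: float = 0.0
) -> List[Tuple[str, float, float]]:
    """Return (word, start, end) cues in seconds, shifted by `offset`.

    Tokens without both timestamps (for example punctuation) are skipped,
    and a missing token list yields no cues.
    """
    if tokens is None:
        return []
    cues = []
    for tok in tokens:
        if tok.start_ts is None or tok.end_ts is None:
            continue
        cues.append((tok.text, offset + tok.start_ts, offset + tok.end_ts))
    return cues


# Demo data made up for illustration, not real model output.
demo = [Token("Hello", 0.25, 0.5), Token(","), Token("world", 0.75, 1.0)]
for word, start, end in word_timings(demo, offset=2.0):
    print(f"{start:.2f} -> {end:.2f}  {word}")
```

With the real pipeline you would presumably pass the tokens of each result to a helper like this. If the timestamps turn out to be relative to each result's own audio chunk (an assumption worth checking), offset would be the total duration of the audio emitted before that chunk.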

hexgrad changed discussion status to closed 2 days ago

alissonlauffer

1 day ago

I just tried it, and it seems that tokens is just None for languages other than English :(
