metadata

language: ja
tags:
  - speech
license: apache-2.0

distilhubert-ft-japanese-50k

A DistilHuBERT model fine-tuned (more precisely, continue-trained) for 50k steps on Japanese speech, using the JVS corpus, the Tsukuyomi-Chan corpus, Amitaro's ITA corpus V2.1, and my own recordings of the ITA corpus.

Original repos (many thanks!):
S3PRL

Used for training, with minor modifications to train on my own datasets.

distilhubert (hf)

Note: As with the original, this model does not have a tokenizer because it was pretrained on audio alone. To use this model for speech recognition, a tokenizer must be created and the model must be fine-tuned on labeled text data. Check out this blog for a more detailed explanation of how to fine-tune the model.
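As a rough illustration of the "create a tokenizer" step, the sketch below builds a character-level CTC vocabulary from transcripts. It follows the general convention of the Wav2Vec2 fine-tuning blog (a `|` word delimiter plus `[UNK]` and `[PAD]` tokens). The function name `build_ctc_vocab` and the sample transcripts are hypothetical and are not part of this repository.

```python
import json


def build_ctc_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id."""
    # Collect the unique characters and sort them for a stable ordering.
    chars = sorted({ch for text in transcripts for ch in text})
    vocab = {ch: idx for idx, ch in enumerate(chars)}

    # Replace the space with a visible word delimiter, keeping its id.
    # Japanese text often has no spaces, so this may not apply.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")

    # Special tokens: unknown character and padding (used as the CTC blank).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


# Hypothetical transcripts; replace with the labels of your own dataset.
transcripts = ["こんにちは", "こんばんは"]
vocab = build_ctc_vocab(transcripts)

# Serialized form, suitable for saving as vocab.json.
vocab_json = json.dumps(vocab, ensure_ascii=False, indent=2)
```

Saved as `vocab.json`, a vocabulary like this can be passed to `Wav2Vec2CTCTokenizer` with `unk_token="[UNK]"`, `pad_token="[PAD]"`, and `word_delimiter_token="|"`. For Japanese, a character-level vocabulary that includes kanji can become large, so a kana or phoneme representation may be worth considering instead.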

Usage

See this blog for more information on how to fine-tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by HubertForCTC.
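A minimal sketch of the class swap described above, assuming the `transformers` library is installed. The helper name `load_hubert_for_ctc` and its arguments are hypothetical, `model_id` stands for the Hub ID or local path of this checkpoint, and the settings shown are illustrative rather than the ones used by the author.

```python
def load_hubert_for_ctc(model_id, vocab_size, pad_token_id):
    """Load a HuBERT checkpoint with a fresh CTC head sized to the vocabulary.

    model_id: Hub ID or local path of the checkpoint (supply your own).
    vocab_size: number of entries in the tokenizer vocabulary.
    pad_token_id: id of the padding token, which CTC uses as the blank.
    """
    # Imported here so the helper can be defined without loading the library.
    from transformers import HubertForCTC

    # HubertForCTC takes the place of Wav2Vec2ForCTC from the blog post.
    model = HubertForCTC.from_pretrained(
        model_id,
        vocab_size=vocab_size,
        pad_token_id=pad_token_id,
        ctc_loss_reduction="mean",
    )

    # Common practice: keep the convolutional feature encoder frozen
    # while fine-tuning the rest of the network.
    model.freeze_feature_encoder()
    return model
```

Because the checkpoint was trained on audio alone, the CTC output layer is newly initialized when loaded this way and has to be trained on labeled data before the model can transcribe speech.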

Note: This is probably not the best checkpoint; I expect accuracy to improve with continued training. I will try to continue training when I have time.

Credits

■つくよみちゃんコーパス（CV.夢前黎） / Tsukuyomi-Chan Corpus (CV: Rei Yumesaki)

https://tyc.rei-yumesaki.net/material/corpus/

Amitaro's ITA corpus

あみたろの声素材工房

https://amitaro.net/

Thanks!