HUTTER-12: H(uBERT) UTTER model covering 12 languages.

  • Total training hours: 1,622, from Romance (French: 300h, Spanish: 300h, Portuguese: 102.3h), Germanic (Danish: 3.5h, German: 300h, Dutch: 72.1h, Frisian: 41.2h) and other languages (Chinese (zh-CN): 104.6h, Japanese: 37h, Arabic: 61h, Swahili: 300h, Guaraní: 0.4h)
  • Number of updates: 400K
  • Number of iterations: 3
  • Clustering approach: mini-batch K-means, fit on 100% of the data
  • Dataset: Common Voice v13
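
As a quick sanity check on the figures above, the per-language hours can be summed per group and overall. This is only an illustrative sketch: the dictionary layout, the group labels, and the function names are assumptions made for this example, not part of the model's training code. The hour values are copied from the list above.

```python
# Per-language training hours, copied from the list above.
# The group keys are illustrative labels, not official categories.
HOURS = {
    "Romance": {"French": 300.0, "Spanish": 300.0, "Portuguese": 102.3},
    "Germanic": {"Danish": 3.5, "German": 300.0, "Dutch": 72.1, "Frisian": 41.2},
    "Other": {
        "Chinese (zh-CN)": 104.6,
        "Japanese": 37.0,
        "Arabic": 61.0,
        "Swahili": 300.0,
        "Guaraní": 0.4,
    },
}


def group_totals(hours):
    """Return the summed hours per group, rounded to one decimal place."""
    return {group: round(sum(langs.values()), 1) for group, langs in hours.items()}


def total_hours(hours):
    """Return the summed hours over all groups, rounded to one decimal place."""
    return round(sum(group_totals(hours).values()), 1)


def language_shares(hours):
    """Return each language's share of the total, as a percentage."""
    total = total_hours(hours)
    return {
        lang: 100.0 * h / total
        for langs in hours.values()
        for lang, h in langs.items()
    }


if __name__ == "__main__":
    for group, subtotal in group_totals(HOURS).items():
        print(f"{group}: {subtotal}h")
    print(f"Total: {total_hours(HOURS)}h")  # 1622.1h, reported above as 1,622
    # List languages from largest to smallest share of the training data.
    for lang, share in sorted(language_shares(HOURS).items(), key=lambda kv: -kv[1]):
        print(f"{lang}: {share:.2f}%")
```

The sum comes to 1,622.1 hours, consistent with the rounded total of 1,622. The breakdown also shows how uneven the data is: four languages contribute 300 hours each, while Guaraní contributes 0.4 hours.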

Funding

This is an output of the European Project UTTER (Unified Transcription and Translation for Extended Reality) under grant number 101070631. For more information go to https://he-utter.eu/
Dataset used to train utter-project/hutter-12-3rd-base