---
tags:
- espnet
- audio
- automatic-speech-recognition
language:
- et
license: apache-2.0
metrics:
- wer
model-index:
- name: e-branchformer et
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: ERR2020
      type: audio
    metrics:
    - name: Wer
      type: wer
      value: 9.9
---
e-branchformer et
Estonian ASR model trained on the ERR2020 dataset with the ESPnet2 E-Branchformer-based recipe (https://github.com/espnet/espnet/tree/master/egs2/librispeech_100/asr1).
- WER on ERR2020: 9.9
- WER on Mozilla Common Voice 11: 20.8
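WER (word error rate) counts the word substitutions, deletions and insertions needed to turn the reference transcript into the model output, then divides that count by the number of reference words. The figures above were presumably produced by the recipe's own scoring tools, which may apply text normalization that this card does not describe. The Python sketch below only illustrates the metric's definition. The names `word_errors` and `wer` are illustrative and are not ESPnet APIs.

```python
from typing import List


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions that
    turn `reference` into `hypothesis` (Levenshtein distance over words)."""
    # prev[j] = distance between the reference words seen so far (minus the
    # current one) and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += word_errors(ref_words, hyp.split())
        total += len(ref_words)
    return 100.0 * errors / total
```

For example, `wer(["tere hommikust eesti"], ["tere õhtust eesti"])` counts one substitution over three reference words, which gives about 33.3. The errors are summed over the whole corpus before dividing, rather than averaging per-utterance rates, so long utterances weigh proportionally more.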
For usage:
- Clone this repo (`git clone https://huggingface.co/rristo/espnet_ebranchformer_et`).
- Go to the repo directory (`cd espnet_ebranchformer_et`).
- Build the Docker image with the needed libraries (`build.sh` or `build.bat`).
- Run the Docker container (`run.sh`). This mounts the current directory.
- Run the notebook `example_usage.ipynb` for example usage.
- Audio is currently expected to be in .wav format.
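Because the notebook expects .wav input, it can help to check a file's header before transcribing it. The following is a minimal sketch that uses only Python's standard `wave` module. `inspect_wav` and `looks_compatible` are hypothetical helpers. The 16 kHz mono target is an assumption: it is common for ESPnet recipes, but this card does not state it.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    sample_width_bytes: int
    duration_s: float


def inspect_wav(path: str) -> WavInfo:
    """Read the header of a PCM .wav file.

    Raises wave.Error (or EOFError) if the file is not a readable PCM WAV.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return WavInfo(
            channels=wav.getnchannels(),
            sample_rate=rate,
            sample_width_bytes=wav.getsampwidth(),
            duration_s=frames / rate,
        )


def looks_compatible(info: WavInfo, expected_rate: int = 16000) -> bool:
    # ASSUMPTION: the model expects mono audio at `expected_rate` Hz.
    # This card does not state the sample rate, so adjust if needed.
    return info.channels == 1 and info.sample_rate == expected_rate
```

Files that fail the check can be converted with an external tool before they are passed to the notebook.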
Model description
ASR model for Estonian, trained on ERR2020, a dataset from Estonian Public Broadcasting (ERR) with around 340 hours of audio.
Intended uses & limitations
This is essentially a toy model trained on a limited amount of data. It may not work well on out-of-domain data, especially spontaneous or noisy speech.
Training and evaluation data
Trained on ERR2020 data; evaluated on the ERR2020 and Mozilla Common Voice test sets.
Training procedure
Used the ESPnet E-Branchformer-based recipe (https://github.com/espnet/espnet/tree/master/egs2/librispeech_100/asr1).
Training results
See the `exp/images` folder.
Validation results are in `exp/RESULTS.md`.
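To extract the numbers from `exp/RESULTS.md` programmatically, a sketch like the following may help. It assumes the sclite-style Markdown tables that ESPnet recipes typically write: each table has a `dataset` column and an `Err` column holding the error rate in percent, and sits under a heading such as `### WER` or `### CER`. This card does not show the actual layout of the repo's file, so the column names may need adjusting. `parse_results_tables` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def parse_results_tables(markdown: str) -> List[Tuple[str, str, float]]:
    """Return (section, dataset, Err) triples from pipe tables in RESULTS.md text.

    ASSUMPTION: each table has a header row containing 'dataset' and 'Err'
    columns followed by a |---| separator row, and the nearest preceding
    '#' heading names the metric (e.g. 'WER', 'CER').
    """
    results: List[Tuple[str, str, float]] = []
    section = ""
    header: Optional[List[str]] = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # A new heading starts a new section; forget the old table header.
            section = line.lstrip("#").strip()
            header = None
            continue
        if not line.startswith("|"):
            # Any non-table line ends the current table.
            header = None
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first row of a table is its header
        elif set(cells[0]) <= set("-: "):
            continue  # |---|---| separator row
        elif "dataset" in header and "Err" in header:
            row = dict(zip(header, cells))
            results.append((section, row["dataset"], float(row["Err"])))
    return results
```

Usage: read the file with `open("exp/RESULTS.md", encoding="utf-8").read()` and pass the text to `parse_results_tables`. Tables without `dataset` and `Err` columns are skipped rather than raising an error.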
Framework versions
- espnet2