ESPnet2 ASR model
espnet/shihlun_asr_whisper_medium_finetuned_chime4
This model was trained by Shih-Lun Wu (slseanwu) using the chime4 recipe in espnet.
Demo: How to use in ESPnet2
cd espnet
pip install -e .
cd egs2/chime4/asr1
train_set=tr05_multi_noisy_si284
valid_set=dt05_multi_isolated_1ch_track
test_sets="dt05_real_isolated_1ch_track dt05_simu_isolated_1ch_track et05_real_isolated_1ch_track et05_simu_isolated_1ch_track"
asr_tag=whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs
asr_config=conf/tuning/train_asr_whisper_full.yaml
inference_config=conf/decode_asr_whisper_noctc_greedy.yaml
./asr.sh \
--skip_data_prep false \
--skip_train true \
--skip_eval false \
--lang en \
--ngpu 1 \
--nj 4 \
--stage 1 \
--stop_stage 13 \
--gpu_inference true \
--inference_nj 1 \
--token_type whisper_multilingual \
--feats_normalize '' \
--max_wav_duration 30 \
--feats_type raw \
--use_lm false \
--cleaner whisper_en \
--asr_tag "${asr_tag}" \
--asr_config "${asr_config}" \
--inference_config "${inference_config}" \
--inference_asr_model valid.acc.ave.pth \
--train_set "${train_set}" \
--valid_set "${valid_set}" \
--test_sets "${test_sets}" "$@"
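The row labels in the results below look like a combination of the inference config name and the checkpoint name. The following is a minimal sketch of that naming, assuming the decode directory tag is built as `<config basename>_asr_model_<checkpoint without extension>`. This is inferred from the labels in this card, so check `asr.sh` in your checkout before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: reconstruct the decode-directory tag that labels each results row.
# Assumption: tag = "<config basename>_asr_model_<checkpoint without extension>".
inference_config=conf/decode_asr_whisper_noctc_greedy.yaml
inference_asr_model=valid.acc.ave.pth

config_tag="$(basename "${inference_config}" .yaml)"   # drop directory and ".yaml"
model_tag="${inference_asr_model%.*}"                  # drop the trailing ".pth"
inference_tag="${config_tag}_asr_model_${model_tag}"

echo "${inference_tag}"
```

Under the same assumption, the `beam20` rows would come from a config whose basename is `decode_asr_whisper_noctc_beam20`. Whether a file with that name exists under `conf/` should be confirmed in the recipe directory.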
RESULTS
Environments
- date: `Tue Jan 10 04:15:30 CST 2023`
- python version: `3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `d89be931dcc8f61437ac49cbe39a773f2054c50c`
  - Commit date: `Mon Jan 9 11:06:45 2023 -0600`
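To reproduce the numbers as closely as possible, it may help to pin the local ESPnet clone to the commit listed above before running `pip install -e .`. The snippet below is a sketch that assumes the Git hash refers to the ESPnet repository. `pin_espnet` is a hypothetical helper name, not part of ESPnet.

```shell
#!/usr/bin/env bash
# Sketch: pin a local ESPnet clone to the commit listed under Environments.
# Assumption: the Git hash above is a commit of the espnet repository.
espnet_commit=d89be931dcc8f61437ac49cbe39a773f2054c50c

# pin_espnet is a hypothetical helper; pass the path to your espnet clone.
pin_espnet() {
    local repo_dir="$1"
    # Check out the commit, then print the resulting HEAD for confirmation.
    git -C "${repo_dir}" checkout "${espnet_commit}" &&
        git -C "${repo_dir}" rev-parse HEAD
}

# Example (not run here): pin_espnet ./espnet
```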
asr_whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs
WER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track | 1640 | 24791 | 97.8 | 1.7 | 0.5 | 0.3 | 2.5 | 24.5 |
| decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track | 1640 | 24792 | 96.1 | 3.0 | 0.9 | 0.5 | 4.4 | 35.6 |
| decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/et05_real_isolated_1ch_track | 1320 | 19341 | 96.4 | 2.9 | 0.7 | 0.5 | 4.1 | 33.0 |
| decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track | 1320 | 19344 | 93.4 | 5.0 | 1.7 | 0.8 | 7.4 | 41.8 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track | 1640 | 24791 | 97.7 | 1.8 | 0.5 | 0.4 | 2.8 | 25.5 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track | 1640 | 24792 | 96.0 | 3.3 | 0.8 | 0.7 | 4.8 | 36.0 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track | 1320 | 19341 | 96.1 | 3.3 | 0.6 | 0.7 | 4.6 | 34.9 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track | 1320 | 19344 | 92.9 | 5.8 | 1.3 | 1.2 | 8.3 | 43.2 |
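The column layout matches sclite-style scoring summaries, where Err is the sum of Sub, Del, and Ins. The sketch below uses that relation to sanity-check the beam20 WER rows above, with values copied from the table. Each column is rounded to one decimal independently, so the sum may differ from Err by 0.1.

```shell
#!/usr/bin/env bash
# Sketch: check Err ~= Sub + Del + Ins for the beam20 WER rows.
# Columns: test set, Sub, Del, Ins, Err (values copied from the WER table).
rows="dt05_real 1.7 0.5 0.3 2.5
dt05_simu 3.0 0.9 0.5 4.4
et05_real 2.9 0.7 0.5 4.1
et05_simu 5.0 1.7 0.8 7.4"

wer_check="$(printf '%s\n' "${rows}" | awk '{
    sum = $2 + $3 + $4
    diff = sum - $5
    if (diff < 0) diff = -diff
    # Allow 0.1 of rounding slack, plus a tiny epsilon for float error.
    status = (diff <= 0.1 + 1e-9) ? "ok" : "MISMATCH"
    printf "%s sum=%.1f err=%.1f %s\n", $1, sum, $5, status
}')"

echo "${wer_check}"
```

The same check can be run on the greedy rows or on the CER table by swapping in the corresponding values.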
CER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track | 1640 | 141889 | 99.1 | 0.3 | 0.5 | 0.3 | 1.2 | 24.5 |
| decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track | 1640 | 141900 | 98.2 | 0.8 | 1.0 | 0.5 | 2.3 | 35.6 |
| decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/et05_real_isolated_1ch_track | 1320 | 110558 | 98.5 | 0.7 | 0.8 | 0.5 | 1.9 | 33.0 |
| decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track | 1320 | 110572 | 96.5 | 1.6 | 1.9 | 0.8 | 4.3 | 41.8 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track | 1640 | 141889 | 99.1 | 0.4 | 0.5 | 0.5 | 1.3 | 25.5 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track | 1640 | 141900 | 98.2 | 0.9 | 0.9 | 0.6 | 2.4 | 36.0 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track | 1320 | 110558 | 98.4 | 0.9 | 0.7 | 0.6 | 2.2 | 34.9 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track | 1320 | 110572 | 96.3 | 2.0 | 1.7 | 1.2 | 4.9 | 43.2 |
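In both tables, beam search with beam size 20 gives a lower error rate than greedy decoding on every test set. One way to quantify the gap is the relative reduction, (greedy - beam) / greedy. The sketch below computes it from the Err values in the WER table.

```shell
#!/usr/bin/env bash
# Sketch: relative WER reduction of beam20 over greedy decoding.
# Columns: test set, greedy WER (%), beam20 WER (%), copied from the WER table.
pairs="dt05_real 2.8 2.5
dt05_simu 4.8 4.4
et05_real 4.6 4.1
et05_simu 8.3 7.4"

rel_reduction="$(printf '%s\n' "${pairs}" | awk '{
    # Relative reduction in percent: 100 * (greedy - beam) / greedy
    printf "%s %.1f%%\n", $1, 100 * ($2 - $3) / $2
}')"

echo "${rel_reduction}"
```

Because the inputs are already rounded to one decimal, the resulting percentages are only approximate.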