---
tags:
- espnet
- audio
- speech-recognition
- openai-whisper
language: en
datasets:
- librispeech
license: cc-by-4.0
---

## ESPnet2 ASR model

### `espnet/shihlun_asr_whisper_medium_finetuned_librispeech100`

This model was trained by Shih-Lun Wu (slseanwu) using the librispeech_100 recipe in [espnet](https://github.com/espnet/espnet/).

### Demo: How to use in ESPnet2

```bash
# Install ESPnet from a local clone and move to the LibriSpeech-100 ASR recipe
cd espnet
pip install -e .
cd egs2/librispeech_100/asr1

# Data splits used for training, validation, and evaluation
train_set="train_clean_100"
valid_set="dev"
test_sets="test_clean test_other dev_clean dev_other"

# Experiment tag, training config, and decoding config
asr_tag=whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs
asr_config=conf/tuning/train_asr_whisper_full.yaml
inference_config=conf/decode_asr_whisper_noctc_greedy.yaml

./asr.sh \
    --skip_data_prep false \
    --skip_train true \
    --skip_eval false \
    --lang en \
    --ngpu 1 \
    --nj 4 \
    --stage 1 \
    --stop_stage 13 \
    --gpu_inference true \
    --inference_nj 1 \
    --token_type whisper_multilingual \
    --feats_normalize '' \
    --max_wav_duration 30 \
    --speed_perturb_factors "0.9 1.0 1.1" \
    --audio_format "flac.ark" \
    --feats_type raw \
    --use_lm false \
    --cleaner whisper_en \
    --asr_tag "${asr_tag}" \
    --asr_config "${asr_config}" \
    --inference_config "${inference_config}" \
    --inference_asr_model valid.acc.ave.pth \
    --train_set "${train_set}" \
    --valid_set "${valid_set}" \
    --test_sets "${test_sets}" "$@"
```

# RESULTS

## Environments

- date: `Mon Jan 9 23:06:34 CST 2023`
- python version: `3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `d89be931dcc8f61437ac49cbe39a773f2054c50c`
- Commit date: `Mon Jan 9 11:06:45 2023 -0600`

## asr_whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs

### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_clean|2703|54798|97.7|1.9|0.3|0.3|2.6|30.1|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_other|2864|51528|95.3|4.3|0.4|0.6|5.3|45.4|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_clean|2620|53027|97.6|2.1|0.3|0.4|2.7|30.9|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_other|2939|52882|95.1|4.4|0.5|0.7|5.6|47.5|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_clean|2703|287287|99.3|0.3|0.4|0.3|1.0|30.1|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_other|2864|265648|98.3|1.0|0.7|0.6|2.3|45.4|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_clean|2620|280691|99.3|0.3|0.3|0.3|1.0|30.9|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_other|2939|271738|98.3|1.0|0.7|0.7|2.4|47.5|
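The columns above follow sclite-style scoring: `Snt` is the number of utterances and `Wrd` the number of reference tokens (words in the WER table, characters in the CER table). `Sub`, `Del`, and `Ins` are percentages of `Wrd`, `Corr` is `100 - Sub - Del`, `Err` is `Sub + Del + Ins`, and `S.Err` is the percentage of utterances containing at least one error. Because each cell is rounded independently, the rows need not add up exactly. The sketch below is a minimal, hypothetical illustration of how these numbers relate, using a unit-cost edit-distance alignment. It is not ESPnet's or sclite's scoring code: `align_counts` and `score_corpus` are illustrative names, and the edit weights and tie-breaking may differ from sclite's, so per-utterance counts can differ slightly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Tuple


@dataclass
class ErrorCounts:
    n_ref: int  # number of reference tokens (the "Wrd" column)
    sub: int
    dele: int
    ins: int

    def as_percent(self) -> Dict[str, float]:
        """Convert raw counts to the percentage columns of the tables."""
        n = self.n_ref
        if n == 0:
            raise ValueError("reference must contain at least one token")
        return {
            "Corr": 100.0 * (n - self.sub - self.dele) / n,
            "Sub": 100.0 * self.sub / n,
            "Del": 100.0 * self.dele / n,
            "Ins": 100.0 * self.ins / n,
            "Err": 100.0 * (self.sub + self.dele + self.ins) / n,
        }


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> ErrorCounts:
    """Count substitutions, deletions, insertions from a unit-cost minimum edit alignment."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = (total_cost, sub, del, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference token deleted
    for j in range(1, H + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis token inserted
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            c, s, d, n = dp[i - 1][j - 1]
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = (c + miss, s + miss, d, n)  # match or substitution
            c, s, d, n = dp[i - 1][j]
            up = (c + 1, s, d + 1, n)  # reference token missing from hypothesis
            c, s, d, n = dp[i][j - 1]
            left = (c + 1, s, d, n + 1)  # extra token in hypothesis
            dp[i][j] = min(diag, up, left)  # lowest cost first; ties broken by tuple order
    _, s, d, n = dp[R][H]
    return ErrorCounts(n_ref=R, sub=s, dele=d, ins=n)


def score_corpus(pairs: Iterable[Tuple[str, str]], char_level: bool = False) -> Dict[str, float]:
    """Aggregate counts over (reference, hypothesis) pairs into one table row."""
    tokenize = list if char_level else str.split
    n_ref = sub = dele = ins = 0
    n_sent = n_sent_err = 0
    for ref_text, hyp_text in pairs:
        c = align_counts(tokenize(ref_text), tokenize(hyp_text))
        n_ref += c.n_ref
        sub += c.sub
        dele += c.dele
        ins += c.ins
        n_sent += 1
        if c.sub + c.dele + c.ins > 0:
            n_sent_err += 1  # utterance counts toward S.Err
    row = ErrorCounts(n_ref, sub, dele, ins).as_percent()
    row["S.Err"] = 100.0 * n_sent_err / n_sent
    row["Snt"] = n_sent
    row["Wrd"] = n_ref
    return row


if __name__ == "__main__":
    print(score_corpus([("the cat sat on the mat", "the cat sit on mat")]))
```

Passing `char_level=True` treats every character, including spaces, as a token. This is only an approximation; ESPnet's own character tokenization for the CER table may handle spaces differently.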