---
license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
- FBK-MT/Speech-MASSIVE
- FBK-MT/Speech-MASSIVE-test
- mozilla-foundation/common_voice_17_0
- google/fleurs
language:
- fr
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

**This is a small CTC-based Automatic Speech Recognition system for French.**

This model is part of the SLU demo available here: https://huggingface.co/spaces/naver/French-SLU-DEMO-Interspeech2024

Please check our blog post available at: TBD

* Training data: 123 hours (84,707 utterances)
* Normalization: Whisper normalization

# Table of Contents:
1. [Performance](https://huggingface.co/naver/mHuBERT-147-ASR-fr#performance)
2. [Training Parameters](https://huggingface.co/naver/mHuBERT-147-ASR-fr#training-parameters)
3. [ASR Model class](https://huggingface.co/naver/mHuBERT-147-ASR-fr#asr-model-class)
4. [Running inference](https://huggingface.co/naver/mHuBERT-147-ASR-fr#running-inference)

## Performance

|                    | **dev WER** | **dev CER** | **test WER** | **test CER** |
|:------------------:|:-----------:|:-----------:|:------------:|:------------:|
|  **speechMASSIVE** |     9.2     |     2.6     |      9.6     |      2.9     |
|    **fleurs102**   |     20.0    |     7.0     |     22.0     |      7.7     |
| **CommonVoice 17** |     16.0    |     4.9     |     19.0     |      6.5     |

## Training Parameters
This is a [mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) ASR fine-tuned model.
The training parameters are available in [config.yaml](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/config.yaml).
We highlight the use of 0.3 for hubert.final_dropout, which we found very helpful for convergence. We also train in fp32, as we found fp16 training to be unstable.
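The exact settings live in the linked config.yaml; a hypothetical fragment illustrating the two choices highlighted above (key names other than `final_dropout` are assumptions, not a copy of the released file) might look like:

```yaml
hubert:
  final_dropout: 0.3   # higher than the usual default; helped convergence
training:
  precision: fp32      # fp16 training was unstable for this model
```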

## ASR Model Class

We use the mHubertForCTC class for our model, which is nearly identical to the existing HubertForCTC class. 
The key difference is that we've added a few additional hidden layers at the end of the Transformer stack, just before the lm_head. 
The code is available in [CTC_model.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/inference_code/CTC_model.py).
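As a rough illustration of the idea (not the released implementation; layer sizes, layer count, and activation are assumptions, and the real class lives in CTC_model.py), the head could be sketched in PyTorch as:

```python
import torch
import torch.nn as nn

class ExtraHiddenHead(nn.Module):
    """Sketch of the mHubertForCTC difference: a few extra hidden layers
    between the Transformer output and the lm_head. Sizes are illustrative."""

    def __init__(self, hidden_size=768, vocab_size=43, n_extra_layers=2):
        super().__init__()
        layers = []
        for _ in range(n_extra_layers):
            layers += [nn.Linear(hidden_size, hidden_size), nn.GELU()]
        self.extra_layers = nn.Sequential(*layers)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        # hidden_states: (batch, time, hidden_size) from the Transformer stack
        return self.lm_head(self.extra_layers(hidden_states))

head = ExtraHiddenHead()
logits = head(torch.randn(1, 50, 768))  # per-frame CTC logits: (1, 50, 43)
```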

## Running Inference

The [run_inference.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/inference_code/run_inference.py) file illustrates how to load the model for inference (**load_asr_model**) and how to produce a transcription for an audio file (**run_asr_inference**).
Please follow the [requirements file](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/requirements.txt) to avoid incorrect model loading.

Here is a simple example of the inference loop. Note that the sampling rate must be 16 kHz.

```python
from inference_code.run_inference import load_asr_model, run_asr_inference

# Load the fine-tuned model and its processor
model, processor = load_asr_model()

# Transcribe an audio file (must be sampled at 16 kHz)
prediction = run_asr_inference(model, processor, your_audio_file)
```
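If your audio is not already at 16 kHz, resample it before calling the model. A minimal sketch using SciPy, assuming the waveform is a 1-D NumPy array (`torchaudio.functional.resample` would work equally well for tensors):

```python
import numpy as np
from scipy.signal import resample_poly

def to_16k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Resample a 1-D waveform to the 16 kHz expected by the model."""
    if orig_sr == 16000:
        return audio
    return resample_poly(audio, up=16000, down=orig_sr)

wav_8k = np.zeros(8000, dtype=np.float32)  # one second of audio at 8 kHz
wav_16k = to_16k(wav_8k, 8000)             # now 16000 samples (one second at 16 kHz)
```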