---
license: apache-2.0
tags:
- automatic-speech-recognition
- fi
- finnish
library_name: transformers
language: fi
base_model:
- GetmanY1/wav2vec2-xlarge-fi-150k
model-index:
- name: wav2vec2-xlarge-fi-150k-finetuned
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Lahjoita puhetta (Donate Speech)
      type: lahjoita-puhetta
      args: fi
    metrics:
    - name: Dev WER
      type: wer
      value: 14.98
    - name: Dev CER
      type: cer
      value: 4.13
    - name: Test WER
      type: wer
      value: 16.37
    - name: Test CER
      type: cer
      value: 5.03
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Finnish Parliament
      type: FinParl
      args: fi
    metrics:
    - name: Dev16 WER
      type: wer
      value: 10.91
    - name: Dev16 CER
      type: cer
      value: 4.85
    - name: Test16 WER
      type: wer
      value: 7.81
    - name: Test16 CER
      type: cer
      value: 3.48
    - name: Test20 WER
      type: wer
      value: 6.43
    - name: Test20 CER
      type: cer
      value: 2.09
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 16.1
      type: mozilla-foundation/common_voice_16_1
      args: fi
    metrics:
    - name: Dev WER
      type: wer
      value: 6.65
    - name: Dev CER
      type: cer
      value: 1.15
    - name: Test WER
      type: wer
      value: 5.42
    - name: Test CER
      type: cer
      value: 0.96
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      args: fi_fi
    metrics:
    - name: Dev WER
      type: wer
      value: 8.67
    - name: Dev CER
      type: cer
      value: 5.18
    - name: Test WER
      type: wer
      value: 9.96
    - name: Test CER
      type: cer
      value: 5.74
---

# Finnish Wav2vec2-XLarge ASR

[GetmanY1/wav2vec2-xlarge-fi-150k](https://huggingface.co/GetmanY1/wav2vec2-xlarge-fi-150k) fine-tuned on 4600 hours of Finnish speech sampled at 16 kHz:

* 1500 hours of [Lahjoita puhetta (Donate Speech)](https://link.springer.com/article/10.1007/s10579-022-09606-3) (colloquial Finnish)
* 3100 hours of the [Finnish Parliament dataset](https://link.springer.com/article/10.1007/s10579-023-09650-7)

When using the model, make sure that your speech input is also sampled at 16 kHz.

## Model description

The Finnish Wav2Vec2 X-Large has the same architecture and uses the same training objective as the multilingual model described in [this paper](https://www.isca-archive.org/interspeech_2022/babu22_interspeech.pdf).

[GetmanY1/wav2vec2-xlarge-fi-150k](https://huggingface.co/GetmanY1/wav2vec2-xlarge-fi-150k) is a large-scale, 1-billion-parameter monolingual model pre-trained on 158k hours of unlabeled Finnish speech, including [KAVI radio and television archive materials](https://kavi.fi/en/radio-ja-televisioarkistointia-vuodesta-2008/), Lahjoita puhetta (Donate Speech), Finnish Parliament, and Finnish VoxPopuli.

You can read more about the pre-trained model in [this paper](TODO). The training scripts are available on [GitHub](https://github.com/aalto-speech/large-scale-monolingual-speech-foundation-models).

## Intended uses

You can use this model for Finnish ASR (speech-to-text).
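The model expects 16 kHz mono input, so recordings at other sampling rates (e.g. 44.1 or 48 kHz) or with several channels need converting first. Below is a minimal sketch, assuming NumPy and SciPy are installed. The card does not prescribe a resampler: `scipy.signal.resample_poly` (polyphase resampling with an anti-aliasing filter) is our choice, and `to_16k_mono` is a hypothetical helper name.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate the model expects


def to_16k_mono(waveform: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Hypothetical helper: downmix to mono and resample to 16 kHz.

    Assumes float audio shaped (num_samples,) or (num_samples, num_channels),
    which is the layout soundfile.read returns.
    """
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels to get mono
    if sampling_rate == TARGET_SR:
        return audio
    g = gcd(TARGET_SR, sampling_rate)
    # upsample by TARGET_SR / g, low-pass filter, then downsample by sampling_rate / g
    resampled = resample_poly(audio, TARGET_SR // g, sampling_rate // g)
    return resampled.astype(np.float32)


# Example: one second of a 48 kHz stereo sine tone becomes 16000 mono samples
sr = 48_000
t = np.arange(sr) / sr
stereo = np.stack([np.sin(2 * np.pi * 440 * t)] * 2, axis=1)
audio_16k = to_16k_mono(stereo, sr)
print(audio_16k.shape)  # (16000,)
```

The resulting array can then be passed to the processor with `sampling_rate=16_000`, as in the transcription example.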
### How to use

To transcribe audio files, the model can be used as a standalone acoustic model (greedy CTC decoding, no language model) as follows:

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset, Audio
import torch

# load model and processor
processor = Wav2Vec2Processor.from_pretrained("GetmanY1/wav2vec2-xlarge-fi-150k-finetuned")
model = Wav2Vec2ForCTC.from_pretrained("GetmanY1/wav2vec2-xlarge-fi-150k-finetuned")
model.eval()

# load the Finnish Common Voice 16.1 test split
# (the dataset is gated: accept its terms on the Hub and log in first)
ds = load_dataset("mozilla-foundation/common_voice_16_1", "fi", split="test")
# Common Voice audio is 48 kHz; resample to the 16 kHz the model expects
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# extract input features (batch size 1)
input_values = processor(
    ds[0]["audio"]["array"],
    sampling_rate=16_000,
    return_tensors="pt",
    padding="longest",
).input_values

# retrieve logits without tracking gradients
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription[0])
```

## Team Members

- Yaroslav Getman, [Hugging Face profile](https://huggingface.co/GetmanY1), [LinkedIn profile](https://www.linkedin.com/in/yaroslav-getman/)
- Tamas Grosz, [Hugging Face profile](https://huggingface.co/Grosy), [LinkedIn profile](https://www.linkedin.com/in/tam%C3%A1s-gr%C3%B3sz-950a049a/)

Feel free to contact us for more details 🤗