Respeecher/ukrainian-data2vec-asr
This model is a fine-tuned version of Respeecher/ukrainian-data2vec, trained on the Ukrainian train split of the Common Voice 11.0 dataset. It achieves the following word error rates (WER):
- eval_wer: 17.634350000973198
- test_wer: 17.042283338786351
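For readers unfamiliar with the metric, the sketch below shows the standard definition of WER: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is illustrative only; the function name `word_error_rate` is hypothetical, it is not the evaluation script that produced the numbers above, and it assumes the reported figures are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(100 * word_error_rate("a b c d", "a x c d"))  # 25.0
```

Corpus-level WER is usually computed by summing the errors and the reference word counts over all utterances before dividing, rather than by averaging per-utterance scores.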
How to Get Started with the Model
from transformers import AutoProcessor, Data2VecAudioForCTC
import torch
from datasets import load_dataset, Audio
dataset = load_dataset("mozilla-foundation/common_voice_11_0", "uk", split="test")
# Resample the audio to 16 kHz
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))
processor = AutoProcessor.from_pretrained("Respeecher/ukrainian-data2vec-asr")
model = Data2VecAudioForCTC.from_pretrained("Respeecher/ukrainian-data2vec-asr")
model.eval()
sampling_rate = dataset.features["audio"].sampling_rate
inputs = processor(dataset[1]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription[0])
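The last two steps of the snippet above (`torch.argmax` followed by `processor.batch_decode`) amount to greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, then drop the blank token. The following is a minimal sketch of that idea in plain Python. The vocabulary, the blank id, and the function name `ctc_greedy_decode` are made up for illustration; the actual tokenizer also handles details such as word delimiters, which this sketch omits.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blanks."""
    # Merge runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Drop the blank token and map the remaining ids to characters.
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
toy_vocab = {0: "<pad>", 1: "т", 2: "а", 3: "к"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_decode(frames, toy_vocab))  # так
```

A blank between two identical ids keeps both of them, which is how CTC represents doubled letters.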
Training Details
Training code and instructions are available on our GitHub.
Evaluation results
- WER on the Common Voice 11.0 test set (self-reported): 17.042
- WER on the Common Voice 11.0 validation set (self-reported): 17.634