wav2vec2-xlsr-ft-cy-en
An acoustic encoder model for Welsh and English speech recognition, fine-tuned from facebook/wav2vec2-large-xlsr-53 using transcribed spontaneous speech from techiaith/banc-trawsgrifiadau-bangor (v24.01) as well as Welsh and English speech data derived from version 16.1 of the Common Voice datasets (techiaith/commonvoice_16_1_en_cy).
Usage
The wav2vec2-xlsr-ft-cy-en model can be used directly as follows:
import torch
import torchaudio
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")
model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")
# audio_file is the path to the recording to transcribe; librosa resamples it to 16 kHz mono
audio, rate = librosa.load(audio_file, sr=16000)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
# greedy decoding
predicted_ids = torch.argmax(logits, dim=-1)
print("Prediction:", processor.batch_decode(predicted_ids))
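The greedy decoding step above picks the most likely token id for every audio frame, and the decode call then turns that frame-level sequence into text. As a rough illustration of what a CTC-style decode does with those ids, the sketch below merges consecutive repeats and drops the blank token. The function name, the blank id of 0, and the toy vocabulary are all hypothetical choices for illustration; the real ids come from the model's tokenizer.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse a frame-level id sequence CTC-style.

    blank_id=0 is an assumption for this sketch, not the model's actual value.
    """
    # Step 1: merge runs of identical consecutive ids into a single id.
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop the blank token, which only separates repeated symbols.
    return [token_id for token_id in merged if token_id != blank_id]


# Hypothetical toy vocabulary, unrelated to the real tokenizer.
toy_vocab = {0: "<blank>", 1: "c", 2: "a", 3: "t"}

# One id per audio frame, as produced by an argmax over the logits.
frame_ids = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]

collapsed = ctc_greedy_collapse(frame_ids)
text = "".join(toy_vocab[token_id] for token_id in collapsed)
print(text)
```

Because repeats are merged before blanks are removed, a genuinely doubled letter survives only when a blank separates the two occurrences, for example `[1, 0, 1]` collapses to `[1, 1]`.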