w2v-bert-2.0-nepali-transliterator

w2v-bert-2.0-nepali-transliterator is a speech-to-text transliteration model that converts spoken Nepali audio into Romanized Nepali text. It builds on W2v-BERT 2.0, a Conformer-based speech encoder whose pretraining combines wav2vec 2.0-style contrastive learning with BERT-style masked prediction, and fine-tunes it for phonetic transliteration.

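This model is a fine-tuned version of facebook/w2v-bert-2.0. The training metadata does not name the dataset; Model Details describes it as a labeled Nepali speech dataset with Romanized text pairs. The model achieves the following results on the evaluation set: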

Loss: 0.2366
WER (word error rate): 0.2786
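
For readers unfamiliar with the metric, WER is the word-level edit distance between the model output and the reference transcript, divided by the number of reference words. The sketch below is a minimal single-utterance version in plain Python. The card does not state which implementation produced the reported figure, and evaluation-set WER is usually aggregated over all utterances, so treat this as an illustration only. The Romanized sample strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four reference words.
sample_wer = word_error_rate("mero naam ram ho", "mero nam ram ho")
```

Under this definition, a WER of 0.2786 corresponds to roughly 28 word-level errors per 100 reference words.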

Model Details

Model Type: Speech-to-Text Transliteration Model
Language: Nepali (Audio to Romanized Nepali Text)
Dataset: Labeled Nepali speech dataset with Romanized text pairs
Base Architecture: W2v-BERT 2.0 (facebook/w2v-bert-2.0), a Conformer-based speech encoder
Task: Transliterating spoken Nepali into Romanized Nepali text
Use Case: Assisting non-Devanagari users in understanding Nepali speech through Romanized output

Direct Use

The model can be used to:

Convert Nepali speech into Romanized Nepali text
Assist non-Devanagari users in understanding spoken Nepali
Enable voice-based transliteration in chat applications
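
A minimal inference sketch for the uses listed above, using the Transformers automatic-speech-recognition pipeline. Several things here are assumptions rather than statements from the card: the repository ID is assumed to be AJNG/w2v-bert-2.0-nepali-transliterator, the checkpoint is assumed to load through the standard ASR pipeline, and the helper name transcribe is hypothetical.

```python
# Assumption: repository ID is not stated explicitly in the card text.
MODEL_ID = "AJNG/w2v-bert-2.0-nepali-transliterator"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the Romanized Nepali text predicted for one audio file."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # A file path is decoded with ffmpeg and resampled to the sampling rate
    # that the model's feature extractor expects.
    result = asr(audio_path)
    return result["text"]
```

Calling transcribe("sample.wav"), where sample.wav is a placeholder for a clean, single-speaker Nepali recording, should return a Romanized string. Building the pipeline once and reusing it is preferable when processing many files.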

Out-of-Scope Use

Not for general Nepali speech-to-text – This model transliterates into Romanized Nepali and does not generate text in Devanagari script.
Not optimized for noisy environments – Performance may drop in low-quality or multi-speaker recordings.
May not handle code-switching well – If Nepali is mixed with English or other languages, accuracy might decrease.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 6
eval_batch_size: 8
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 300
num_epochs: 5
mixed_precision_training: Native AMP
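
To make the scheduler settings concrete, the sketch below computes the learning rate at a given optimizer step for a linear schedule with warmup: the rate rises linearly from 0 to learning_rate over the warmup steps, then decays linearly to 0 at the final step. The total of 1100 steps is an assumption inferred from the training results (300 steps correspond to about 1.3636 epochs, so about 220 steps per epoch over 5 epochs); the function name is hypothetical.

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 5e-5,
                       warmup_steps: int = 300,
                       total_steps: int = 1100) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Learning rate at the three logged evaluation steps (assumed total: 1100).
lr_at_eval_steps = {s: linear_schedule_lr(s) for s in (300, 600, 900)}
```

Under these assumptions the peak rate of 5e-05 is reached exactly at step 300, the first logged evaluation, and the rate has fallen to a quarter of the peak by step 900.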

Training results

Training Loss	Epoch	Step	Validation Loss	WER
1.4387	1.3636	300	0.3972	0.5025
0.2712	2.7273	600	0.2779	0.3512
0.1335	4.0909	900	0.2366	0.2786
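
The Epoch and Step columns also let a reader estimate the run length and the approximate size of the training set. The arithmetic below is a rough sketch: it assumes a single device and no gradient accumulation (neither is stated in the card), so the example count is only an upper-bound estimate.

```python
# (step, epoch) pairs copied from the training results table.
logged = [(300, 1.3636), (600, 2.7273), (900, 4.0909)]

TRAIN_BATCH_SIZE = 6  # train_batch_size from the hyperparameters
NUM_EPOCHS = 5        # num_epochs from the hyperparameters

# Each row should imply the same number of optimizer steps per epoch.
steps_per_epoch = {round(step / epoch) for step, epoch in logged}
assert len(steps_per_epoch) == 1, "rows disagree on steps per epoch"
steps_per_epoch = steps_per_epoch.pop()

total_steps = steps_per_epoch * NUM_EPOCHS
# Assumption: one device, no gradient accumulation; the last batch of an
# epoch may be partial, so this is an upper bound.
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

print(steps_per_epoch, total_steps, approx_train_examples)
```

All three rows agree on about 220 steps per epoch, which implies about 1100 steps for the full 5 epochs and, under the stated assumptions, at most about 1,320 training examples.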

Framework versions

Transformers 4.49.0
Pytorch 2.5.1+cu121
Datasets 3.2.0
Tokenizers 0.21.0

AJNG/w2v-bert-2.0-nepali-transliterator