Enhanced Multilingual Code-Switched Speech Recognition for Low-Resource Languages Using Transformer-Based Models and Dynamic Switching Algorithms

Model description

This model is a fine-tuned version of the transformer-based wav2vec2-large-xls-r-300m model for recognizing code-switched speech in Hindi and Marathi. It uses reinforcement learning techniques such as Q-Learning, SARSA, and Deep Q-Networks (DQN) to determine optimal switch points in code-switched speech.
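
The card does not specify how states, actions, or rewards are defined for switch-point detection, so the following is only a minimal sketch of how the Q-Learning and SARSA update rules differ. The language-tag states, the "stay"/"switch" actions, the reward value, and the `alpha`/`gamma` defaults are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical setup: a state is any hashable summary of the current segment
# (here just a language tag), and the agent either stays or switches language.
ACTIONS = ("stay", "switch")


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in next_state."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in next_state."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


# Toy usage: one update of each kind from the same starting table.
q_table = defaultdict(float)
q_table[("mr", "stay")] = 1.0
q_learning_update(q_table, "hi", "switch", reward=1.0, next_state="mr")

sarsa_table = defaultdict(float)
sarsa_table[("mr", "stay")] = 1.0
sarsa_update(sarsa_table, "hi", "switch", reward=1.0, next_state="mr", next_action="switch")
```

In this toy case the Q-Learning target uses the higher-valued "stay" action in the next state, while SARSA uses the "switch" action that was actually taken, so the two tables end up with different values for the same transition. A DQN would replace the table with a neural network that approximates the same values.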

Intended uses & limitations

Intended uses

  • Automatic speech recognition for multilingual environments involving Hindi and Marathi.
  • Research in multilingual ASR and code-switching phenomena.
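
For the ASR use case, a minimal inference sketch with the Hugging Face transformers library might look like the following. It assumes that the model repository ships a processor (tokenizer plus feature extractor) alongside a CTC head, and that the input is a mono waveform sampled at 16 kHz. The `transcribe` helper is a hypothetical name, not part of any library.

```python
MODEL_ID = "Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1"


def transcribe(waveform, sampling_rate=16_000, model_id=MODEL_ID):
    """Return a greedy CTC transcription of a 1-D float waveform."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import torch
    from transformers import AutoModelForCTC, AutoProcessor

    # Assumption: the repository provides both processor and model files.
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: pick the most likely token per frame, then let the
    # tokenizer collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Audio recorded at another sampling rate would need to be resampled to 16 kHz before calling `transcribe`. The helper reloads the model on every call for simplicity; repeated use would normally load it once and reuse it.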

Limitations

  • The model may exhibit biases inherent in the training data.
  • Recognition accuracy may degrade on heavily accented or dialectal speech not covered in the training dataset.

Training parameters and experimental info

The model was fine-tuned using the following parameters:

  • Attention Dropout: 0.1
  • Hidden Dropout: 0.1
  • Feature Projection Dropout: 0.1
  • Layerdrop: 0.1
  • Learning Rate: 3e-4
  • Mask Time Probability: 0.05
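
As a rough illustration, the values above could be expressed with the keyword names used by the transformers library (`Wav2Vec2Config` for the model settings, `TrainingArguments` for the learning rate). The base checkpoint ID `facebook/wav2vec2-xls-r-300m` and the `load_base_model` helper are assumptions. The card does not state the exact checkpoint ID, vocabulary, batch size, or number of epochs, so those are left out.

```python
# Model-side settings, using Wav2Vec2Config keyword names.
MODEL_KWARGS = {
    "attention_dropout": 0.1,
    "hidden_dropout": 0.1,
    "feat_proj_dropout": 0.1,
    "layerdrop": 0.1,
    "mask_time_prob": 0.05,
}

# Optimizer-side setting, using the TrainingArguments keyword name.
TRAINING_KWARGS = {"learning_rate": 3e-4}


def load_base_model(base_checkpoint="facebook/wav2vec2-xls-r-300m"):
    """Load the pretrained base model with the dropout and masking overrides."""
    # Imported lazily so the settings above can be inspected without transformers.
    from transformers import Wav2Vec2ForCTC

    # Assumption: a real fine-tuning run would also pass vocabulary-dependent
    # settings (for example vocab_size and pad_token_id) from its tokenizer.
    return Wav2Vec2ForCTC.from_pretrained(base_checkpoint, **MODEL_KWARGS)
```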

Training dataset

The model was trained on the Common Voice dataset, which includes diverse speech samples in both Hindi and Marathi. The dataset was augmented with synthetically generated code-switched speech to improve the model's robustness in handling code-switching scenarios.

Evaluation results

The model achieved the following performance metrics on the test set:

  • Word Error Rate (WER): 0.2800
  • Character Error Rate (CER): 0.2400
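
For readers unfamiliar with these metrics, the sketch below shows how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, counted in words or in characters. The card does not say which tool or text normalization produced the reported numbers, so this is a plain-Python illustration and not the evaluation script itself.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a single DP row."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                row[j] + 1,            # deletion of a reference token
                row[j - 1] + 1,        # insertion of a hypothesis token
                prev_diag + (r != h),  # substitution, or free match
            )
            prev_diag, row[j] = row[j], cur
    return row[-1]


def error_rate(references, hypotheses, tokenize):
    """Total edit distance over total reference length for a corpus."""
    errors = sum(
        edit_distance(tokenize(r), tokenize(h))
        for r, h in zip(references, hypotheses)
    )
    total = sum(len(tokenize(r)) for r in references)
    return errors / total


def wer(references, hypotheses):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(references, hypotheses, str.split)


def cer(references, hypotheses):
    """Character error rate: tokens are individual characters."""
    return error_rate(references, hypotheses, list)
```

Under this definition, a WER of 0.2800 corresponds to 28 word-level errors per 100 reference words.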

Model details

  • Model ID: Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1
  • Model size: 316M params
  • Tensor type: F32
  • Weights format: Safetensors