Enhanced Multilingual Code-Switched Speech Recognition for Low-Resource Languages Using Transformer-Based Models and Dynamic Switching Algorithms
Model description
This model is designed to transcribe code-switched speech in Hindi and Marathi and is built on the transformer-based wav2vec2-large-xls-r-300m model. It leverages reinforcement-learning techniques such as Q-Learning, SARSA, and Deep Q-Networks (DQN) to determine optimal switch points in code-switched speech.
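The card does not detail how the reinforcement-learning components are wired into the pipeline. Purely as an illustration of the idea, a minimal tabular Q-learning sketch for choosing switch points over hypothetical frame-level language-ID posteriors might look like the following (the state/action design, reward, and all names are assumptions, not the authors' implementation):

```python
import random

def learn_switch_points(p_marathi, episodes=2000, alpha=0.1,
                        gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over (frame, current-language) states.

    p_marathi: hypothetical per-frame P(Marathi) scores from a
    language-ID front end. Action 0 = stay in the current language,
    action 1 = switch. Reward is the posterior mass of the language
    the agent is in after acting, so good switch points are rewarded.
    """
    rng = random.Random(seed)
    n = len(p_marathi)
    q = {(t, l): [0.0, 0.0] for t in range(n) for l in (0, 1)}

    for _ in range(episodes):
        lang = 0  # assume each utterance starts in Hindi (language 0)
        for t in range(n):
            s = (t, lang)
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] >= q[s][1] else 1
            lang ^= a  # flip language on a switch action
            reward = p_marathi[t] if lang == 1 else 1.0 - p_marathi[t]
            future = max(q[(t + 1, lang)]) if t + 1 < n else 0.0
            q[s][a] += alpha * (reward + gamma * future - q[s][a])

    # greedy rollout: report the frames where the learned policy switches
    switches, lang = [], 0
    for t in range(n):
        if q[(t, lang)][1] > q[(t, lang)][0]:
            lang ^= 1
            switches.append(t)
    return switches
```

SARSA would replace the `max` in the target with the Q-value of the action actually taken next, and a DQN would replace the table with a neural approximator; the overall loop is otherwise the same.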
Intended uses & limitations
Intended uses
- Automatic speech recognition for multilingual environments involving Hindi and Marathi.
- Research in multilingual ASR and code-switching phenomena.
Limitations
- The model may exhibit biases inherent in the training data.
- The model may struggle to accurately recognize heavily accented or dialectal speech that is not represented in the training dataset.
Training params and experimental info
The model was fine-tuned using the following parameters:
- Attention Dropout: 0.1
- Hidden Dropout: 0.1
- Feature Projection Dropout: 0.1
- Layerdrop: 0.1
- Learning Rate: 3e-4
- Mask Time Probability: 0.05
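In Hugging Face `transformers`, these values correspond to arguments of `Wav2Vec2Config` (dropouts, layerdrop, time masking) plus the trainer's learning rate. A sketch of how they might be grouped when fine-tuning (the base checkpoint named in the comment is the standard XLS-R release, not this fine-tuned model):

```python
# Hyperparameters from the model card, grouped by where they would be
# set when fine-tuning with Hugging Face `transformers` (keys follow
# Wav2Vec2Config and TrainingArguments naming).
model_kwargs = {
    "attention_dropout": 0.1,
    "hidden_dropout": 0.1,
    "feat_proj_dropout": 0.1,
    "layerdrop": 0.1,
    "mask_time_prob": 0.05,
}
training_kwargs = {
    "learning_rate": 3e-4,
}

# Typical usage (not executed here; requires the transformers library
# and a network connection to fetch the checkpoint):
#   model = Wav2Vec2ForCTC.from_pretrained(
#       "facebook/wav2vec2-xls-r-300m", **model_kwargs)
```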
Training dataset
The model was trained on the Common Voice dataset, which includes diverse speech samples in both Hindi and Marathi. The dataset was augmented with synthetically generated code-switched speech to improve the model's robustness in handling code-switching scenarios.
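The card does not specify how the synthetic code-switched data was generated. One common text-level scheme, shown here purely as an illustrative sketch, flips language at phrase boundaries with some probability; the corresponding audio segments would be spliced the same way (the function and its parameters are hypothetical):

```python
import random

def synth_code_switch(hi_segments, mr_segments, p_switch=0.3, seed=0):
    """Build a synthetic code-switched transcript from aligned
    Hindi/Marathi phrase lists: at each phrase boundary, switch the
    active language with probability p_switch, then emit that
    language's phrase. Illustrative only."""
    rng = random.Random(seed)
    lang = 0  # 0 = Hindi, 1 = Marathi
    out = []
    for hi, mr in zip(hi_segments, mr_segments):
        if rng.random() < p_switch:
            lang = 1 - lang
        out.append(mr if lang else hi)
    return " ".join(out)
```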
Evaluation results
The model achieved the following performance metrics on the Common Voice test set (internal evaluation):
- Word Error Rate (WER): 0.2800
- Character Error Rate (CER): 0.2400
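Both metrics are Levenshtein edit-distance ratios: WER counts word-level edits over the number of reference words, CER counts character-level edits over the number of reference characters. A minimal self-contained sketch (in practice, libraries such as `jiwer` are typically used):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling rows)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits / reference word count."""
    r, h = ref.split(), hyp.split()
    return edit_distance(r, h) / len(r)

def cer(ref, hyp):
    """Character error rate, computed with spaces removed."""
    r, h = ref.replace(" ", ""), hyp.replace(" ", "")
    return edit_distance(r, h) / len(r)
```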
Dataset used to train Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1: common_voice