wav2vec-vm-finetune
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for voicemail detection. It is trained on a dataset of call recordings to distinguish between voicemail greetings and live human responses.
Model description
This model builds on wav2vec2-xls-r-300m, a self-supervised speech model pretrained on large-scale multilingual data. We fine-tuned it on the first two seconds of audio from each call.
Intended uses & limitations
Intended uses:
- Automated voicemail detection in AI-powered call assistants.
- Filtering voicemail responses in customer service and sales call automation.
Limitations:
- Trained only on English-language audio.
- Assumes the voicemail track is isolated and contains no audio from the caller.
- Designed for the first two seconds of audio in a call.
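Because the model targets the first two seconds of a call, input audio should be cropped to that window before classification. The following is a minimal sketch, not the author's pipeline. It assumes 16 kHz mono input (the rate used by wav2vec2-xls-r checkpoints), zero-padding for shorter clips, a sequence-classification head loadable through `AutoModelForAudioClassification`, and the repository id `jakeBland/wav2vec-vm-finetune`. The helper names `first_window` and `classify` are hypothetical.

```python
from typing import List, Sequence

TARGET_SAMPLE_RATE = 16_000  # assumption: 16 kHz mono, as for wav2vec2-xls-r
WINDOW_SECONDS = 2.0         # the card states the model targets the first two seconds


def first_window(samples: Sequence[float],
                 sample_rate: int = TARGET_SAMPLE_RATE,
                 seconds: float = WINDOW_SECONDS) -> List[float]:
    """Return the first `seconds` of audio, zero-padded if the clip is shorter."""
    n = int(sample_rate * seconds)
    clip = list(samples[:n])
    clip.extend([0.0] * (n - len(clip)))  # padding is an assumption, not from the card
    return clip


def classify(samples: Sequence[float],
             model_id: str = "jakeBland/wav2vec-vm-finetune") -> str:
    """Hypothetical helper: assumes the checkpoint has an audio-classification head."""
    import torch
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = AutoModelForAudioClassification.from_pretrained(model_id)
    inputs = extractor(first_window(samples),
                       sampling_rate=TARGET_SAMPLE_RATE,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label names come from the checkpoint's own config, not from this sketch.
    return model.config.id2label[int(logits.argmax(dim=-1))]
```

Audio recorded at another rate (telephony audio is often 8 kHz) would need resampling to 16 kHz before calling `first_window`.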
Training and evaluation data
The model was trained on a proprietary dataset of call recordings, labeled as:
- Live human responses
- Voicemail greetings
The dataset includes diverse voicemail recordings across multiple types to improve generalization.
Evaluation metrics
The model achieved:
- 98% accuracy on voicemail detection.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 10
- mixed_precision_training: Native AMP
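The hyperparameters above can be read as a Hugging Face `TrainingArguments` configuration. The sketch below shows how the total batch size follows from the per-device batch size and gradient accumulation, and how a linear schedule with 500 warmup steps shapes the learning rate. It assumes a single training device and treats `train_batch_size` as the per-device value. `total_steps` and `output_dir` are hypothetical, since the card does not state the dataset size or output path.

```python
LEARNING_RATE = 3e-4
WARMUP_STEPS = 500


def total_train_batch_size(per_device: int, accumulation_steps: int,
                           num_devices: int = 1) -> int:
    """Effective batch size per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accumulation_steps * num_devices


def lr_at_step(step: int, total_steps: int,
               base_lr: float = LEARNING_RATE,
               warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def build_training_arguments(output_dir: str = "wav2vec-vm-finetune"):
    """Map the listed hyperparameters onto TrainingArguments (output_dir is hypothetical)."""
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=LEARNING_RATE,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=8,
        seed=42,
        gradient_accumulation_steps=2,
        optim="adamw_torch",
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-8,
        lr_scheduler_type="linear",
        warmup_steps=WARMUP_STEPS,
        num_train_epochs=10,
        fp16=True,  # "Native AMP"; requires a supported GPU
    )
```

For example, `total_train_batch_size(16, 2)` gives the listed total of 32, and with a hypothetical 5,000 total steps the learning rate peaks at step 500 and reaches zero at step 5,000.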
Framework versions
- Transformers 4.48.2
- Pytorch 2.5.1+cu124
- Datasets 1.18.3
- Tokenizers 0.21.0
Model tree for jakeBland/wav2vec-vm-finetune
- Base model: facebook/wav2vec2-xls-r-300m