wav2vec2-large-960h

This model is a fine-tuned version of facebook/wav2vec2-large-960h on the acc_dataset_v2 dataset (commit: a41c520).

It achieves the following results on the evaluation set:

  • Loss: 0.6286

  • Wer: 0.1538

Model description

This is a speech-to-text transcription model specialized for the acc dataset.
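
As a quick orientation, the sketch below shows one way to run inference with this checkpoint through the Transformers CTC API. It is a minimal example, not the notebook's exact code: the audio file name is a placeholder, and the input is assumed to be 16 kHz mono.

```python
# Minimal inference sketch (assumes a 16 kHz mono recording).
import soundfile as sf
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "monadical-labs/wav2vec2-large-960h"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# "example.wav" is a placeholder; any 16 kHz mono waveform works.
speech, sample_rate = sf.read("example.wav")

inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token at each frame, then collapse.
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```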

Training and evaluation data

Training was based on the training set in acc_dataset_v2 and evaluation based on the validation and test sets in the same dataset.
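
For reference, a WER figure like the one reported above is obtained by comparing the model's transcriptions against the reference transcripts of the evaluation split. A minimal sketch with the evaluate library follows; the example strings are placeholders, since acc_dataset_v2 is not publicly described here.

```python
# Hedged sketch of the WER computation used for the reported metric.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["hello world", "speech recognition"]    # model transcriptions (placeholders)
references = ["hello world", "speech recognitions"]    # ground-truth transcripts (placeholders)

print(wer_metric.compute(predictions=predictions, references=references))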

Training procedure

See the Jupyter notebook Finetuning-notebook-wav2vec2-large-960h-on-acc-data for the full training procedure.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 128
  • mixed_precision_training: Native AMP
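
The hyperparameters above map onto a Transformers TrainingArguments configuration roughly as sketched below. The Jupyter notebook referenced under "Training procedure" is the authoritative source; output_dir and any option not listed above are placeholders.

```python
# Hedged sketch: TrainingArguments mirroring the hyperparameter list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-960h-acc",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=15,
    num_train_epochs=128,
    fp16=True,  # Native AMP mixed-precision training
)
```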

Training results

Training Loss   Epoch      Step   Validation Loss   Wer
1.2379          8.3333     50     0.6501            0.3056
0.4865          16.6667    100    0.7069            0.2790
0.3054          25.0       150    0.6598            0.2369
0.2308          33.3333    200    0.6517            0.2215
0.1793          41.6667    250    0.6884            0.2103
0.1379          50.0       300    0.6418            0.1949
0.1253          58.3333    350    0.7004            0.1918
0.0988          66.6667    400    0.6059            0.1846
0.0880          75.0       450    0.6507            0.1826
0.0773          83.3333    500    0.5473            0.1682
0.0686          91.6667    550    0.6027            0.1682
0.0643          100.0      600    0.6192            0.1713
0.0595          108.3333   650    0.6119            0.1703
0.0562          116.6667   700    0.5953            0.1600
0.0507          125.0      750    0.6286            0.1538

Framework versions

  • Transformers 4.48.2
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0