# distilhubert-finetuned-gtzan
This model is a fine-tuned version of ntu-spml/distilhubert on the GTZAN dataset. It achieves the following results on the evaluation set at the best epoch (an inference example follows the list):
- Loss: 0.7305
- Accuracy: 0.9
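As a quick usage sketch, the checkpoint can be loaded with the `transformers` audio-classification pipeline. The repo id below is a placeholder for wherever this model is hosted on the Hub, and `song.wav` stands in for any local audio file:

```python
from transformers import pipeline

# Placeholder repo id; replace with the actual Hub path of this checkpoint.
classifier = pipeline("audio-classification", model="distilhubert-finetuned-gtzan")

# "song.wav" is a placeholder; the pipeline decodes and resamples the file
# to the model's expected sampling rate before classification.
predictions = classifier("song.wav")
print(predictions)  # e.g. [{"label": "disco", "score": 0.97}, ...]
```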
## Model description
DistilHuBERT is a distilled version of HuBERT, pretrained on speech data sampled at 16 kHz. The base model is an encoder-only transformer; CTC (Connectionist Temporal Classification) is a technique commonly paired with such encoder-only models for speech recognition, while this checkpoint instead adds a classification head on top of the encoder for genre prediction.
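To illustrate the 16 kHz input requirement, here is a minimal sketch that loads the base model's feature extractor and prepares one second of placeholder audio (the zero-filled array only stands in for a real waveform):

```python
import numpy as np
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("ntu-spml/distilhubert")

# One second of silence at 16 kHz as placeholder waveform data.
waveform = np.zeros(16_000, dtype=np.float32)
inputs = feature_extractor(waveform, sampling_rate=16_000, return_tensors="pt")
print(inputs["input_values"].shape)  # torch.Size([1, 16000])
```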
## Training and evaluation data
The training and evaluation data come from GTZAN, a popular music genre classification dataset of 999 songs.
Each song is a 30-second clip from one of 10 genres, spanning disco to metal.
The training set contains 899 songs and the evaluation set the remaining 100, as sketched below.
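A minimal sketch of how such an 899/100 split can be produced with the `datasets` library; the Hub id `marsyas/gtzan` is an assumption (the copy used in the Hugging Face Audio course), not necessarily the exact copy used here:

```python
from datasets import load_dataset

# "marsyas/gtzan" is an assumed Hub id for GTZAN.
gtzan = load_dataset("marsyas/gtzan", "all")

# A 90/10 split of 999 songs yields 899 train / 100 eval clips;
# seed 42 matches the training seed listed below.
splits = gtzan["train"].train_test_split(seed=42, shuffle=True, test_size=0.1)
print(splits)
```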
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 35
- mixed_precision_training: Native AMP
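As a rough guide, the hyperparameters above map onto `transformers` `TrainingArguments` as follows. This is an approximation for orientation, not the exact training script; `output_dir` is arbitrary, and the Adam betas/epsilon listed above are the Trainer defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilhubert-finetuned-gtzan",  # arbitrary output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=35,
    fp16=True,  # Native AMP mixed-precision training
    evaluation_strategy="epoch",
)
```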
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.1728 | 1.0 | 225 | 2.0896 | 0.42 |
| 1.4211 | 2.0 | 450 | 1.4951 | 0.55 |
| 1.2155 | 3.0 | 675 | 1.0669 | 0.72 |
| 1.0175 | 4.0 | 900 | 0.8862 | 0.69 |
| 0.3516 | 5.0 | 1125 | 0.6265 | 0.83 |
| 0.6135 | 6.0 | 1350 | 0.6485 | 0.78 |
| 0.0807 | 7.0 | 1575 | 0.6567 | 0.78 |
| 0.0303 | 8.0 | 1800 | 0.7615 | 0.83 |
| 0.2663 | 9.0 | 2025 | 0.6612 | 0.86 |
| 0.0026 | 10.0 | 2250 | 0.8354 | 0.85 |
| 0.0337 | 11.0 | 2475 | 0.6768 | 0.87 |
| 0.0013 | 12.0 | 2700 | 0.7718 | 0.87 |
| 0.001 | 13.0 | 2925 | 0.7570 | 0.88 |
| 0.0008 | 14.0 | 3150 | 0.8170 | 0.89 |
| 0.0006 | 15.0 | 3375 | 0.7920 | 0.89 |
| 0.0005 | 16.0 | 3600 | 0.9859 | 0.83 |
| 0.0004 | 17.0 | 3825 | 0.8190 | 0.9 |
| 0.0003 | 18.0 | 4050 | 0.7305 | 0.9 |
| 0.0003 | 19.0 | 4275 | 0.8025 | 0.88 |
| 0.0002 | 20.0 | 4500 | 0.8208 | 0.87 |
| 0.0003 | 21.0 | 4725 | 0.7358 | 0.88 |
| 0.0002 | 22.0 | 4950 | 0.8681 | 0.87 |
| 0.0002 | 23.0 | 5175 | 0.7831 | 0.9 |
| 0.0003 | 24.0 | 5400 | 0.8583 | 0.88 |
| 0.0002 | 25.0 | 5625 | 0.8138 | 0.88 |
| 0.0002 | 26.0 | 5850 | 0.7871 | 0.89 |
| 0.0002 | 27.0 | 6075 | 0.8893 | 0.88 |
| 0.0002 | 28.0 | 6300 | 0.8284 | 0.89 |
| 0.0001 | 29.0 | 6525 | 0.8388 | 0.89 |
| 0.0001 | 30.0 | 6750 | 0.8305 | 0.9 |
| 0.0001 | 31.0 | 6975 | 0.8377 | 0.88 |
| 0.0153 | 32.0 | 7200 | 0.8496 | 0.88 |
| 0.0001 | 33.0 | 7425 | 0.8381 | 0.88 |
| 0.0001 | 34.0 | 7650 | 0.8440 | 0.88 |
| 0.0001 | 35.0 | 7875 | 0.8458 | 0.88 |
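The accuracy reported above is plain classification accuracy over the 100 evaluation clips (the best epoch is 18, matching the headline numbers). A typical `compute_metrics` hook for the Trainer, sketched here assuming the `evaluate` library, looks like this:

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred holds logits over the 10 genres and integer genre labels.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```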
### Framework versions
- Transformers 4.29.2
- Pytorch 1.13.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3