wikipedia_30

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1731
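
The card does not name the training objective or the dataset. If the evaluation loss is a mean per-token cross-entropy in natural-log units, which is how Transformers language-modeling heads report loss, it converts to perplexity by exponentiation. The following is a minimal sketch under that assumption. `perplexity` is a hypothetical helper name.

```python
import math


def perplexity(mean_ce_loss: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) to perplexity.

    Assumption: the reported loss is averaged over tokens and uses the
    natural log. This model card does not confirm either point.
    """
    return math.exp(mean_ce_loss)


eval_loss = 3.1731  # evaluation-set loss reported above
print(f"perplexity ~ {perplexity(eval_loss):.2f}")  # perplexity ~ 23.88
```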

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100000
  • training_steps: 400000
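
The linear scheduler with warmup can be sketched in plain Python. This minimal sketch makes two assumptions:

  • `lr_scheduler_type: linear` has the same shape as `transformers.get_linear_schedule_with_warmup`: a linear ramp from 0 to `learning_rate` over the warmup steps, then a linear decay to 0 at `training_steps`.
  • Training ran on a single device.

`linear_warmup_lr` and `effective_batch_size` are hypothetical helper names.

```python
def linear_warmup_lr(step: int,
                     base_lr: float = 5e-5,
                     warmup_steps: int = 100_000,
                     total_steps: int = 400_000) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay.

    Assumed to follow the multiplier of transformers'
    get_linear_schedule_with_warmup; defaults are the card's hyperparameters.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


def effective_batch_size(per_device: int = 16,
                         grad_accum: int = 2,
                         num_devices: int = 1) -> int:
    # Examples contributing to one optimizer update (single device assumed).
    return per_device * grad_accum * num_devices


print(effective_batch_size())  # 32
for step in (0, 50_000, 62_000, 100_000, 250_000, 400_000):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

Under these assumptions, 16 × 2 × 1 = 32 matches total_train_batch_size. The last step in the training results (62000) would still fall inside the 100000-step warmup. The learning rate there would be roughly 3.1e-05 rather than the 5e-05 peak.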

Training results

Training Loss  Epoch      Step  Validation Loss
No log         2.3378     2000  6.9004
7.3814         4.6756     4000  6.9259
7.3814         7.0134     6000  6.9098
6.845          9.3513     8000  6.6689
6.845          11.6891   10000  5.7875
5.8652         14.0269   12000  5.1923
5.8652         16.3647   14000  4.8226
4.8065         18.7025   16000  4.5346
4.8065         21.0403   18000  4.3246
4.2309         23.3781   20000  4.1348
4.2309         25.7160   22000  3.9652
3.8185         28.0538   24000  3.8108
3.8185         30.3916   26000  3.7102
3.5163         32.7294   28000  3.6271
3.5163         35.0672   30000  3.5350
3.2957         37.4050   32000  3.5053
3.2957         39.7428   34000  3.4144
3.1388         42.0807   36000  3.3632
3.1388         44.4185   38000  3.3095
3.0197         46.7563   40000  3.3381
3.0197         49.0941   42000  3.3036
2.9398         51.4319   44000  3.2828
2.9398         53.7697   46000  3.2407
2.8775         56.1075   48000  3.2374
2.8775         58.4454   50000  3.2790
2.8378         60.7832   52000  3.1918
2.8378         63.1210   54000  3.1904
2.8089         65.4588   56000  3.1705
2.8089         67.7966   58000  3.1829
2.7826         70.1344   60000  3.2242
2.7826         72.4722   62000  3.1731
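
The Epoch and Step columns give a rough estimate of the training-set size, although the training data itself is not documented. This sketch assumes a single device, that each optimizer step consumes one total_train_batch_size of 32 examples, and that the fractional epoch equals the step divided by optimizer steps per epoch. It copies the last eight rows of the results table.

```python
# (step, epoch, validation loss): the last eight rows of the results table.
ROWS = [
    (48_000, 56.1075, 3.2374),
    (50_000, 58.4454, 3.2790),
    (52_000, 60.7832, 3.1918),
    (54_000, 63.1210, 3.1904),
    (56_000, 65.4588, 3.1705),
    (58_000, 67.7966, 3.1829),
    (60_000, 70.1344, 3.2242),
    (62_000, 72.4722, 3.1731),
]

TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list

# Optimizer steps per epoch, estimated from the final row.
final_step, final_epoch, final_loss = ROWS[-1]
steps_per_epoch = final_step / final_epoch

# Rough training-set size: each optimizer step consumes one total batch
# (single device assumed; a partial final batch is ignored).
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Row with the lowest validation loss among the copied rows.
best_step, _, best_loss = min(ROWS, key=lambda row: row[2])

print(f"steps/epoch ~ {steps_per_epoch:.1f}")
print(f"train examples ~ {approx_train_examples:,.0f}")
print(f"best eval loss {best_loss} at step {best_step}; "
      f"final {final_loss} at step {final_step}")
```

On these rows, the estimate is about 855.5 optimizer steps per epoch, or roughly 27,000 training examples. The lowest validation loss in the table (3.1705 at step 56000) is slightly below the reported final value (3.1731 at step 62000).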

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model size

  • 10.7M params
  • Weight format: Safetensors
  • Tensor type: F32
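
The weights are stored as F32, which uses 4 bytes per parameter, so the parameter count gives a rough size for the weights alone. The figure is approximate because 10.7M is a rounded number. The small Safetensors header is also not counted.

```python
PARAMS = 10_700_000  # "10.7M params", as rounded on the model page
BYTES_PER_F32 = 4    # one float32 value occupies 4 bytes

weight_bytes = PARAMS * BYTES_PER_F32
print(f"{weight_bytes:,} bytes ~ {weight_bytes / 1e6:.1f} MB ~ {weight_bytes / 2**20:.1f} MiB")
# 42,800,000 bytes ~ 42.8 MB ~ 40.8 MiB
```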