childes_mlm_unmasking_context_42

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1913

Model description

More information needed

Intended uses & limitations

More information needed
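The model name suggests a masked language model (MLM) trained on CHILDES transcripts. If so, it can be queried with the fill-mask pipeline. Below is a minimal sketch; the Hub repo id and the `[MASK]` token are assumptions (the mask token depends on the base tokenizer, which the card does not specify).

```python
from transformers import pipeline

# Hypothetical repo id; replace with the actual Hub path of this checkpoint.
unmasker = pipeline("fill-mask", model="childes_mlm_unmasking_context_42")

# [MASK] is the BERT-style convention; adjust if the base tokenizer differs.
print(unmasker("The baby wants the [MASK]."))
```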

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100000
  • training_steps: 400000
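
As a minimal sketch, the settings above map onto Transformers' `TrainingArguments` as shown below. The `output_dir` and any arguments not listed above (logging and evaluation cadence, etc.) are assumptions, not values confirmed by the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="childes_mlm_unmasking_context_42",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100_000,
    max_steps=400_000,
)
```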

Training results

| Training Loss | Epoch    | Step  | Validation Loss |
|:-------------:|:--------:|:-----:|:---------------:|
| No log        | 3.5149   | 2000  | 5.5613          |
| 6.2537        | 7.0299   | 4000  | 5.4999          |
| 6.2537        | 10.5448  | 6000  | 5.4087          |
| 5.329         | 14.0598  | 8000  | 4.5056          |
| 5.329         | 17.5747  | 10000 | 3.6002          |
| 3.7571        | 21.0896  | 12000 | 3.1522          |
| 3.7571        | 24.6046  | 14000 | 2.8883          |
| 2.9513        | 28.1195  | 16000 | 2.7471          |
| 2.9513        | 31.6344  | 18000 | 2.6601          |
| 2.6047        | 35.1494  | 20000 | 2.5240          |
| 2.6047        | 38.6643  | 22000 | 2.4842          |
| 2.3993        | 42.1793  | 24000 | 2.3980          |
| 2.3993        | 45.6942  | 26000 | 2.3463          |
| 2.2551        | 49.2091  | 28000 | 2.3255          |
| 2.2551        | 52.7241  | 30000 | 2.3089          |
| 2.1506        | 56.2390  | 32000 | 2.2677          |
| 2.1506        | 59.7540  | 34000 | 2.2623          |
| 2.0777        | 63.2689  | 36000 | 2.2329          |
| 2.0777        | 66.7838  | 38000 | 2.2055          |
| 2.0179        | 70.2988  | 40000 | 2.2353          |
| 2.0179        | 73.8137  | 42000 | 2.1910          |
| 1.9801        | 77.3286  | 44000 | 2.2011          |
| 1.9801        | 80.8436  | 46000 | 2.1847          |
| 1.9489        | 84.3585  | 48000 | 2.1734          |
| 1.9489        | 87.8735  | 50000 | 2.1883          |
| 1.9218        | 91.3884  | 52000 | 2.1535          |
| 1.9218        | 94.9033  | 54000 | 2.1826          |
| 1.9024        | 98.4183  | 56000 | 2.1612          |
| 1.9024        | 101.9332 | 58000 | 2.1442          |
| 1.8923        | 105.4482 | 60000 | 2.1944          |
| 1.8923        | 108.9631 | 62000 | 2.1725          |
| 1.8807        | 112.4780 | 64000 | 2.1913          |
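
For a cross-entropy loss in nats, perplexity is exp(loss); a quick sanity check on the final validation loss reported above:

```python
import math

# Perplexity corresponding to the final validation loss of 2.1913.
print(math.exp(2.1913))  # ~= 8.95
```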

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1