childes_mlm_unmasking_sent_13

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following result on the evaluation set:

  • Loss: 3.1577
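
For readers who prefer perplexity, the loss above can be converted with a one-line calculation. This is a minimal sketch that assumes the reported loss is a mean cross-entropy in nats over the masked tokens, which the card does not confirm.

```python
import math

eval_loss = 3.1577  # evaluation loss reported on this card

# Assumption: the loss is mean cross-entropy in nats, so exp(loss) is a perplexity.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.1f}")
```

Under that assumption the evaluation perplexity is about 23.5.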

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100000
  • training_steps: 400000
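
The relationship between these values can be sketched in plain Python. The sketch assumes a single training device (so the total batch size is the per-device batch size times the accumulation steps) and assumes that the linear scheduler ramps the learning rate up from zero over the warmup steps, then decays it linearly to zero at the final step. The function name `linear_schedule_lr` and the `num_devices` variable are illustrative, not part of the training code.

```python
def linear_schedule_lr(step, base_lr=5e-05, warmup_steps=100_000, total_steps=400_000):
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


train_batch_size = 16
gradient_accumulation_steps = 2
num_devices = 1  # assumption: the card does not state the device count

# Effective number of sequences per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

for step in (2_000, 62_000, 100_000, 250_000, 400_000):
    print(step, linear_schedule_lr(step))
```

If these assumptions hold, step 62000 falls inside the 100000-step warmup window, where the learning rate would be roughly 3.1e-05 rather than the 5e-05 peak.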

Training results

Training Loss | Epoch  | Step  | Validation Loss
No log        | 0.0741 | 2000  | 5.4637
6.1476        | 0.1481 | 4000  | 4.7186
6.1476        | 0.2222 | 6000  | 4.3520
4.4471        | 0.2963 | 8000  | 3.9931
4.4471        | 0.3703 | 10000 | 3.7630
3.8939        | 0.4444 | 12000 | 3.6285
3.8939        | 0.5185 | 14000 | 3.4888
3.6013        | 0.5926 | 16000 | 3.4438
3.6013        | 0.6666 | 18000 | 3.3501
3.4806        | 0.7407 | 20000 | 3.3621
3.4806        | 0.8148 | 22000 | 3.3173
3.4164        | 0.8888 | 24000 | 3.2997
3.4164        | 0.9629 | 26000 | 3.2795
3.3837        | 1.0370 | 28000 | 3.2958
3.3837        | 1.1110 | 30000 | 3.2695
3.3341        | 1.1851 | 32000 | 3.2437
3.3341        | 1.2592 | 34000 | 3.2345
3.332         | 1.3333 | 36000 | 3.2152
3.332         | 1.4073 | 38000 | 3.2175
3.3094        | 1.4814 | 40000 | 3.2130
3.3094        | 1.5555 | 42000 | 3.2210
3.3162        | 1.6295 | 44000 | nan
3.3162        | 1.7036 | 46000 | 3.1529
3.3268        | 1.7777 | 48000 | 3.1837
3.3268        | 1.8517 | 50000 | 3.1856
3.3103        | 1.9258 | 52000 | 3.1568
3.3103        | 1.9999 | 54000 | 3.1903
3.2876        | 2.0740 | 56000 | 3.1443
3.2876        | 2.1480 | 58000 | 3.1751
3.3045        | 2.2221 | 60000 | 3.1543
3.3045        | 2.2962 | 62000 | 3.1577
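
The logged results can be examined with a short script. The sketch below copies the epoch, step, and validation loss columns from the table above, skips the `nan` entry at step 44000, finds the lowest logged validation loss, and estimates the number of optimizer steps per epoch from the last row. The variable names are illustrative, and the derived figures are estimates from the logged values, not numbers reported by the card.

```python
import math

# (epoch, step, validation_loss) copied from the training results table.
rows = [
    (0.0741, 2000, 5.4637), (0.1481, 4000, 4.7186), (0.2222, 6000, 4.3520),
    (0.2963, 8000, 3.9931), (0.3703, 10000, 3.7630), (0.4444, 12000, 3.6285),
    (0.5185, 14000, 3.4888), (0.5926, 16000, 3.4438), (0.6666, 18000, 3.3501),
    (0.7407, 20000, 3.3621), (0.8148, 22000, 3.3173), (0.8888, 24000, 3.2997),
    (0.9629, 26000, 3.2795), (1.0370, 28000, 3.2958), (1.1110, 30000, 3.2695),
    (1.1851, 32000, 3.2437), (1.2592, 34000, 3.2345), (1.3333, 36000, 3.2152),
    (1.4073, 38000, 3.2175), (1.4814, 40000, 3.2130), (1.5555, 42000, 3.2210),
    (1.6295, 44000, float("nan")), (1.7036, 46000, 3.1529), (1.7777, 48000, 3.1837),
    (1.8517, 50000, 3.1856), (1.9258, 52000, 3.1568), (1.9999, 54000, 3.1903),
    (2.0740, 56000, 3.1443), (2.1480, 58000, 3.1751), (2.2221, 60000, 3.1543),
    (2.2962, 62000, 3.1577),
]

# Ignore evaluations that logged nan before looking for the minimum.
valid_rows = [row for row in rows if not math.isnan(row[2])]
best_epoch, best_step, best_loss = min(valid_rows, key=lambda row: row[2])

# Estimate optimizer steps per epoch from the last logged row.
last_epoch, last_step, _ = rows[-1]
steps_per_epoch = last_step / last_epoch

print(f"best validation loss {best_loss} at step {best_step} (epoch {best_epoch})")
print(f"estimated steps per epoch: {steps_per_epoch:.0f}")
print(f"400000 steps would span about {400_000 / steps_per_epoch:.1f} epochs")
```

On the logged values, the lowest validation loss is 3.1443 at step 56000, slightly below the 3.1577 at the last logged step, and one epoch corresponds to roughly 27,000 optimizer steps.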

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model size: 10.7M params, tensor type F32 (Safetensors)