progen2_cross_attention_only_h

This model is a fine-tuned version of an unspecified base model on an unspecified dataset. It achieves the following results on the evaluation set (a quick perplexity check follows the list):

  • Loss: 2.4917
  • Perplexity: 12.0823
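
The reported perplexity is simply the exponential of the evaluation loss (the mean per-token cross-entropy in nats), which can be verified directly:

```python
import math

# Perplexity is exp(cross-entropy loss); the evaluation loss above is 2.4917.
eval_loss = 2.4917
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.4f}")  # ~12.08, matching the reported 12.0823
```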

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 5000
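
These settings map onto the standard Hugging Face TrainingArguments fields. The original training script is not published here, so the following is only a minimal sketch of an equivalent configuration (output_dir is a placeholder):

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above expressed as TrainingArguments;
# the actual training script may have differed.
training_args = TrainingArguments(
    output_dir="progen2_cross_attention_only_h",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,      # effective train batch size: 8 * 4 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,                   # 10% of 5000 steps = 500 warmup steps
    max_steps=5000,
)
```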

Training results

| Training Loss | Epoch   | Step | Validation Loss | Perplexity |
|:-------------:|:-------:|:----:|:---------------:|:----------:|
| 32.5486       | 0.2909  | 100  | 7.6093          | 2016.7766  |
| 22.5738       | 0.5818  | 200  | 2.8788          | 17.7926    |
| 11.4858       | 0.8727  | 300  | 2.8572          | 17.4123    |
| 11.3910       | 1.1658  | 400  | 2.8481          | 17.2545    |
| 11.6307       | 1.4567  | 500  | 2.6227          | 13.7734    |
| 10.2311       | 1.7476  | 600  | 2.4862          | 12.0155    |
| 9.9477        | 2.0407  | 700  | 2.4658          | 11.7733    |
| 9.8694        | 2.3316  | 800  | 2.6730          | 14.4827    |
| 9.8291        | 2.6225  | 900  | 2.4811          | 11.9541    |
| 31.1466       | 2.9135  | 1000 | 8.7851          | 6536.0332  |
| 34.9023       | 3.2065  | 1100 | 7.7230          | 2259.8149  |
| 30.5868       | 3.4975  | 1200 | 7.5959          | 1990.0344  |
| 30.4004       | 3.7884  | 1300 | 7.5865          | 1971.3219  |
| 31.7038       | 4.0815  | 1400 | 8.0208          | 3043.6248  |
| 31.3893       | 4.3724  | 1500 | 7.2647          | 1428.9806  |
| 25.8028       | 4.6633  | 1600 | 5.7546          | 315.6425   |
| 22.4188       | 4.9542  | 1700 | 5.3616          | 213.0554   |
| 21.2490       | 5.2473  | 1800 | 5.3029          | 200.9226   |
| 20.9864       | 5.5382  | 1900 | 5.3000          | 200.3277   |
| 20.9816       | 5.8291  | 2000 | 5.1496          | 172.3635   |
| 20.6328       | 6.1222  | 2100 | 4.6971          | 109.6314   |
| 18.4146       | 6.4131  | 2200 | 4.5423          | 93.9023    |
| 17.0501       | 6.7040  | 2300 | 3.8270          | 45.9244    |
| 15.6660       | 6.9949  | 2400 | 3.4366          | 31.0810    |
| 15.9270       | 7.2880  | 2500 | 3.9706          | 53.0142    |
| 13.5433       | 7.5789  | 2600 | 2.9892          | 19.8694    |
| 12.3278       | 7.8698  | 2700 | 3.1080          | 22.3761    |
| 12.0588       | 8.1629  | 2800 | 2.7287          | 15.3123    |
| 11.1222       | 8.4538  | 2900 | 2.6745          | 14.5055    |
| 10.9132       | 8.7447  | 3000 | 2.6467          | 14.1074    |
| 10.9437       | 9.0378  | 3100 | 2.6341          | 13.9301    |
| 10.8436       | 9.3287  | 3200 | 3.8787          | 48.3626    |
| 10.6462       | 9.6196  | 3300 | 2.6104          | 13.6050    |
| 10.5014       | 9.9105  | 3400 | 2.6434          | 14.0614    |
| 10.4753       | 10.2036 | 3500 | 2.6008          | 13.4750    |
| 10.4235       | 10.4945 | 3600 | 2.5825          | 13.2301    |
| 10.2556       | 10.7855 | 3700 | 2.5495          | 12.8001    |
| 10.2415       | 11.0785 | 3800 | 2.5396          | 12.6741    |
| 10.1531       | 11.3695 | 3900 | 2.5290          | 12.5413    |
| 10.1279       | 11.6604 | 4000 | 2.5270          | 12.5158    |
| 10.0816       | 11.9513 | 4100 | 2.5152          | 12.3687    |
| 10.0384       | 12.2444 | 4200 | 2.5198          | 12.4260    |
| 10.0156       | 12.5353 | 4300 | 2.5003          | 12.1862    |
| 9.9928        | 12.8262 | 4400 | 2.4984          | 12.1632    |
| 10.0172       | 13.1193 | 4500 | 2.4940          | 12.1100    |
| 9.9678        | 13.4102 | 4600 | 2.4955          | 12.1281    |
| 9.9605        | 13.7011 | 4700 | 2.4927          | 12.0943    |
| 9.9324        | 13.9920 | 4800 | 2.4920          | 12.0851    |
| 9.9536        | 14.2851 | 4900 | 2.4916          | 12.0804    |
| 9.9154        | 14.5760 | 5000 | 2.4917          | 12.0823    |

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.1.0.post301
  • Datasets 3.0.2
  • Tokenizers 0.21.0
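
To reproduce results with this checkpoint, a quick way to confirm that a local environment matches the versions above (assuming the packages were installed from PyPI under their usual distribution names):

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card; "PyTorch" is distributed as "torch".
expected = {
    "transformers": "4.47.1",
    "torch": "2.1.0.post301",
    "datasets": "3.0.2",
    "tokenizers": "0.21.0",
}
for package, wanted in expected.items():
    try:
        found = version(package)
    except PackageNotFoundError:
        found = "not installed"
    print(f"{package}: expected {wanted}, found {found}")
```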

Model size

  • 1.02B parameters (Safetensors, F32)