T5-mask-100-beam-3

This model is a fine-tuned version of mrm8488/t5-base-finetuned-common_gen; the fine-tuning dataset is not documented. At the end of training (epoch 100) it achieves the following results on the evaluation set:

  • Loss: 2.6461
  • Bleu: 5.5434
  • Gen Len: 14.3534

Model description

More information needed

Intended uses & limitations

More information needed
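
Until usage is documented, the following is only a minimal loading sketch. It assumes the checkpoint is published on the Hub as Ziyi98/T5-mask-100-beam-3, that it loads with the standard Transformers seq2seq classes, and that "beam-3" in the model name refers to beam search with 3 beams. None of these is confirmed by this card, and the expected input format is not documented.

```python
MODEL_ID = "Ziyi98/T5-mask-100-beam-3"  # assumed Hub repository ID


def generation_kwargs(num_beams: int = 3, max_new_tokens: int = 32) -> dict:
    """Decoding settings; num_beams=3 is a guess based on the model name."""
    return {
        "num_beams": num_beams,
        "max_new_tokens": max_new_tokens,  # evaluation outputs average ~14 tokens
        "early_stopping": True,
    }


def generate(text: str, model_id: str = MODEL_ID) -> str:
    """Download the checkpoint (on first call) and decode one output."""
    # Imported lazily so generation_kwargs() works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs())
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate(...)` with an input string requires network access and the `transformers` and `torch` packages; which input strings are meaningful depends on the undocumented training data.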

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
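
As a rough illustration of what the linear scheduler implies, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup (none is listed above) and takes the step counts from the training results table (527 steps per epoch, 100 epochs); `linear_lr` is a hypothetical helper, not part of any library.

```python
BASE_LR = 2e-5           # learning_rate from the list above
STEPS_PER_EPOCH = 527    # from the training results table
NUM_EPOCHS = 100
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 52,700 optimizer steps


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly so the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in (0, 25, 50, 75, 100):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:3d}  step {step:5d}  lr {linear_lr(step):.2e}")
```

Under these assumptions the rate starts at 2e-05, is halved by epoch 50, and reaches zero at step 52700.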

Training results

Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len
2.256 | 1.0 | 527 | 2.2936 | 6.9789 | 13.1364
2.2149 | 2.0 | 1054 | 2.2944 | 7.0421 | 13.1366
2.1975 | 3.0 | 1581 | 2.3005 | 6.9363 | 13.2412
2.1766 | 4.0 | 2108 | 2.3055 | 6.8015 | 13.2558
2.1635 | 5.0 | 2635 | 2.3066 | 6.9031 | 13.2852
2.145 | 6.0 | 3162 | 2.3105 | 6.7477 | 13.4291
2.1322 | 7.0 | 3689 | 2.3164 | 6.9102 | 13.3454
2.1147 | 8.0 | 4216 | 2.3218 | 6.7552 | 13.4181
2.1079 | 9.0 | 4743 | 2.3247 | 6.8419 | 13.4602
2.0914 | 10.0 | 5270 | 2.3329 | 6.751 | 13.4266
2.0803 | 11.0 | 5797 | 2.3354 | 6.6713 | 13.5381
2.0675 | 12.0 | 6324 | 2.3379 | 6.7464 | 13.4975
2.0565 | 13.0 | 6851 | 2.3399 | 6.7349 | 13.5582
2.0459 | 14.0 | 7378 | 2.3443 | 6.7243 | 13.5358
2.0351 | 15.0 | 7905 | 2.3470 | 6.7024 | 13.6242
2.0246 | 16.0 | 8432 | 2.3563 | 6.6921 | 13.5607
2.016 | 17.0 | 8959 | 2.3528 | 6.7559 | 13.6692
2.0053 | 18.0 | 9486 | 2.3603 | 6.8006 | 13.5881
1.9859 | 19.0 | 10013 | 2.3608 | 6.8255 | 13.7096
1.975 | 20.0 | 10540 | 2.3695 | 6.7947 | 13.6324
1.9674 | 21.0 | 11067 | 2.3731 | 6.8131 | 13.6732
1.9582 | 22.0 | 11594 | 2.3766 | 6.7819 | 13.7409
1.9483 | 23.0 | 12121 | 2.3754 | 6.8787 | 13.5938
1.9443 | 24.0 | 12648 | 2.3836 | 6.6645 | 13.6747
1.9337 | 25.0 | 13175 | 2.3865 | 6.7016 | 13.7514
1.9265 | 26.0 | 13702 | 2.3891 | 6.8102 | 13.7718
1.9184 | 27.0 | 14229 | 2.3962 | 6.7632 | 13.7377
1.9134 | 28.0 | 14756 | 2.3994 | 6.7438 | 13.8203
1.9027 | 29.0 | 15283 | 2.4079 | 6.6669 | 13.7855
1.901 | 30.0 | 15810 | 2.4085 | 6.7555 | 13.7292
1.8915 | 31.0 | 16337 | 2.4070 | 6.8025 | 13.7606
1.8841 | 32.0 | 16864 | 2.4078 | 6.769 | 13.828
1.8794 | 33.0 | 17391 | 2.4088 | 6.7529 | 13.825
1.8703 | 34.0 | 17918 | 2.4148 | 6.7795 | 13.8596
1.8651 | 35.0 | 18445 | 2.4122 | 6.7422 | 13.8233
1.8597 | 36.0 | 18972 | 2.4071 | 6.7784 | 13.8395
1.8568 | 37.0 | 19499 | 2.4106 | 6.7127 | 13.8599
1.8436 | 38.0 | 20026 | 2.4177 | 6.8216 | 13.8977
1.8386 | 39.0 | 20553 | 2.4212 | 6.72 | 13.8596
1.843 | 40.0 | 21080 | 2.3578 | 6.7825 | 13.7315
1.8861 | 41.0 | 21607 | 2.3585 | 6.7195 | 13.5811
1.9214 | 42.0 | 22134 | 2.3743 | 6.7537 | 13.7451
2.0399 | 43.0 | 22661 | 2.5768 | 5.1918 | 13.6165
2.2339 | 44.0 | 23188 | 2.5552 | 5.2251 | 13.7357
2.2102 | 45.0 | 23715 | 2.5288 | 5.2795 | 13.8405
2.1798 | 46.0 | 24242 | 2.5107 | 5.4188 | 13.9622
2.1667 | 47.0 | 24769 | 2.4992 | 5.4951 | 14.0577
2.1463 | 48.0 | 25296 | 2.4904 | 5.5393 | 14.1063
2.1284 | 49.0 | 25823 | 2.4842 | 5.6771 | 14.1812
2.1142 | 50.0 | 26350 | 2.4803 | 5.6807 | 14.3044
2.1067 | 51.0 | 26877 | 2.4775 | 5.7383 | 14.3387
2.0961 | 52.0 | 27404 | 2.4767 | 5.7043 | 14.3579
2.0891 | 53.0 | 27931 | 2.4771 | 5.7167 | 14.3853
2.0853 | 54.0 | 28458 | 2.4780 | 5.7627 | 14.4191
2.0783 | 55.0 | 28985 | 2.4774 | 5.7501 | 14.4121
2.0744 | 56.0 | 29512 | 2.4825 | 5.6738 | 14.3785
2.0746 | 57.0 | 30039 | 2.4889 | 5.6481 | 14.3435
2.0763 | 58.0 | 30566 | 2.4937 | 5.6288 | 14.3298
2.0696 | 59.0 | 31093 | 2.4985 | 5.6343 | 14.3293
2.0714 | 60.0 | 31620 | 2.5013 | 5.6636 | 14.3596
2.0706 | 61.0 | 32147 | 2.5043 | 5.6589 | 14.3544
2.065 | 62.0 | 32674 | 2.5072 | 5.6727 | 14.3691
2.0662 | 63.0 | 33201 | 2.5099 | 5.6883 | 14.3962
2.0653 | 64.0 | 33728 | 2.5170 | 5.6343 | 14.3604
2.0679 | 65.0 | 34255 | 2.5239 | 5.604 | 14.3328
2.0738 | 66.0 | 34782 | 2.5295 | 5.5741 | 14.3064
2.0741 | 67.0 | 35309 | 2.5347 | 5.5617 | 14.283
2.0717 | 68.0 | 35836 | 2.5392 | 5.5388 | 14.3044
2.0693 | 69.0 | 36363 | 2.5437 | 5.5111 | 14.2927
2.0739 | 70.0 | 36890 | 2.5479 | 5.5074 | 14.2651
2.074 | 71.0 | 37417 | 2.5554 | 5.4703 | 14.2598
2.0796 | 72.0 | 37944 | 2.5651 | 5.4628 | 14.2439
2.0775 | 73.0 | 38471 | 2.5742 | 5.4606 | 14.2668
2.0827 | 74.0 | 38998 | 2.5827 | 5.4494 | 14.2367
2.0928 | 75.0 | 39525 | 2.5906 | 5.4626 | 14.226
2.0995 | 76.0 | 40052 | 2.5979 | 5.4589 | 14.269
2.0984 | 77.0 | 40579 | 2.6057 | 5.4754 | 14.282
2.1017 | 78.0 | 41106 | 2.6138 | 5.5446 | 14.3079
2.1098 | 79.0 | 41633 | 2.6217 | 5.5664 | 14.3081
2.1164 | 80.0 | 42160 | 2.6296 | 5.5431 | 14.3285
2.118 | 81.0 | 42687 | 2.6369 | 5.5365 | 14.3342
2.1227 | 82.0 | 43214 | 2.6440 | 5.5201 | 14.3589
2.1291 | 83.0 | 43741 | 2.6463 | 5.5251 | 14.3654
2.125 | 84.0 | 44268 | 2.6462 | 5.5234 | 14.3736
2.1288 | 85.0 | 44795 | 2.6461 | 5.5387 | 14.3532
2.1266 | 86.0 | 45322 | 2.6461 | 5.5434 | 14.3534
2.1269 | 87.0 | 45849 | 2.6461 | 5.5434 | 14.3534
2.1301 | 88.0 | 46376 | 2.6461 | 5.5434 | 14.3534
2.1279 | 89.0 | 46903 | 2.6461 | 5.5434 | 14.3534
2.1267 | 90.0 | 47430 | 2.6461 | 5.5434 | 14.3534
2.1259 | 91.0 | 47957 | 2.6461 | 5.5434 | 14.3534
2.1281 | 92.0 | 48484 | 2.6461 | 5.5434 | 14.3534
2.1288 | 93.0 | 49011 | 2.6461 | 5.5434 | 14.3534
2.1263 | 94.0 | 49538 | 2.6461 | 5.5434 | 14.3534
2.1288 | 95.0 | 50065 | 2.6461 | 5.5434 | 14.3534
2.1264 | 96.0 | 50592 | 2.6461 | 5.5434 | 14.3534
2.127 | 97.0 | 51119 | 2.6461 | 5.5434 | 14.3534
2.1271 | 98.0 | 51646 | 2.6461 | 5.5434 | 14.3534
2.1307 | 99.0 | 52173 | 2.6461 | 5.5434 | 14.3534
2.1246 | 100.0 | 52700 | 2.6461 | 5.5434 | 14.3534
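
The reported results correspond to the final epoch, but the table shows better evaluation numbers much earlier in training. The sketch below picks the best row by validation loss and by BLEU from a hand-copied subset of the table; `EvalRow` and the selection logic are illustrative only and say nothing about which checkpoint was actually saved or published.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: int
    step: int
    val_loss: float
    bleu: float


# Subset of rows copied from the training results table above.
ROWS = [
    EvalRow(1, 527, 2.2936, 6.9789),
    EvalRow(2, 1054, 2.2944, 7.0421),
    EvalRow(40, 21080, 2.3578, 6.7825),
    EvalRow(43, 22661, 2.5768, 5.1918),
    EvalRow(55, 28985, 2.4774, 5.7501),
    EvalRow(100, 52700, 2.6461, 5.5434),
]

best_by_loss = min(ROWS, key=lambda r: r.val_loss)  # lower is better
best_by_bleu = max(ROWS, key=lambda r: r.bleu)      # higher is better
final = ROWS[-1]

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest BLEU:", best_by_bleu)
    print("BLEU gap, best vs final:", round(best_by_bleu.bleu - final.bleu, 4))
```

On this subset, and on the full table, validation loss is lowest at epoch 1 and BLEU is highest at epoch 2, while training loss keeps falling until about epoch 39. If retraining with the Transformers `Trainer`, options such as `load_best_model_at_end` and `metric_for_best_model` in `TrainingArguments` can keep the best checkpoint rather than the last one.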

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.2.2+cu118
  • Datasets 2.18.0
  • Tokenizers 0.15.1

Model size

  • Parameters: 223M
  • Tensor type: F32 (Safetensors)

Model tree for Ziyi98/T5-mask-100-beam-3

  • Finetuned from: mrm8488/t5-base-finetuned-common_gen