## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 8
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 5
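
The list above maps directly onto `transformers.TrainingArguments`. The sketch below is a minimal illustration under that assumption, not the recovered training script: `output_dir` is a hypothetical placeholder, and note that the `Trainer`'s "Adam with betas=(0.9,0.999)" is AdamW by default.

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
# output_dir is a placeholder; the actual script, model, and dataset are
# not documented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",           # hypothetical path
    learning_rate=5e-6,
    per_device_train_batch_size=4,  # 4 per device x 4 GPUs = 16 total
    per_device_eval_batch_size=4,   # 4 per device x 4 GPUs = 16 total
    seed=8,
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=5,
)
```

The total train and eval batch sizes of 16 follow from running this per-device configuration across the 4 GPUs (for example via `torchrun`), with no gradient accumulation.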
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 11.2812       | 0.0   | 1    | 11.5156         |
| 5.0938        | 0.2   | 62   | 5.1016          |
| 3.5703        | 0.4   | 124  | 3.7161          |
| 2.582         | 0.6   | 186  | 2.9010          |
| 2.2109        | 0.8   | 248  | 2.5156          |
| 1.9824        | 1.0   | 310  | 2.3477          |
| 1.8594        | 1.18  | 372  | 2.1960          |
| 1.748         | 1.38  | 434  | 2.1667          |
| 1.748         | 1.58  | 496  | 2.0195          |
| 1.7617        | 1.78  | 558  | 2.0749          |
| 1.6582        | 1.98  | 620  | 1.9095          |
| 1.5762        | 2.16  | 682  | 1.9036          |
| 1.5586        | 2.36  | 744  | 1.8457          |
| 1.6016        | 2.56  | 806  | 1.8112          |
| 1.5195        | 2.76  | 868  | 1.8034          |
| 1.5645        | 2.96  | 930  | 1.7773          |
| 1.457         | 3.14  | 992  | 1.7474          |
| 1.4883        | 3.34  | 1054 | 1.7467          |
| 1.4648        | 3.54  | 1116 | 1.7676          |
| 1.5195        | 3.74  | 1178 | 1.7383          |
| 1.4531        | 3.94  | 1240 | 1.7383          |
| 1.4648        | 4.12  | 1302 | 1.7181          |
| 1.4121        | 4.32  | 1364 | 1.7272          |
| 1.4727        | 4.52  | 1426 | 1.7259          |
| 1.4219        | 4.72  | 1488 | 1.7240          |
| 1.5137        | 4.92  | 1550 | 1.7227          |
### Framework versions
- Transformers 4.37.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0