aristotle_new_layer_plain
This model is a fine-tuned version of ai-forever/rugpt3small_based_on_gpt2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 5.1360
Model description
More information needed
Intended uses & limitations
More information needed
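For illustration only, a minimal inference sketch using the 🤗 Transformers causal-LM API follows. The repository id (taken from this card's title), the Russian example prompt, and the generation settings are assumptions, not part of the original card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with a local path if loading a local checkpoint.
model_id = "DmitryYarov/aristotle_new_layer_plain"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example Russian prompt ("What is virtue?"); the base model is a Russian GPT-2.
prompt = "Что есть добродетель?"
inputs = tokenizer(prompt, return_tensors="pt")

# Arbitrary sampling settings, chosen only for demonstration.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```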
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 30
- mixed_precision_training: Native AMP
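A hedged sketch of an equivalent `TrainingArguments` configuration is shown below. The output directory, the evaluation/logging strategies, and the `fp16` flag are assumptions inferred from the list above, not taken from the original training script; dataset loading, model initialization, and the `Trainer` wiring are omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="aristotle_new_layer_plain",  # placeholder output directory
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    optim="adafactor",               # Adafactor with no extra optimizer arguments
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,                       # assumption: native AMP mixed precision
    eval_strategy="epoch",           # assumption: per-epoch evaluation, matching the results table
    logging_strategy="epoch",
)
```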
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 8.1202        | 1.0   | 203  | 7.6970          |
| 6.5259        | 2.0   | 406  | 6.4359          |
| 6.0754        | 3.0   | 609  | 6.0372          |
| 5.7242        | 4.0   | 812  | 5.7632          |
| 5.2971        | 5.0   | 1015 | 5.5099          |
| 5.0427        | 6.0   | 1218 | 5.3732          |
| 4.8016        | 7.0   | 1421 | 5.2518          |
| 4.559         | 8.0   | 1624 | 5.1812          |
| 4.3407        | 9.0   | 1827 | 5.1369          |
| 4.0474        | 10.0  | 2030 | 5.1208          |
| 3.8746        | 11.0  | 2233 | 5.1177          |
| 3.6983        | 12.0  | 2436 | 5.0946          |
| 3.5034        | 13.0  | 2639 | 5.1002          |
| 3.3277        | 14.0  | 2842 | 5.1041          |
| 3.1368        | 15.0  | 3045 | 5.1360          |
Framework versions
- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0