# aristotle_new_layer_md
This model is a fine-tuned version of ai-forever/rugpt3small_based_on_gpt2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.5495
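For orientation: if this is the standard token-level cross-entropy loss of a causal language model, it corresponds to a perplexity of roughly exp(4.5495) ≈ 94.6 on the evaluation set:

```python
import math

# Perplexity implied by the reported cross-entropy loss
# (assuming the standard causal-LM objective).
print(math.exp(4.5495))  # ~94.58
```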
## Model description
More information needed
## Intended uses & limitations
More information needed
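Since no usage details are documented, here is a minimal, untested generation sketch using the 🤗 Transformers API (the Russian prompt is illustrative; the repo id is taken from the model tree below):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DmitryYarov/aristotle-graph-logic-nlayer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Russian prompt, since the base model (rugpt3small) is a Russian GPT-2.
prompt = "Аристотель утверждал, что"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```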
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 30
- mixed_precision_training: Native AMP
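The training code itself is not included in this card. Below is a sketch of `TrainingArguments` mirroring the listed values; the `output_dir` is guessed from the card title, and dataset loading plus `Trainer` wiring are omitted because they are undocumented:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="aristotle_new_layer_md",  # assumed from the card title
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    optim="adafactor",
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,  # "Native AMP" mixed precision
)
```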
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 9.9676 | 1.0 | 95 | 7.7187 |
| 7.3648 | 2.0 | 190 | 6.7524 |
| 6.3191 | 3.0 | 285 | 5.9377 |
| 5.4765 | 4.0 | 380 | 5.5640 |
| 5.0688 | 5.0 | 475 | 5.3898 |
| 4.9737 | 6.0 | 570 | 5.2278 |
| 4.8498 | 7.0 | 665 | 5.1026 |
| 4.7189 | 8.0 | 760 | 5.0769 |
| 4.635 | 9.0 | 855 | 4.8809 |
| 4.5526 | 10.0 | 950 | 4.9227 |
| 4.4848 | 11.0 | 1045 | 4.8871 |
| 4.4918 | 12.0 | 1140 | 4.7096 |
| 4.3303 | 13.0 | 1235 | 4.7373 |
| 4.3412 | 14.0 | 1330 | 4.7861 |
| 4.2659 | 15.0 | 1425 | 4.6738 |
| 4.2575 | 16.0 | 1520 | 4.6526 |
| 4.2401 | 17.0 | 1615 | 4.7348 |
| 4.1957 | 18.0 | 1710 | 4.5350 |
| 4.1738 | 19.0 | 1805 | 4.5723 |
| 4.1671 | 20.0 | 1900 | 4.5821 |
| 4.0669 | 21.0 | 1995 | 4.5495 |
### Framework versions
- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Tokenizers 0.21.0
## Model tree for DmitryYarov/aristotle-graph-logic-nlayer

- Base model: ai-forever/rugpt3small_based_on_gpt2