---
license: apache-2.0
base_model: distilgpt2
tags:
- generated_from_keras_callback
model-index:
- name: EngTig/distilgpt2-finetuned-wikitext2
results: []
---
<!-- This model card has been generated automatically according to the information Keras had access to. You should
probably proofread and complete it, then remove this comment. -->
# EngTig/distilgpt2-finetuned-wikitext2
This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on an unknown dataset.
It reached the following losses after the final training epoch:
- Train Loss: 1.4784
- Validation Loss: 4.7279
- Epoch: 47

Note that validation loss bottoms out at 3.8593 at epoch 2 and climbs steadily afterwards while training loss keeps falling (see the table under Training results), so the final checkpoint is substantially overfit; an earlier checkpoint would likely generalize better.
## Model description
More information needed
## Intended uses & limitations
More information needed
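As a hedged sketch (not part of the original card): assuming the checkpoint is hosted on the Hub under the repo id `EngTig/distilgpt2-finetuned-wikitext2` implied by the model name above, it should load like any causal-LM checkpoint via the standard `transformers` pipeline:

```python
from transformers import pipeline

# Hypothetical usage; the repo id is taken from the model name above.
generator = pipeline("text-generation", model="EngTig/distilgpt2-finetuned-wikitext2")
out = generator("The history of science", max_new_tokens=30, num_return_sequences=1)
print(out[0]["generated_text"])  # prompt followed by a model continuation
```

Given the overfitting visible in the training curve, generations from this final-epoch checkpoint may be noticeably less fluent than those from the base `distilgpt2`.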
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
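For reference, `AdamWeightDecay` implements Adam with decoupled (AdamW-style) weight decay. A minimal NumPy sketch of a single parameter update under the hyperparameters logged above (learning rate 2e-05, betas 0.9/0.999, epsilon 1e-07, weight-decay rate 0.01) — an illustration of the update rule, not the actual TensorFlow implementation:

```python
import numpy as np

def adamw_step(param, grad, m, v, t,
               lr=2e-5, beta1=0.9, beta2=0.999, eps=1e-7, wd=0.01):
    """One decoupled-weight-decay Adam update (t is the 1-based step count)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled decay: applied to the parameter directly, not folded into the gradient.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps) - lr * wd * param
    return param, m, v

p, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
p, m, v = adamw_step(p, np.array([0.5]), m, v, t=1)
```

In practice the equivalent optimizer object would be built with `transformers.AdamWeightDecay` (or `transformers.create_optimizer`) and passed to `model.compile`.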
### Training results
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 2.9937 | 3.8775 | 0 |
| 2.9426 | 3.8763 | 1 |
| 2.8926 | 3.8593 | 2 |
| 2.8445 | 3.8982 | 3 |
| 2.8090 | 3.9044 | 4 |
| 2.7511 | 3.9337 | 5 |
| 2.7140 | 3.9265 | 6 |
| 2.6655 | 3.9483 | 7 |
| 2.6443 | 3.9490 | 8 |
| 2.6153 | 3.9458 | 9 |
| 2.5699 | 3.9660 | 10 |
| 2.5262 | 3.9897 | 11 |
| 2.5002 | 4.0219 | 12 |
| 2.4636 | 4.0540 | 13 |
| 2.4327 | 4.0224 | 14 |
| 2.3945 | 4.0364 | 15 |
| 2.3661 | 4.0640 | 16 |
| 2.3319 | 4.0636 | 17 |
| 2.2992 | 4.0996 | 18 |
| 2.2712 | 4.0886 | 19 |
| 2.2377 | 4.1483 | 20 |
| 2.2054 | 4.1594 | 21 |
| 2.1658 | 4.1989 | 22 |
| 2.1444 | 4.1348 | 23 |
| 2.1129 | 4.1489 | 24 |
| 2.0953 | 4.2259 | 25 |
| 2.0546 | 4.2353 | 26 |
| 2.0281 | 4.3147 | 27 |
| 1.9927 | 4.2586 | 28 |
| 1.9698 | 4.3254 | 29 |
| 1.9373 | 4.3288 | 30 |
| 1.9159 | 4.3262 | 31 |
| 1.8750 | 4.3550 | 32 |
| 1.8480 | 4.3697 | 33 |
| 1.8215 | 4.4233 | 34 |
| 1.7874 | 4.4876 | 35 |
| 1.7685 | 4.5072 | 36 |
| 1.7433 | 4.4617 | 37 |
| 1.7085 | 4.5331 | 38 |
| 1.6839 | 4.5724 | 39 |
| 1.6643 | 4.5819 | 40 |
| 1.6224 | 4.6558 | 41 |
| 1.5981 | 4.5991 | 42 |
| 1.5788 | 4.6276 | 43 |
| 1.5532 | 4.6394 | 44 |
| 1.5164 | 4.6464 | 45 |
| 1.4998 | 4.6634 | 46 |
| 1.4784 | 4.7279 | 47 |
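The losses above are easier to interpret as perplexities. Assuming they are mean per-token cross-entropy in nats (the usual Keras convention for language-model training), perplexity is simply `exp(loss)`:

```python
import math

# Validation losses per epoch, copied from the table above.
val_losses = [3.8775, 3.8763, 3.8593, 3.8982, 3.9044, 3.9337, 3.9265,
              3.9483, 3.9490, 3.9458, 3.9660, 3.9897, 4.0219, 4.0540,
              4.0224, 4.0364, 4.0640, 4.0636, 4.0996, 4.0886, 4.1483,
              4.1594, 4.1989, 4.1348, 4.1489, 4.2259, 4.2353, 4.3147,
              4.2586, 4.3254, 4.3288, 4.3262, 4.3550, 4.3697, 4.4233,
              4.4876, 4.5072, 4.4617, 4.5331, 4.5724, 4.5819, 4.6558,
              4.5991, 4.6276, 4.6394, 4.6464, 4.6634, 4.7279]

best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
best_ppl = math.exp(val_losses[best_epoch])  # ~47.4 at epoch 2
final_ppl = math.exp(val_losses[-1])         # ~113.1 at epoch 47
print(best_epoch, round(best_ppl, 1), round(final_ppl, 1))
```

Validation perplexity more than doubles between the best epoch and the last one, which is why early stopping (or simply keeping the epoch-2 checkpoint) would be the natural fix here.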
### Framework versions
- Transformers 4.38.2
- TensorFlow 2.15.0
- Datasets 2.18.0
- Tokenizers 0.15.2