---
license: apache-2.0
base_model: distilgpt2
tags:
  - generated_from_keras_callback
model-index:
  - name: EngTig/distilgpt2-finetuned-wikitext2
    results: []
---

# EngTig/distilgpt2-finetuned-wikitext2

This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on an unknown dataset.
It achieves the following results on the evaluation set:

- Train Loss: 1.5164
- Validation Loss: 4.6464
- Epoch: 45

## Model description

More information needed

## Intended uses & limitations

More information needed
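
Until the author fills this section in, here is a minimal sketch of loading the checkpoint for text generation with the TensorFlow classes matching the framework versions below. The prompt and sampling settings are illustrative assumptions, not part of the original card.

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EngTig/distilgpt2-finetuned-wikitext2")
model = TFAutoModelForCausalLM.from_pretrained("EngTig/distilgpt2-finetuned-wikitext2")

# Encode an illustrative prompt and sample a short continuation.
inputs = tokenizer("The history of the city begins", return_tensors="tf")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```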

## Training and evaluation data

More information needed
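
The card lists the dataset as unknown, but the repository name suggests WikiText-2. A sketch of loading that corpus with 🤗 Datasets, purely as an assumption:

```python
from datasets import load_dataset

# Assumption: the "wikitext2" in the model name refers to the public
# WikiText-2 corpus; the card itself does not confirm this.
raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
print(raw_datasets)                      # train / validation / test splits
print(raw_datasets["train"][4]["text"])  # peek at one raw article line
```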

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
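
The optimizer config above matches the `AdamWeightDecay` class shipped with Transformers. A sketch of reconstructing it; the `model` variable and compile step are illustrative, not the original training script:

```python
from transformers import TFAutoModelForCausalLM, AdamWeightDecay

# Re-create the optimizer from the saved config above.
optimizer = AdamWeightDecay(
    learning_rate=2e-5,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
    weight_decay_rate=0.01,
)

# Illustrative: compile the base model with it, as a Keras training loop would.
model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")
model.compile(optimizer=optimizer)  # TF Transformers models fall back to their internal LM loss
```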

### Training results

Training loss decreases throughout, while validation loss bottoms out around epoch 2 (3.8593) and trends upward thereafter, indicating the final checkpoint is substantially overfit.

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 2.9937     | 3.8775          | 0     |
| 2.9426     | 3.8763          | 1     |
| 2.8926     | 3.8593          | 2     |
| 2.8445     | 3.8982          | 3     |
| 2.8090     | 3.9044          | 4     |
| 2.7511     | 3.9337          | 5     |
| 2.7140     | 3.9265          | 6     |
| 2.6655     | 3.9483          | 7     |
| 2.6443     | 3.9490          | 8     |
| 2.6153     | 3.9458          | 9     |
| 2.5699     | 3.9660          | 10    |
| 2.5262     | 3.9897          | 11    |
| 2.5002     | 4.0219          | 12    |
| 2.4636     | 4.0540          | 13    |
| 2.4327     | 4.0224          | 14    |
| 2.3945     | 4.0364          | 15    |
| 2.3661     | 4.0640          | 16    |
| 2.3319     | 4.0636          | 17    |
| 2.2992     | 4.0996          | 18    |
| 2.2712     | 4.0886          | 19    |
| 2.2377     | 4.1483          | 20    |
| 2.2054     | 4.1594          | 21    |
| 2.1658     | 4.1989          | 22    |
| 2.1444     | 4.1348          | 23    |
| 2.1129     | 4.1489          | 24    |
| 2.0953     | 4.2259          | 25    |
| 2.0546     | 4.2353          | 26    |
| 2.0281     | 4.3147          | 27    |
| 1.9927     | 4.2586          | 28    |
| 1.9698     | 4.3254          | 29    |
| 1.9373     | 4.3288          | 30    |
| 1.9159     | 4.3262          | 31    |
| 1.8750     | 4.3550          | 32    |
| 1.8480     | 4.3697          | 33    |
| 1.8215     | 4.4233          | 34    |
| 1.7874     | 4.4876          | 35    |
| 1.7685     | 4.5072          | 36    |
| 1.7433     | 4.4617          | 37    |
| 1.7085     | 4.5331          | 38    |
| 1.6839     | 4.5724          | 39    |
| 1.6643     | 4.5819          | 40    |
| 1.6224     | 4.6558          | 41    |
| 1.5981     | 4.5991          | 42    |
| 1.5788     | 4.6276          | 43    |
| 1.5532     | 4.6394          | 44    |
| 1.5164     | 4.6464          | 45    |
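
Assuming the losses are the standard per-token cross-entropy (in nats) reported by the Keras callback, they convert directly to perplexity; a quick check of the final-epoch numbers:

```python
import math

# Convert the epoch-45 cross-entropy losses above to perplexities.
print(math.exp(1.5164))  # train perplexity ≈ 4.56
print(math.exp(4.6464))  # validation perplexity ≈ 104.2
```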

### Framework versions

- Transformers 4.38.2
- TensorFlow 2.15.0
- Datasets 2.18.0
- Tokenizers 0.15.2