metadata

license: apache-2.0
base_model: distilgpt2
tags:
  - generated_from_keras_callback
model-index:
  - name: EngTig/distilgpt2-finetuned-wikitext2
    results: []

EngTig/distilgpt2-finetuned-wikitext2

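This model is a fine-tuned version of distilgpt2. The card does not record the training dataset, although the repository name suggests WikiText-2. At the last logged epoch it reports the following training and validation losses: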

Train Loss: 1.9373
Validation Loss: 4.3288
Epoch: 30
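
Causal language-model losses are usually reported as mean per-token cross-entropy in nats, in which case perplexity is exp(loss). The card does not state the loss convention, so the sketch below is only valid under that assumption. The helper name `perplexity` is illustrative.

```python
import math

# Final-epoch losses copied from this card.
train_loss = 1.9373
validation_loss = 4.3288


def perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


train_ppl = perplexity(train_loss)
validation_ppl = perplexity(validation_loss)
print(f"train perplexity:      {train_ppl:.1f}")
print(f"validation perplexity: {validation_ppl:.1f}")
```

The large gap between the two values mirrors the gap between the train and validation losses.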

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
training_precision: float32
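
The logged optimizer dictionary can be turned back into constructor arguments. This is a minimal sketch, not the original training script: it assumes the optimizer was the `AdamWeightDecay` class from `transformers`, and that the `decay` field is a legacy Keras entry that can be dropped because it is 0.0. The helpers `optimizer_kwargs` and `build_optimizer` are hypothetical names.

```python
import ast

# Optimizer line copied verbatim from this card.
optimizer_repr = (
    "{'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, "
    "'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, "
    "'weight_decay_rate': 0.01}"
)


def optimizer_kwargs(config_repr: str) -> dict:
    """Parse the logged config into keyword arguments for the optimizer."""
    config = ast.literal_eval(config_repr)
    # Assumption: 'decay' is a legacy Keras field; it is 0.0 here, so it is dropped.
    config.pop("decay", None)
    return config


def build_optimizer(config_repr: str):
    """Hypothetical helper: rebuild the optimizer from the logged config."""
    # Imported lazily because it requires transformers with TensorFlow installed.
    from transformers import AdamWeightDecay

    return AdamWeightDecay(**optimizer_kwargs(config_repr))


kwargs = optimizer_kwargs(optimizer_repr)
print(kwargs)
```

Calling `build_optimizer(optimizer_repr)` only works in an environment with TensorFlow and Transformers installed, ideally at the versions listed under Framework versions.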

Training results

Train Loss	Validation Loss	Epoch
2.9937	3.8775	0
2.9426	3.8763	1
2.8926	3.8593	2
2.8445	3.8982	3
2.8090	3.9044	4
2.7511	3.9337	5
2.7140	3.9265	6
2.6655	3.9483	7
2.6443	3.9490	8
2.6153	3.9458	9
2.5699	3.9660	10
2.5262	3.9897	11
2.5002	4.0219	12
2.4636	4.0540	13
2.4327	4.0224	14
2.3945	4.0364	15
2.3661	4.0640	16
2.3319	4.0636	17
2.2992	4.0996	18
2.2712	4.0886	19
2.2377	4.1483	20
2.2054	4.1594	21
2.1658	4.1989	22
2.1444	4.1348	23
2.1129	4.1489	24
2.0953	4.2259	25
2.0546	4.2353	26
2.0281	4.3147	27
1.9927	4.2586	28
1.9698	4.3254	29
1.9373	4.3288	30
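
In the table above, training loss falls at every epoch while validation loss bottoms out early and then climbs, a pattern usually read as overfitting. The sketch below locates the epoch with the lowest validation loss. The variable names are illustrative, and the values are copied from the table.

```python
# (train_loss, validation_loss) per epoch, copied from the table; list index = epoch.
history = [
    (2.9937, 3.8775), (2.9426, 3.8763), (2.8926, 3.8593), (2.8445, 3.8982),
    (2.8090, 3.9044), (2.7511, 3.9337), (2.7140, 3.9265), (2.6655, 3.9483),
    (2.6443, 3.9490), (2.6153, 3.9458), (2.5699, 3.9660), (2.5262, 3.9897),
    (2.5002, 4.0219), (2.4636, 4.0540), (2.4327, 4.0224), (2.3945, 4.0364),
    (2.3661, 4.0640), (2.3319, 4.0636), (2.2992, 4.0996), (2.2712, 4.0886),
    (2.2377, 4.1483), (2.2054, 4.1594), (2.1658, 4.1989), (2.1444, 4.1348),
    (2.1129, 4.1489), (2.0953, 4.2259), (2.0546, 4.2353), (2.0281, 4.3147),
    (1.9927, 4.2586), (1.9698, 4.3254), (1.9373, 4.3288),
]

# Epoch whose validation loss is lowest.
best_epoch = min(range(len(history)), key=lambda epoch: history[epoch][1])
best_val = history[best_epoch][1]

# How much validation loss has risen by the final epoch.
final_train, final_val = history[-1]
gap = final_val - best_val

print(f"best epoch: {best_epoch} (validation loss {best_val:.4f})")
print(f"final epoch: {len(history) - 1} (validation loss {final_val:.4f}, +{gap:.4f})")
```

If checkpoints were saved per epoch, the one from the best epoch would be the natural candidate for evaluation. The card does not say which checkpoint was published.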

Framework versions

Transformers 4.38.2
TensorFlow 2.15.0
Datasets 2.18.0
Tokenizers 0.15.2