metadata

license: apache-2.0
base_model: distilgpt2
tags:
  - generated_from_keras_callback
model-index:
  - name: EngTig/distilgpt2-finetuned-wikitext2
    results: []

EngTig/distilgpt2-finetuned-wikitext2

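This model is a fine-tuned version of distilgpt2. The auto-generated card records the dataset as unknown; the repository name suggests WikiText-2, but the card does not confirm this. At the final training epoch it reports the following training and validation losses: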

Train Loss: 2.2054
Validation Loss: 4.1594
Epoch: 21
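
Language-model losses are often easier to compare as perplexities. The sketch below converts the reported losses, assuming they are mean per-token cross-entropies in natural-log units (the card does not state this); the helper name `perplexity` is introduced here for illustration.

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (assumed to be in nats) to perplexity."""
    return math.exp(mean_loss_nats)


# Final-epoch values copied from the card.
FINAL_TRAIN_LOSS = 2.2054
FINAL_VAL_LOSS = 4.1594

if __name__ == "__main__":
    print(f"train perplexity:      {perplexity(FINAL_TRAIN_LOSS):.2f}")
    print(f"validation perplexity: {perplexity(FINAL_VAL_LOSS):.2f}")
```

Under that assumption the validation perplexity comes out at roughly 64 and the training perplexity at roughly 9.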

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
training_precision: float32
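
As a rough illustration of how the logged optimizer settings could be turned back into an optimizer object, the sketch below parses the dictionary and passes it to `AdamWeightDecay` from Transformers. The helper names `optimizer_kwargs` and `build_optimizer` are hypothetical, and dropping `decay` assumes that `decay: 0.0` is the legacy Keras no-op default; the original training script is not shown in this card.

```python
import ast

# Optimizer entry copied verbatim from the card.
OPTIMIZER_LINE = (
    "{'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, "
    "'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, "
    "'weight_decay_rate': 0.01}"
)


def optimizer_kwargs(config_text: str):
    """Parse the logged dict and split it into (optimizer name, constructor kwargs)."""
    config = ast.literal_eval(config_text)
    name = config.pop("name")
    # Assumption: decay=0.0 means "no learning-rate decay", so it is safe to omit.
    config.pop("decay", None)
    return name, config


def build_optimizer(config_text: str):
    """Recreate the optimizer; requires TensorFlow and Transformers to be installed."""
    # Imported lazily so the parsing helper above works without TensorFlow.
    from transformers import AdamWeightDecay

    name, kwargs = optimizer_kwargs(config_text)
    if name != "AdamWeightDecay":
        raise ValueError(f"unexpected optimizer: {name}")
    return AdamWeightDecay(**kwargs)


if __name__ == "__main__":
    print(optimizer_kwargs(OPTIMIZER_LINE))
```

The result of `build_optimizer(OPTIMIZER_LINE)` could then be passed to `model.compile(optimizer=...)` on a TensorFlow model.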

Training results

Train Loss	Validation Loss	Epoch
2.9937	3.8775	0
2.9426	3.8763	1
2.8926	3.8593	2
2.8445	3.8982	3
2.8090	3.9044	4
2.7511	3.9337	5
2.7140	3.9265	6
2.6655	3.9483	7
2.6443	3.9490	8
2.6153	3.9458	9
2.5699	3.9660	10
2.5262	3.9897	11
2.5002	4.0219	12
2.4636	4.0540	13
2.4327	4.0224	14
2.3945	4.0364	15
2.3661	4.0640	16
2.3319	4.0636	17
2.2992	4.0996	18
2.2712	4.0886	19
2.2377	4.1483	20
2.2054	4.1594	21
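
In the table above, validation loss is lowest at epoch 2 (3.8593) and rises to 4.1594 by epoch 21 while train loss keeps falling, a pattern usually read as overfitting. The sketch below locates that minimum programmatically; `HISTORY` and `best_epoch` are names introduced here for illustration, not part of the original training code.

```python
# (train_loss, validation_loss) per epoch, copied from the results table.
HISTORY = [
    (2.9937, 3.8775), (2.9426, 3.8763), (2.8926, 3.8593), (2.8445, 3.8982),
    (2.8090, 3.9044), (2.7511, 3.9337), (2.7140, 3.9265), (2.6655, 3.9483),
    (2.6443, 3.9490), (2.6153, 3.9458), (2.5699, 3.9660), (2.5262, 3.9897),
    (2.5002, 4.0219), (2.4636, 4.0540), (2.4327, 4.0224), (2.3945, 4.0364),
    (2.3661, 4.0640), (2.3319, 4.0636), (2.2992, 4.0996), (2.2712, 4.0886),
    (2.2377, 4.1483), (2.2054, 4.1594),
]


def best_epoch(history):
    """Return the zero-based epoch index with the lowest validation loss."""
    return min(range(len(history)), key=lambda epoch: history[epoch][1])


if __name__ == "__main__":
    epoch = best_epoch(HISTORY)
    best_val = HISTORY[epoch][1]
    final_val = HISTORY[-1][1]
    print(f"best epoch: {epoch} (validation loss {best_val:.4f})")
    print(f"validation loss increase by final epoch: {final_val - best_val:.4f}")
```

If the run were repeated, stopping near that epoch or restoring the best weights (for example with an early-stopping callback) would likely give a checkpoint with lower validation loss than the final one.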

Framework versions

Transformers 4.38.2
TensorFlow 2.15.0
Datasets 2.18.0
Tokenizers 0.15.2
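
To check whether a local environment matches the versions listed above, a small comparison against the installed package metadata can help. This is a minimal sketch using only the standard library; the `PINNED` mapping assumes the PyPI package names are the lowercase forms of the names above, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the card; keys are the assumed PyPI package names.
PINNED = {
    "transformers": "4.38.2",
    "tensorflow": "2.15.0",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def version_mismatches(pinned):
    """Return {package: (expected, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in version_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means the environment matches the versions the model was trained with.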