---
license: apache-2.0
base_model: distilgpt2
tags:
- generated_from_keras_callback
model-index:
- name: EngTig/distilgpt2-finetuned-wikitext2
  results: []
---


# EngTig/distilgpt2-finetuned-wikitext2

This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on the WikiText-2 dataset (inferred from the model name; the training run did not record the dataset).
It reached the following results at the final epoch:
- Train Loss: 1.5164
- Validation Loss: 4.6464
- Epoch: 45

## Model description

DistilGPT2 is a 6-layer, ~82M-parameter distillation of GPT-2. This checkpoint continues the causal language modeling objective on WikiText-2 for 46 epochs, trained with Keras on TensorFlow 2.15.

## Intended uses & limitations

The model is intended for English text generation and as a starting point for further fine-tuning. Note the training curve below: validation loss bottoms out near epoch 2 (3.8593) and climbs steadily afterwards while training loss keeps falling, so this final checkpoint is substantially overfit to the training split; an early-epoch checkpoint would likely generalize better. It also inherits the known limitations of its GPT-2 base model.
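
A minimal generation sketch with 🤗 Transformers (the prompt and sampling settings below are illustrative assumptions, not part of the original training setup):

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

model_id = "EngTig/distilgpt2-finetuned-wikitext2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt and sampling settings; tune to taste.
inputs = tokenizer("The history of the Roman Empire", return_tensors="tf")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizers define no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```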

## Training and evaluation data

Per the model name, training and evaluation use WikiText-2, a language modeling dataset of roughly 2 million training tokens drawn from verified Good and Featured Wikipedia articles. The exact variant (raw vs. tokenized) is not recorded.
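
Assuming the raw WikiText-2 configuration on the Hub (an inference from the model name), the data can be loaded with 🤗 Datasets:

```python
from datasets import load_dataset

# "wikitext-2-raw-v1" is an assumed config; the card does not record
# which WikiText-2 variant was actually used.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)  # DatasetDict with train / validation / test splits
```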

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: AdamWeightDecay (learning_rate=2e-05, weight_decay_rate=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, decay=0.0)
- training_precision: float32
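
A sketch of rebuilding this optimizer with the `AdamWeightDecay` class that ships with 🤗 Transformers; compiling it against a fresh `distilgpt2` checkpoint is an assumption about the original script, not a record of it:

```python
from transformers import AdamWeightDecay, TFAutoModelForCausalLM

# Values copied from the hyperparameter listing above.
optimizer = AdamWeightDecay(
    learning_rate=2e-05,
    weight_decay_rate=0.01,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
)

model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")  # assumed starting point
model.compile(optimizer=optimizer)  # TF Transformers models fall back to their internal LM loss
```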

### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 2.9937     | 3.8775          | 0     |
| 2.9426     | 3.8763          | 1     |
| 2.8926     | 3.8593          | 2     |
| 2.8445     | 3.8982          | 3     |
| 2.8090     | 3.9044          | 4     |
| 2.7511     | 3.9337          | 5     |
| 2.7140     | 3.9265          | 6     |
| 2.6655     | 3.9483          | 7     |
| 2.6443     | 3.9490          | 8     |
| 2.6153     | 3.9458          | 9     |
| 2.5699     | 3.9660          | 10    |
| 2.5262     | 3.9897          | 11    |
| 2.5002     | 4.0219          | 12    |
| 2.4636     | 4.0540          | 13    |
| 2.4327     | 4.0224          | 14    |
| 2.3945     | 4.0364          | 15    |
| 2.3661     | 4.0640          | 16    |
| 2.3319     | 4.0636          | 17    |
| 2.2992     | 4.0996          | 18    |
| 2.2712     | 4.0886          | 19    |
| 2.2377     | 4.1483          | 20    |
| 2.2054     | 4.1594          | 21    |
| 2.1658     | 4.1989          | 22    |
| 2.1444     | 4.1348          | 23    |
| 2.1129     | 4.1489          | 24    |
| 2.0953     | 4.2259          | 25    |
| 2.0546     | 4.2353          | 26    |
| 2.0281     | 4.3147          | 27    |
| 1.9927     | 4.2586          | 28    |
| 1.9698     | 4.3254          | 29    |
| 1.9373     | 4.3288          | 30    |
| 1.9159     | 4.3262          | 31    |
| 1.8750     | 4.3550          | 32    |
| 1.8480     | 4.3697          | 33    |
| 1.8215     | 4.4233          | 34    |
| 1.7874     | 4.4876          | 35    |
| 1.7685     | 4.5072          | 36    |
| 1.7433     | 4.4617          | 37    |
| 1.7085     | 4.5331          | 38    |
| 1.6839     | 4.5724          | 39    |
| 1.6643     | 4.5819          | 40    |
| 1.6224     | 4.6558          | 41    |
| 1.5981     | 4.5991          | 42    |
| 1.5788     | 4.6276          | 43    |
| 1.5532     | 4.6394          | 44    |
| 1.5164     | 4.6464          | 45    |


### Framework versions

- Transformers 4.38.2
- TensorFlow 2.15.0
- Datasets 2.18.0
- Tokenizers 0.15.2