lapp0 committed on
Commit 60dcfea
1 Parent(s): 1f73a51

Training in progress, step 2475

README.md CHANGED
@@ -5,18 +5,18 @@ base_model: distilbert/distilgpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: distily_dummy
+- name: distily_validate_extra_grad_stats
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# distily_dummy
+# distily_validate_extra_grad_stats
 
 This model is a fine-tuned version of [distilbert/distilgpt2](https://huggingface.co/distilbert/distilgpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.6572
+- Loss: 1.7931
 
 ## Model description
 
@@ -47,7 +47,7 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| No log | 0 | 0 | 20.6500 |
+| No log | 0 | 0 | 22.3400 |
 
 
 ### Framework versions
logs/attn_projector=orthogonal, attn_weight=5, extra_grad_stats=False, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0/events.out.tfevents.1725291816.261a4d6fb516 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e0848864cbd96d15de3cbf6fa2569392e8515a47bcd845f556ba96570a3d65c
+size 98668
logs/attn_projector=orthogonal, attn_weight=5, extra_grad_stats=True, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0/events.out.tfevents.1725291778.261a4d6fb516 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:083d9c358dd7fada23ed373963cf6bde09cd86eba870f066dacf0fc68b237958
+size 520
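The two log directories added above encode their hyperparameters in the directory name as comma-separated `key=value` pairs. The following is a minimal sketch, assuming that naming convention holds for every run, of how the names could be parsed and compared to find the setting that differs. The helper `parse_run_name` is hypothetical and not part of this repository or of any library.

```python
def parse_run_name(run_dir: str) -> dict[str, str]:
    """Split a 'key=value, key=value' run directory name into a dict of strings."""
    params = {}
    for item in run_dir.split(", "):
        # partition keeps everything after the first '=' as the value
        key, _, value = item.partition("=")
        params[key.strip()] = value.strip()
    return params


# Directory names copied from the two log paths in this commit.
run_a = parse_run_name(
    "attn_projector=orthogonal, attn_weight=5, extra_grad_stats=False, "
    "learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0"
)
run_b = parse_run_name(
    "attn_projector=orthogonal, attn_weight=5, extra_grad_stats=True, "
    "learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0"
)

# Keep only the keys whose values differ between the two runs.
differing = {k: (run_a[k], run_b[k]) for k in run_a if run_a[k] != run_b.get(k)}
print(differing)  # {'extra_grad_stats': ('False', 'True')}
```

Values are kept as strings on purpose; converting them to booleans or floats would require assumptions about each parameter's type.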
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:56dd2b32c86895edbe4cea2c79402f5377a83b774ffab9998370d291618d7bb9
+oid sha256:2feff52230c986d95ded9aae7b80d712707444303cefeb5668b95371e841bb4f
 size 163832792
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1e1e09d16a48f446c465fdc06c6740b6d2c30025de598512ffb059d9f6320a9e
+oid sha256:2c222a0defde32e2cd1e21d4a8aaf9c7c6fd922d367b52946d9ed734f8f411bf
 size 5496
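The binary files in this commit are stored as Git LFS pointers, each holding a `version`, an `oid`, and a `size` line. As a rough sketch, a pointer can be parsed into its fields to check what changed between two revisions. The function `parse_lfs_pointer` is a hypothetical helper written for illustration, not a Git LFS API, and it assumes the three-line layout shown in the diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest and size."""
    # Each non-empty line is '<key> <value>', separated by the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Old and new pointers for model.safetensors, copied from this commit.
old = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:56dd2b32c86895edbe4cea2c79402f5377a83b774ffab9998370d291618d7bb9\n"
    "size 163832792\n"
)
new = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:2feff52230c986d95ded9aae7b80d712707444303cefeb5668b95371e841bb4f\n"
    "size 163832792\n"
)

same_size = old["size"] == new["size"]
same_content = old["digest"] == new["digest"]
print(same_size, same_content)  # True False
```

An unchanged size with a different digest, as seen for both `model.safetensors` and `training_args.bin`, indicates that the file contents changed while the byte length stayed the same.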