Training in progress, step 2475

Files changed (5)

README.md +4 -4
logs/attn_projector=orthogonal, attn_weight=5, extra_grad_stats=False, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0/events.out.tfevents.1725291816.261a4d6fb516 +3 -0
logs/attn_projector=orthogonal, attn_weight=5, extra_grad_stats=True, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0/events.out.tfevents.1725291778.261a4d6fb516 +3 -0
model.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: distilbert/distilgpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: distily_dummy
+- name: distily_validate_extra_grad_stats
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# distily_dummy
+# distily_validate_extra_grad_stats
 This model is a fine-tuned version of [distilbert/distilgpt2](https://huggingface.co/distilbert/distilgpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.6572
+- Loss: 1.7931
 ## Model description
@@ -47,7 +47,7 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| No log        | 0     | 0    | 20.6500         |
+| No log        | 0     | 0    | 22.3400         |
 ### Framework versions

logs/attn_projector=orthogonal, attn_weight=5, extra_grad_stats=False, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0/events.out.tfevents.1725291816.261a4d6fb516 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e0848864cbd96d15de3cbf6fa2569392e8515a47bcd845f556ba96570a3d65c
+size 98668

logs/attn_projector=orthogonal, attn_weight=5, extra_grad_stats=True, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0/events.out.tfevents.1725291778.261a4d6fb516 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:083d9c358dd7fada23ed373963cf6bde09cd86eba870f066dacf0fc68b237958
+size 520

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:56dd2b32c86895edbe4cea2c79402f5377a83b774ffab9998370d291618d7bb9
+oid sha256:2feff52230c986d95ded9aae7b80d712707444303cefeb5668b95371e841bb4f
 size 163832792

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1e1e09d16a48f446c465fdc06c6740b6d2c30025de598512ffb059d9f6320a9e
+oid sha256:2c222a0defde32e2cd1e21d4a8aaf9c7c6fd922d367b52946d9ed734f8f411bf
 size 5496