nbroad committed
Commit 4e683f2 · verified · 1 Parent(s): 7231fe5

Training in progress, step 30

adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:eb81808ec2e6166816efeabf644b560a48b51a9833380c4288d1c87da91b3a10
+oid sha256:1514f7a83541167587e9eaafd634a441afef161a48e32e1e6556ab18f7da59e0
 size 174655536
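The changed file above is a Git LFS pointer: the repository stores only the object's sha256 oid and byte size, not the weights themselves. A minimal sketch of checking a downloaded file against the pointer's oid (the path and expected digest below are illustrative, not from this commit):

```python
import hashlib

def lfs_sha256(path, chunk_size=1 << 20):
    """Compute the sha256 digest that Git LFS records as the pointer's oid,
    reading in chunks so large weight files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: compare against the oid line of the pointer file.
# lfs_sha256("adapter_model.safetensors") == "1514f7a83541167587e9eaafd6..."
```

A mismatch between this digest and the pointer's oid indicates a corrupted or partial download.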
wandb/run-20250203_103646-f5yiqx2u/files/output.log CHANGED
@@ -34,3 +34,12 @@ The model is not an instance of PreTrainedModel. No liger kernels will be applied.
 {'loss': 0.2143, 'grad_norm': 4.143075942993164, 'learning_rate': 1.2078404679216864e-05, 'epoch': 0.66}
 with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
 {'eval_loss': 0.18873603641986847, 'eval_runtime': 23.0504, 'eval_samples_per_second': 126.071, 'eval_steps_per_second': 2.646, 'epoch': 0.66}
+100%|██████████| 30/30 [06:24<00:00, 12.82s/it]
+{'loss': 0.1893, 'grad_norm': 1.1092743873596191, 'learning_rate': 8.056828165944282e-06, 'epoch': 0.72}
+{'loss': 0.1882, 'grad_norm': 2.4327919483184814, 'learning_rate': 4.679111137620442e-06, 'epoch': 0.79}
+{'loss': 0.1693, 'grad_norm': 2.00337553024292, 'learning_rate': 2.127347193531757e-06, 'epoch': 0.86}
+{'loss': 0.1865, 'grad_norm': 1.2957417964935303, 'learning_rate': 5.391025884035239e-07, 'epoch': 0.92}
+{'loss': 0.1784, 'grad_norm': 1.101898431777954, 'learning_rate': 0.0, 'epoch': 0.99}
+
+{'eval_loss': 0.17505289614200592, 'eval_runtime': 22.7261, 'eval_samples_per_second': 127.871, 'eval_steps_per_second': 2.684, 'epoch': 0.99}
+{'train_runtime': 383.4688, 'train_samples_per_second': 30.308, 'train_steps_per_second': 0.078, 'train_loss': 0.3459165016810099, 'epoch': 0.99}
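The appended trainer log lines are Python dict literals, so the metrics can be recovered from the raw log text. A minimal sketch, assuming the line format shown above (progress-bar and traceback lines are simply skipped):

```python
import ast

def parse_metrics(lines):
    """Extract metric dicts from trainer log lines like
    {'loss': 0.1784, 'epoch': 0.99}, ignoring non-dict lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            # literal_eval safely parses the dict repr without executing code
            records.append(ast.literal_eval(line))
    return records

log = [
    "{'loss': 0.1784, 'grad_norm': 1.101898431777954, 'learning_rate': 0.0, 'epoch': 0.99}",
    "100%| 30/30 [06:24<00:00, 12.82s/it]",
]
metrics = parse_metrics(log)
# metrics[0]['loss'] == 0.1784; the progress-bar line is dropped
```

`ast.literal_eval` is preferred over `eval` here because the log is untrusted text.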
wandb/run-20250203_103646-f5yiqx2u/run-f5yiqx2u.wandb CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:76474af5b676bdc34d3ec6f67a519fb598c141b35fe97ceaa37d13172ea8b0cc
-size 262144
+oid sha256:05bfe601da81133637f9e27bee30153b8e5d520d581ebf8b85d9b1f02f2b1b08
+size 360448