Update README.md
README.md
@@ -55,7 +55,7 @@ The following hyperparameters were used during training:
 - num_epochs: 1
 ```

-Optimizer `paged_adamw_8bit` and Deepspeed ZeRO 3 was used at a LR of `1e-5` using the cosine scheduler for 1 epoch on 3x3090s taking
+Optimizer `paged_adamw_8bit` and Deepspeed ZeRO 3 was used at a LR of `1e-5` using the cosine scheduler for 1 epoch on 3x3090s taking 2h 30m total.

 Sample packing and padding was disabled to reduce VRAM consumption significantly at the cost of speed.
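
For readers unfamiliar with the cosine scheduler named in the changed line, here is a minimal sketch of how such a schedule decays the learning rate from the stated peak of `1e-5`. The function name `cosine_lr`, the step count, the absence of warmup, and the decay to zero are all assumptions made for illustration; the README does not state them, and the training framework's own scheduler may differ.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-5) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps.

    Assumes no warmup and a minimum LR of 0 (neither is stated in the README).
    """
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count; the README does not give one
    for s in (0, 250, 500, 750, 1000):
        print(s, cosine_lr(s, total))
```

The rate starts at the peak, passes through half the peak at the midpoint of training, and reaches zero at the final step.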