Update README.md
Browse files
README.md
CHANGED
@@ -12,7 +12,7 @@ pipeline_tag: text-generation
|
|
12 |
|
13 |
![wsd_scheduler](./wsd_scheduler.png)
|
14 |
|
15 |
-
Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training from any checkpoint in the `stable stage` without
|
16 |
|
17 |
Here are the initial learning rates required to continue training at each checkpoint:
|
18 |
|
|
|
12 |
|
13 |
![wsd_scheduler](./wsd_scheduler.png)
|
14 |
|
15 |
+
Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training on any new dataset from any checkpoint in the `stable stage` without spikes of the training.
|
16 |
|
17 |
Here are the initial learning rates required to continue training at each checkpoint:
|
18 |
|