JingzeShi/Doge-20M-checkpoint


JingzeShi committed 4 days ago

Commit da46cff · verified · 1 Parent(s): b62417a

Update README.md

Files changed (1)

README.md +1 -1


@@ -12,7 +12,7 @@ pipeline_tag: text-generation
 ![wsd_scheduler](./wsd_scheduler.png)
-Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training from any checkpoint in the `stable stage` without causing loss rebound.
+Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training on any new dataset from any checkpoint in the `stable stage` without spikes of the training.
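
The three stages named in the changed line can be sketched as a plain function of the training step. This is a minimal illustration, not the scheduler Doge actually uses: the function name `wsd_lr`, the linear warmup and linear decay shapes, and all step counts and learning-rate values are assumptions made for the example.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for a warmup-stable-decay schedule.

    Hypothetical sketch: linear warmup and linear decay are assumed here;
    the README does not state the ramp shapes used for Doge.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the learning rate constant.
        return max_lr
    # Decay: ramp linearly from max_lr down to min_lr, then stay there.
    decay_step = step - warmup_steps - stable_steps
    progress = min(decay_step / decay_steps, 1.0)
    return max_lr + (min_lr - max_lr) * progress


if __name__ == "__main__":
    # Illustrative values only.
    for s in (0, 9, 50, 110, 135, 160):
        print(s, wsd_lr(s, max_lr=1e-3, warmup_steps=10, stable_steps=100, decay_steps=50))
```

Because the rate is constant throughout the stable stage, a run resumed from a checkpoint taken there can start at the same learning rate the checkpoint was trained with, which is consistent with the claim in the changed line about continuing training from `stable stage` checkpoints.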
 Here are the initial learning rates required to continue training at each checkpoint: