JingzeShi commited on
Commit
da46cff
·
verified ·
1 Parent(s): b62417a

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -12,7 +12,7 @@ pipeline_tag: text-generation
12
 
13
  ![wsd_scheduler](./wsd_scheduler.png)
14
 
15
- Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training from any checkpoint in the `stable stage` without causing loss rebound.
16
 
17
  Here are the initial learning rates required to continue training at each checkpoint:
18
 
 
12
 
13
  ![wsd_scheduler](./wsd_scheduler.png)
14
 
15
+ Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training on any new dataset from any checkpoint in the `stable stage` without spikes of the training.
16
 
17
  Here are the initial learning rates required to continue training at each checkpoint:
18