question about training updates
Hello, thanks for releasing more training details of DeepScaleR! However, I have a small question: why does your first-stage checkpoint show that it was trained for 560 steps rather than 1040 steps, as indicated by the author (https://github.com/agentica-project/deepscaler#:~:text=At%20step%201040%20and%201520%2C%20the%20context%20length%20is%20extended%20to%2016K%20and%2024K.)?
I am also reproducing the DeepScaleR results and have trained for 680 steps in the first stage (the checkpoint directory shows global_step_680). Training is still running, and no training log indicates how many updates are planned in total. I wonder whether I have to stop the run manually, and if so, when I should do it.
Mainly because I found that the test score at this step was higher (it is a turning point: the minimum and average response lengths start to reverse at this step and then keep growing), so I chose the step-560 checkpoint to continue training.
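The selection rule described above (pick the step where the response-length curve turns around and the test score is highest) can be sketched roughly as follows. This is a minimal illustration with toy numbers, not DeepScaleR's actual metrics or log format; the function name and data are assumptions.

```python
# Hypothetical sketch: among steps where the mean response length starts to
# increase again (the "turning point"), pick the one with the best test score.
# The metric lists below are toy values, not real DeepScaleR training logs.

def pick_checkpoint(steps, scores, mean_lens):
    """Return the step at the response-length turning point with the best score."""
    # Indices where mean response length grows relative to the previous step.
    turning = [i for i in range(1, len(mean_lens)) if mean_lens[i] > mean_lens[i - 1]]
    candidates = turning if turning else range(len(steps))
    # Among those candidates, take the step with the highest test score.
    best = max(candidates, key=lambda i: scores[i])
    return steps[best]

# Illustrative numbers only.
steps = [400, 480, 560, 640, 680]
scores = [0.30, 0.31, 0.35, 0.34, 0.33]
mean_lens = [5200, 5000, 5100, 5400, 5600]
print(pick_checkpoint(steps, scores, mean_lens))  # → 560
```

With these toy values the length curve bottoms out at step 480 and starts growing at 560, which also carries the best score, matching the selection described above.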