What kind of training data was used in the RL process of R1-Zero?
#14, opened by RitchieLeung
Thanks to DeepSeek for the awesome work. A question came up while I was reading the technical report:
What kind of training data was used in the RL process of R1-Zero?