metadata
license: mit
VIRL-L-Init
This model serves as an initial checkpoint for reproducing the results in the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training".
Related links
Website: https://tianzhechu.com/SFTvsRL/
GitHub: https://github.com/LeslieTrue/SFTvsRL