hkust-nlp/CodeIO-PyEdu-Reasoning
How exactly is the Qwen/Qwen2.5-Math-RM-72B model used? Is it solely for ranking multiple answers? Can it also serve as a tool to validate whether the answers are correct?