HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Paper • 2412.21199 • Published 20 days ago
Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published 20 days ago