BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18 • 43
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 19 days ago • 45