MMVU: Measuring Expert-Level Multi-Discipline Video Understanding • Paper • 2501.12380 • Published 3 days ago • 73
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning • Paper • 2501.06590 • Published 13 days ago • 8
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions • Paper • 2403.15246 • Published Mar 22, 2024 • 10
OLMo: Accelerating the Science of Language Models • Paper • 2402.00838 • Published Feb 1, 2024 • 83
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks • Paper • 2311.09835 • Published Nov 16, 2023 • 10
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? • Paper • 2309.08963 • Published Sep 16, 2023 • 9