CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published 2 days ago • 18
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 5 days ago • 91
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published 7 days ago • 11