SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines • Paper • 2502.14739 • Published 5 days ago • 91
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs • Paper • 2502.12982 • Published 7 days ago • 11
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments • Paper • 2501.10893 • Published Jan 18 • 24
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings • Paper • 2501.01257 • Published Jan 2 • 49
Diving into Self-Evolving Training for Multimodal Reasoning • Paper • 2412.17451 • Published Dec 23, 2024 • 43
Iterative Forward Tuning Boosts In-Context Learning in Language Models • Paper • 2305.13016 • Published May 22, 2023 • 1
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts • Paper • 2305.14839 • Published May 24, 2023 • 1
One Shot Learning as Instruction Data Prospector for Large Language Models • Paper • 2312.10302 • Published Dec 16, 2023 • 3
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions • Paper • 2406.15877 • Published Jun 22, 2024 • 46
Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning • Paper • 2301.13808 • Published Jan 31, 2023