How to Get Your LLM to Generate Challenging Problems for Evaluation • Paper • 2502.14678 • Published 5 days ago • 14
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines • Paper • 2502.14739 • Published 5 days ago • 91
Diverse Inference and Verification for Advanced Reasoning • Paper • 2502.09955 • Published 12 days ago • 16
DarwinLM: Evolutionary Structured Pruning of Large Language Models • Paper • 2502.07780 • Published 14 days ago • 17
Expect the Unexpected: FailSafe Long Context QA for Finance • Paper • 2502.06329 • Published 16 days ago • 124
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging • Paper • 2502.09056 • Published 13 days ago • 30
Great Models Think Alike and this Undermines AI Oversight • Paper • 2502.04313 • Published 19 days ago • 30
NeMo Curator - Classifier Models • Collection • Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 11 items • Updated 11 days ago • 16
SmolVLM 256M & 500M • Collection • Collection for models & demos for even smoller SmolVLM release • 12 items • Updated 5 days ago • 69
An Empirical Study of Autoregressive Pre-training from Videos • Paper • 2501.05453 • Published Jan 9 • 37