Leaderboards and benchmarks ✨ Collection A collection of cool leaderboard Spaces for models across modalities: text, vision, audio, ... • 89 items • Updated 14 days ago • 94
Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context Jul 23, 2024 • 226
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21, 2024 • 70
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Paper • 2406.12644 • Published Jun 18, 2024 • 4