MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9, 2024
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published May 20, 2024
On the Planning Abilities of Large Language Models -- A Critical Investigation Paper • 2305.15771 • Published May 25, 2023
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170 • Published Jun 13, 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding Paper • 2406.09411 • Published Jun 13, 2024
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B Paper • 2406.07394 • Published Jun 11, 2024
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws Paper • 2404.05405 • Published Apr 8, 2024
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12, 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Paper • 2406.14546 • Published Jun 20, 2024