kaizuberbuehler's Collections
LM Capabilities and Scaling
Compression Represents Intelligence Linearly
Paper
•
2404.09937
•
Published
•
27
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper
•
2404.06395
•
Published
•
22
Long-context LLMs Struggle with Long In-context Learning
Paper
•
2404.02060
•
Published
•
36
Are large language models superhuman chemists?
Paper
•
2404.01475
•
Published
•
17
FlowMind: Automatic Workflow Generation with LLMs
Paper
•
2404.13050
•
Published
•
34
Capabilities of Gemini Models in Medicine
Paper
•
2404.18416
•
Published
•
24
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper
•
2405.12107
•
Published
•
27
On the Planning Abilities of Large Language Models -- A Critical Investigation
Paper
•
2305.15771
•
Published
•
1
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Paper
•
2406.09170
•
Published
•
26
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Paper
•
2406.09411
•
Published
•
19
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper
•
2406.07394
•
Published
•
27
GEB-1.3B: Open Lightweight Large Language Model
Paper
•
2406.09900
•
Published
•
21
Mixture of A Million Experts
Paper
•
2407.04153
•
Published
•
5
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Paper
•
2404.05405
•
Published
•
10
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper
•
2408.06195
•
Published
•
70
Attention Heads of Large Language Models: A Survey
Paper
•
2409.03752
•
Published
•
89
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Paper
•
2409.16191
•
Published
•
42
Making Text Embedders Few-Shot Learners
Paper
•
2409.15700
•
Published
•
30
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
Paper
•
2406.14546
•
Published
•
2
Are Your LLMs Capable of Stable Reasoning?
Paper
•
2412.13147
•
Published
•
91
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper
•
2501.01257
•
Published
•
48
ProgCo: Program Helps Self-Correction of Large Language Models
Paper
•
2501.01264
•
Published
•
25
Paper
•
2412.04315
•
Published
•
17
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Paper
•
2411.17691
•
Published
•
11
PokerBench: Training Large Language Models to become Professional Poker Players
Paper
•
2501.08328
•
Published
•
13
Do generative video models learn physical principles from watching videos?
Paper
•
2501.09038
•
Published
•
30