The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer Paper • 2502.15631 • Published 5 days ago • 6
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published 27 days ago • 19
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 134
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections Paper • 2502.12170 • Published 13 days ago • 11
Continuous Diffusion Model for Language Modeling Paper • 2502.11564 • Published 9 days ago • 49
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published 10 days ago • 50
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published 8 days ago • 35