Temporal Preference Optimization for Long-Form Video Understanding • Paper • 2501.13919 • Published 7 days ago • 21
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques • Paper • 2501.14492 • Published 6 days ago • 29
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models • Paper • 2501.12370 • Published 9 days ago • 8
iFormer: Integrating ConvNet and Transformer for Mobile Application • Paper • 2501.15369 • Published 5 days ago • 9
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer • Paper • 2501.15570 • Published 4 days ago • 17
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation • Paper • 2501.15907 • Published 4 days ago • 14
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 3 days ago • 19