Taming Teacher Forcing for Masked Autoregressive Video Generation Paper • 2501.12389 • Published 10 days ago • 10
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published 17 days ago • 20
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos Paper • 2501.12375 • Published 10 days ago • 22
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published 10 days ago • 31
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published 10 days ago • 39
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 11 days ago • 88
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass Paper • 2501.13928 • Published 8 days ago • 14
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 9 days ago • 53
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 9 days ago • 76
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 9 days ago • 278
EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion Paper • 2501.13452 • Published 8 days ago • 7
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 8 days ago • 31
CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation Paper • 2501.11325 • Published 11 days ago • 3
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning Paper • 2411.19458 • Published Nov 29, 2024 • 5