PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data • Paper • 2502.14397 • Published 6 days ago • 33
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence • Paper • 2502.13943 • Published 6 days ago • 7
Phantom: Subject-consistent video generation via cross-modal alignment • Paper • 2502.11079 • Published 10 days ago • 50
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models • Paper • 2502.10458 • Published 14 days ago • 27
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation • Paper • 2502.12148 • Published 8 days ago • 16
ReLearn: Unlearning via Learning for Large Language Models • Paper • 2502.11190 • Published 9 days ago • 28
Learning Getting-Up Policies for Real-World Humanoid Robots • Paper • 2502.12152 • Published 8 days ago • 36
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model • Paper • 2502.10248 • Published 11 days ago • 50
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion • Paper • 2502.08590 • Published 13 days ago • 38
CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation • Paper • 2502.08639 • Published 13 days ago • 36
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation • Paper • 2502.08047 • Published 14 days ago • 25
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation • Paper • 2502.07870 • Published 14 days ago • 42
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers • Paper • 2502.06527 • Published 15 days ago • 9