One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt Paper • 2501.13554 • Published 3 days ago • 7
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 3 days ago • 20
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 4 days ago • 66
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 4 days ago • 182
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published 12 days ago • 17
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published 5 days ago • 25
SEAL: Entangled White-box Watermarks on Low-Rank Adaptation Paper • 2501.09284 • Published 10 days ago • 8
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors Paper • 2501.08225 • Published 12 days ago • 18
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published 12 days ago • 55
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 16 days ago • 59
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 19 days ago • 48
Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 19 items • Updated 18 days ago • 87
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published 19 days ago • 23
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 19 days ago • 66
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 20 days ago • 52
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published 20 days ago • 22
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos Paper • 2412.09401 • Published Dec 12, 2024 • 2
Nested Attention: Semantic-aware Attention Values for Concept Personalization Paper • 2501.01407 • Published 24 days ago • 11
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration Paper • 2501.01320 • Published 24 days ago • 11
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 26 days ago • 41