OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published 1 day ago • 56
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published 8 days ago • 17
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 12 days ago • 33
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published 17 days ago • 23
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks Paper • 2501.11733 • Published 15 days ago • 27
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published 21 days ago • 61
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published 19 days ago • 24
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published 28 days ago • 23
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Paper • 2412.19645 • Published Dec 27, 2024 • 13
Wonderland: Navigating 3D Scenes from a Single Image Paper • 2412.12091 • Published Dec 16, 2024 • 16
ColorFlow: Retrieval-Augmented Image Sequence Colorization Paper • 2412.11815 • Published Dec 16, 2024 • 26
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption Paper • 2412.09283 • Published Dec 12, 2024 • 19
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models Paper • 2412.09622 • Published Dec 12, 2024 • 8
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published Dec 12, 2024 • 21
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 44
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published Dec 10, 2024 • 19