An Empirical Study of Autoregressive Pre-training from Videos • Paper • 2501.05453 • Published 3 days ago • 28
The GAN is dead; long live the GAN! A Modern GAN Baseline • Paper • 2501.05441 • Published 3 days ago • 51
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control • Paper • 2501.03847 • Published 5 days ago • 17
Cosmos World Foundation Model Platform for Physical AI • Paper • 2501.03575 • Published 5 days ago • 54
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models • Paper • 2501.02955 • Published 6 days ago • 39
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control • Paper • 2501.02260 • Published 8 days ago • 4
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation • Paper • 2501.03059 • Published 6 days ago • 18
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking • Paper • 2501.02690 • Published 7 days ago • 15
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution • Paper • 2501.02976 • Published 6 days ago • 46
Nested Attention: Semantic-aware Attention Values for Concept Personalization • Paper • 2501.01407 • Published 10 days ago • 10
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control • Paper • 2501.01427 • Published 10 days ago • 46
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM • Paper • 2501.00599 • Published 12 days ago • 40
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis • Paper • 2412.19723 • Published 16 days ago • 78
PERSE: Personalized 3D Generative Avatars from A Single Portrait • Paper • 2412.21206 • Published 13 days ago • 15
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization • Paper • 2412.18525 • Published 19 days ago • 65
Bringing Objects to Life: 4D generation from 3D objects • Paper • 2412.20422 • Published 14 days ago • 33
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage • Paper • 2412.15484 • Published 23 days ago • 14
Revisiting In-Context Learning with Long Context Language Models • Paper • 2412.16926 • Published 21 days ago • 28
Large Motion Video Autoencoding with Cross-modal Video VAE • Paper • 2412.17805 • Published 20 days ago • 24
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning • Paper • 2412.15797 • Published 23 days ago • 17