Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Paper • 2310.01506 • Published Oct 2, 2023
RL-GPT: Integrating Reinforcement Learning and Code-as-policy Paper • 2402.19299 • Published Feb 29, 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47
Multi-modal Cooking Workflow Construction for Food Recipes Paper • 2008.09151 • Published Aug 20, 2020 • 1
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers Paper • 2501.03931 • Published 19 days ago • 14
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 45
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024 • 20
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy Paper • 2203.07845 • Published Mar 15, 2022
Video Background Music Generation with Controllable Music Transformer Paper • 2111.08380 • Published Nov 16, 2021 • 1
Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need Paper • 2302.02615 • Published Feb 6, 2023
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion Paper • 2412.09593 • Published Dec 12, 2024 • 18
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published Sep 17, 2024 • 26
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12, 2024 • 53
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance Paper • 2306.00943 • Published Jun 1, 2023 • 5
Real-World Image Variation by Aligning Diffusion Inversion Chain Paper • 2305.18729 • Published May 30, 2023 • 4