Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published Dec 19, 2024 • 18
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024 • 28
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 8 items • Updated Nov 23, 2024 • 80
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance Paper • 2411.02327 • Published Nov 4, 2024 • 11
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 54
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27, 2024 • 26
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 136
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 92
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 86