VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4, 2025
PaliGemma 2 Release Collection Vision-Language Models available in 3B, 10B, and 28B variants. • 32 items
Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 21 items
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27, 2024
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models Paper • 2408.04594 • Published Aug 8, 2024