Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 200
EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion Paper • 2501.13452 • Published 12 days ago • 7
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published 11 days ago • 21
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 11 days ago • 22
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 71