TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space Paper • 2501.12224 • Published 5 days ago • 43
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 5 days ago • 75
Salesforce/xgen-mm-phi3-mini-instruct-dpo-r-v1.5 Image-Text-to-Text • Updated Sep 16, 2024 • 87 • 17
HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit Zero-Shot Image Classification • Updated Mar 7, 2024 • 8.63k • 43
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper • 2501.05452 • Published 17 days ago • 15
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published 17 days ago • 84
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 24 days ago • 49