SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Paper • 2411.05007 • Published 2 days ago • 13
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models Paper • 2411.05005 • Published 2 days ago • 12
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Paper • 2411.04989 • Published 2 days ago • 12
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published 2 days ago • 18
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published 4 days ago • 21
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 2 days ago • 30
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published 2 days ago • 56
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 2 days ago • 76
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level Paper • 2411.03562 • Published 4 days ago • 41
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination Paper • 2411.03823 • Published 4 days ago • 41
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond Paper • 2411.03590 • Published 4 days ago • 9
Adaptive Length Image Tokenization via Recurrent Allocation Paper • 2411.02393 • Published 5 days ago • 11
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution Paper • 2411.02359 • Published 5 days ago • 12
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published 5 days ago • 52
DreamPolish: Domain Score Distillation With Progressive Geometry Generation Paper • 2411.01602 • Published 6 days ago • 9
Controlling Language and Diffusion Models by Transporting Activations Paper • 2410.23054 • Published 10 days ago • 14
Correlation of Object Detection Performance with Visual Saliency and Depth Estimation Paper • 2411.02844 • Published 5 days ago • 3