A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents Paper • 2410.22476 • Published 19 days ago • 24
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published 25 days ago • 49
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published 26 days ago • 88
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published 26 days ago • 43
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13 • 54
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4 • 35
Only-IF:Revealing the Decisive Effect of Instruction Diversity on Generalization Paper • 2410.04717 • Published Oct 7 • 17
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation Paper • 2410.01912 • Published Oct 2 • 13
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published Sep 20 • 67
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published Sep 20 • 48
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published Sep 19 • 21
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 134