Collections
Collections including paper arxiv:2408.03361
-
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Paper • 2408.04594 • Published • 14
Achieving Human Level Competitive Robot Table Tennis
Paper • 2408.03906 • Published • 26
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Paper • 2408.03361 • Published • 85
Heavy Labels Out! Dataset Distillation with Label Space Lightening
Paper • 2408.08201 • Published • 17
-
A Comparative Study on Automatic Coding of Medical Letters with Explainability
Paper • 2407.13638 • Published • 5
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
Paper • 2407.07061 • Published • 26
AgentInstruct: Toward Generative Teaching with Agentic Flows
Paper • 2407.03502 • Published • 43
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Paper • 2407.06723 • Published • 10
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 13
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 4
-
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 7
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper • 2406.08407 • Published • 24
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Paper • 2408.03361 • Published • 85
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 182
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 35
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 24
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 33