- Qwen2.5 VL 72B Instruct
  💻 Interact with Qwen2.5-VL-72B to get responses and generate images
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 143
- Qwen/Qwen2.5-VL-72B-Instruct
  Image-Text-to-Text • Updated • 229k • 322
- Qwen/Qwen2.5-VL-72B-Instruct-AWQ
  Image-Text-to-Text • Updated • 10.3k • 25
Collections including paper arxiv:2502.13923

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 180
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
  Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 50
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 40

- SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
  Paper • 2502.14786 • Published • 115
- LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
  Paper • 2502.14834 • Published • 23
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 143
- DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
  Paper • 2502.17157 • Published • 37

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 42
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 83
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- Large Language Diffusion Models
  Paper • 2502.09992 • Published • 80
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
  Paper • 2502.10391 • Published • 30
- Diverse Inference and Verification for Advanced Reasoning
  Paper • 2502.09955 • Published • 16
- Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
  Paper • 2502.08130 • Published • 9

- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 99
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
  Paper • 2412.19723 • Published • 82
- VisionZip: Longer is Better but Not Necessary in Vision Language Models
  Paper • 2412.04467 • Published • 107
- PaliGemma 2: A Family of Versatile VLMs for Transfer
  Paper • 2412.03555 • Published • 128