Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model Paper • 2501.05122
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441
Multi-task retriever fine-tuning for domain-specific and efficient RAG Paper • 2501.04652
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models Paper • 2501.00874
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958
Are Vision-Language Models Truly Understanding Multi-vision Sensor? Paper • 2412.20750
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187