omlab's Collections
Multimodal Research
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
Paper • 2411.16044
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Paper • 2407.04923
OmDet: Large-scale Vision-Language Multi-dataset Pre-training with Multimodal Detection Network
Paper • 2209.05946
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Paper • 2207.00221
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Paper • 2312.15043
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection
Paper • 2308.13177
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
Paper • 2403.06892
RS5M and GeoRSCLIP: A Large-Scale Vision-Language Dataset and a Large Vision-Language Model for Remote Sensing
Paper • 2306.11300
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
Paper • 2406.16620