PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated 27 days ago • 50
Inf-CL Collection The corresponding demos, checkpoints, papers, and datasets of Inf-CL. • 2 items • Updated Oct 25 • 3
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models Paper • 2410.23266 • Published Oct 30 • 20
BAAI/Aquila-VL-2B-llava-qwen Visual Question Answering • Updated about 1 month ago • 1.66k • 54