BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 12
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 12
VisMin Collection VisMin (visual minimal-change ) is a controlled benchmark and fine-tuned models trained on vismin training set e.g. VisMin-CLIP and VisMin-Idefics2. • 4 items • Updated Nov 21, 2024
VisMin Collection VisMin (visual minimal-change ) is a controlled benchmark and fine-tuned models trained on vismin training set e.g. VisMin-CLIP and VisMin-Idefics2. • 4 items • Updated Nov 21, 2024
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations Paper • 2407.03471 • Published Jul 3, 2024 • 29
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Paper • 2402.05930 • Published Feb 8, 2024 • 38