Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3, 2024
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers Paper • 2311.17136 • Published Nov 28, 2023
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? Paper • 2302.11713 • Published Feb 23, 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities Paper • 2302.11154 • Published Feb 22, 2023
AVIS: Autonomous Visual Information Seeking with Large Language Models Paper • 2306.08129 • Published Jun 13, 2023