Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model • Aug 22, 2023 • 28
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 124
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 11 days ago • 130
CompCap: Improving Multimodal Large Language Models with Composite Captions Paper • 2412.05243 • Published 18 days ago • 18 • 4
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published 19 days ago • 10
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing Paper • 2412.04280 • Published 19 days ago • 13
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22 • 55
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21 • 42
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7 • 49