Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities Paper • 2401.14405 • Published Jan 25 • 11
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Paper • 2406.18521 • Published Jun 26 • 25
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published 28 days ago • 33
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published 21 days ago • 55
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published 28 days ago • 109