The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024 • 31
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 113
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 108