ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published 29 days ago • 76 • 3
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31 • 48 • 3
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23 • 200 • 4
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8 • 107 • 7
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1 • 144 • 17
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published Sep 20 • 48 • 9