Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond Paper • 2310.02071 • Published Oct 3, 2023 • 4
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks Paper • 2311.09835 • Published Nov 16, 2023 • 9
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain Paper • 2402.15527 • Published Feb 21, 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11, 2024 • 26
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jun 29, 2024 • 35
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Paper • 2407.05282 • Published Jul 7, 2024 • 13