GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse Paper • 2401.01523 • Published Jan 3, 2024
Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models Paper • 2401.13298 • Published Jan 24, 2024
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models Paper • 2405.00390 • Published May 1, 2024
CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding? Paper • 2408.10718 • Published Aug 20, 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model Paper • 2408.17175 • Published Aug 30, 2024
SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents Paper • 2411.07965 • Published Nov 12, 2024
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Paper • 2502.04128 • Published 20 days ago
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges Paper • 2411.18932 • Published Nov 28, 2024
✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use Article • By Ziyang and 1 other • Jan 3