PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns Paper • 2403.13315 • Published Mar 20
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths Paper • 2410.10858 • Published Oct 7
Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models Paper • 2409.14277 • Published Sep 22
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29 • 55
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions Paper • 2405.20267 • Published May 30 • 1
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt Paper • 2406.16377 • Published Jun 24 • 11
Zero-Shot Text Classification via Self-Supervised Tuning Paper • 2305.11442 • Published May 19, 2023 • 1
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models Paper • 2306.05179 • Published Jun 8, 2023 • 2
Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations Paper • 2305.10172 • Published May 17, 2023
Multilingual Jailbreak Challenges in Large Language Models Paper • 2310.06474 • Published Oct 10, 2023
Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents Paper • 2311.00262 • Published Nov 1, 2023
Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models Paper • 2403.10258 • Published Mar 15
Reasons to Reject? Aligning Language Models with Judgments Paper • 2312.14591 • Published Dec 22, 2023 • 17
SeaLLMs -- Large Language Models for Southeast Asia Paper • 2312.00738 • Published Dec 1, 2023 • 23