Revealing Fine-Grained Values and Opinions in Large Language Models Paper • 2406.19238 • Published Jun 27 • 14
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Paper • 2405.19332 • Published May 29 • 15
LLMs achieve adult human performance on higher-order theory of mind tasks Paper • 2405.18870 • Published May 29 • 16
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Paper • 2406.04271 • Published Jun 6 • 27
Large Language Model Confidence Estimation via Black-Box Access Paper • 2406.04370 • Published Jun 1 • 19
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7 • 26
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering Paper • 2406.06573 • Published Jun 3 • 8
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6 • 53
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170 • Published Jun 13 • 24
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Paper • 2406.13542 • Published Jun 19 • 16
Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model Paper • 2406.15275 • Published Jun 21 • 10