AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? Paper • 2407.15711 • Published Jul 22, 2024 • 9
Answering Questions by Meta-Reasoning over Multiple Chains of Thought Paper • 2304.13007 • Published Apr 25, 2023 • 1
Making Retrieval-Augmented Language Models Robust to Irrelevant Context Paper • 2310.01558 • Published Oct 2, 2023 • 2
SciCode: A Research Coding Benchmark Curated by Scientists Paper • 2407.13168 • Published Jul 18, 2024 • 14
COMET: Commonsense Transformers for Automatic Knowledge Graph Construction Paper • 1906.05317 • Published Jun 12, 2019
What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception Paper • 2311.09558 • Published Nov 16, 2023
Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models Paper • 2210.13439 • Published Oct 24, 2022
ExpertQA: Expert-Curated Questions and Attributed Answers Paper • 2309.07852 • Published Sep 14, 2023 • 1
SCROLLS: Standardized CompaRison Over Long Language Sequences Paper • 2201.03533 • Published Jan 10, 2022 • 1
QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations Paper • 2305.11694 • Published May 19, 2023 • 1