Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published 20 days ago • 43
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models Paper • 2410.17578 • Published Oct 23 • 1
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21 • 43
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20 • 12
Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education Paper • 2407.17022 • Published Jul 24
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9 • 2
Aligning to Thousands of Preferences via System Message Generalization Paper • 2405.17977 • Published May 28 • 6
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 119
Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards Paper • 2404.10346 • Published Apr 16 • 1
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models Paper • 2404.02575 • Published Apr 3 • 48
KMMLU: Measuring Massive Multitask Language Understanding in Korean Paper • 2402.11548 • Published Feb 18
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once? Paper • 2402.11597 • Published Feb 18 • 1
LangBridge: Multilingual Reasoning Without Multilingual Supervision Paper • 2401.10695 • Published Jan 19 • 5
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation Paper • 2401.06591 • Published Jan 12 • 3
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 Paper • 2311.10702 • Published Nov 17, 2023 • 18
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging Paper • 2310.11564 • Published Oct 17, 2023 • 2
CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification Paper • 2303.03628 • Published Mar 7, 2023 • 2
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning Paper • 2305.14045 • Published May 23, 2023 • 5
Exploring the Benefits of Training Expert Language Models over Instruction Tuning Paper • 2302.03202 • Published Feb 7, 2023 • 1