MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paper • 2501.10057 • Published 27 days ago • 8
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31, 2024 • 76
Introducing v0.5 of the AI Safety Benchmark from MLCommons Paper • 2404.12241 • Published Apr 18, 2024 • 11
QuALITY: Question Answering with Long Input Texts, Yes! Paper • 2112.08608 • Published Dec 16, 2021 • 2
Does Putting a Linguist in the Loop Improve NLU Data Collection? Paper • 2104.07179 • Published Apr 15, 2021 • 1
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Paper • 2206.04615 • Published Jun 9, 2022 • 5
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 44
DMLR: Data-centric Machine Learning Research -- Past, Present and Future Paper • 2311.13028 • Published Nov 21, 2023 • 1
BLiMP: The Benchmark of Linguistic Minimal Pairs for English Paper • 1912.00582 • Published Dec 2, 2019
A Framework to Assess (Dis)agreement Among Diverse Rater Groups Paper • 2311.05074 • Published Nov 9, 2023