
Collections


Collections including paper arxiv:2404.12241

Papers - Benchmarks - Toxicity

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18 • 10

Papers - Benchmarks - Safety

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18 • 10

Papers - Safety

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30 • 41
Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18 • 10

Papers - Benchmarks

The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20 • 16
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2 • 35
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs

Paper • 2312.17080 • Published Dec 28, 2023 • 1
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Paper • 1804.07461 • Published Apr 20, 2018 • 4

Papers - Toxicity

Recourse for reclamation: Chatting with generative language models

Paper • 2403.14467 • Published Mar 21 • 6
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Paper • 2403.15447 • Published Mar 18 • 16
Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18 • 10

CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation

Paper • 2401.01275 • Published Jan 2 • 1
Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18 • 10
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 116
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Paper • 2406.12624 • Published Jun 18 • 36
