hallucinations-leaderboard

community

https://www.neuralnoise.com

pminervini

AI & ML interests

None defined yet.

hallucinations-leaderboard's activity

aryopg

authored a paper 12 days ago

Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs

Paper • 2502.05092 • Published 18 days ago • 7

pminervini

authored 6 papers 15 days ago

FLARE: Faithful Logic-Aided Reasoning and Exploration

Paper • 2410.11900 • Published Oct 14, 2024 • 4

SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages

Paper • 2406.14425 • Published Jun 20, 2024 • 2

Analysing the Residual Stream of Language Models Under Knowledge Conflicts

Paper • 2410.16090 • Published Oct 21, 2024 • 7

Mixtures of In-Context Learners

Paper • 2411.02830 • Published Nov 5, 2024 • 2

Aligning Generalisation Between Humans and Machines

Paper • 2411.15626 • Published Nov 23, 2024

Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs

Paper • 2502.05092 • Published 18 days ago • 7

clefourrier

authored a paper 19 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 21 days ago • 192

pingnieuk

authored a paper 20 days ago

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Paper • 2502.01718 • Published 22 days ago • 28

clefourrier

authored a paper 3 months ago

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 18

acDante

authored 4 papers 3 months ago

The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

Paper • 2404.05904 • Published Apr 8, 2024 • 8

Are We Done with MMLU?

Paper • 2406.04127 • Published Jun 6, 2024 • 38

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

Paper • 2410.15999 • Published Oct 21, 2024 • 19

Analysing the Residual Stream of Language Models Under Knowledge Conflicts

Paper • 2410.16090 • Published Oct 21, 2024 • 7

pminervini

updated 2 datasets 4 months ago

hallucinations-leaderboard/requests

Preview • Updated Oct 31, 2024 • 270k

hallucinations-leaderboard/results

Updated Oct 31, 2024 • 475k • 2

pminervini

authored 2 papers 4 months ago

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

Paper • 2410.15999 • Published Oct 21, 2024 • 19

Adapting Neural Link Predictors for Data-Efficient Complex Query Answering

Paper • 2301.12313 • Published Jan 29, 2023 • 1

aryopg

authored 2 papers 4 months ago

CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning

Paper • 2410.10336 • Published Oct 14, 2024 • 2

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

Paper • 2410.15999 • Published Oct 21, 2024 • 19