Resource: Understanding the new benchmarks

#796

pinned

by rombodawg - opened Jun 26

Discussion

rombodawg

Jun 26

Since we have new benchmarks, I've made a summarized list of what each one represents. Please read it so you can better understand the leaderboard.

clefourrier

Open LLM Leaderboard org Jun 27

Thanks a lot! This information is also available on our About page and in our blog post - we'll gladly update them with whichever definitions feel clearest to the community!

clefourrier changed discussion title from "New leaderboard, who dis?" to "Resource: Understanding the new benchmarks" Jun 27

clefourrier pinned discussion Jun 27

AmineHA

Aug 26 (edited Aug 26)

@clefourrier Thank you for your efforts in improving the evaluation suite.

I reviewed the blog post and now understand that normalization is applied to all multiple-choice (MC) tasks (e.g., MMLU-Pro). To reproduce the results accurately, could you please point me to the code where this normalization logic is implemented? The documentation covers the setup and usage of the lm-eval-harness, but it doesn't describe how the scores are normalized.
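For context, here is a minimal sketch of the kind of normalization I mean. It is not the leaderboard's actual code. It assumes the random-guess baseline is 1 / (number of answer choices), that scores are rescaled so the baseline maps to 0 and a perfect score maps to 100, and that anything at or below the baseline is clipped to 0. The function name `normalize_score` is hypothetical.

```python
def normalize_score(raw_accuracy: float, num_choices: int) -> float:
    """Rescale a raw accuracy in [0, 1] to a 0-100 score.

    Assumption (not confirmed by the leaderboard code): random guessing
    over `num_choices` options is the lower bound, and scores at or
    below that bound are reported as 0.
    """
    if num_choices < 2:
        raise ValueError("num_choices must be at least 2")

    baseline = 1.0 / num_choices  # expected accuracy of random guessing
    if raw_accuracy <= baseline:
        return 0.0

    # Linear rescaling: baseline -> 0, perfect accuracy -> 100.
    return (raw_accuracy - baseline) / (1.0 - baseline) * 100.0


if __name__ == "__main__":
    # Example with 10 answer choices (assumed here for MMLU-Pro).
    print(normalize_score(0.55, num_choices=10))
```

If the real implementation differs (for example, per-subtask baselines averaged afterwards), the reproduced numbers would not match, which is why a pointer to the actual code would help.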
