Resource: Understanding the new benchmarks
#796 (pinned), opened by rombodawg
Thanks a lot! This information is also present in our About page and blog post, and we'll gladly update them with whichever definitions feel clearest to the community!
clefourrier changed discussion title from "New leaderboard, who dis?" to "Resource: Understanding the new benchmarks"
clefourrier pinned discussion
@clefourrier Thank you for your efforts in improving the evaluation suite.
I reviewed the blog post and now understand that normalization is applied to all multiple-choice (MC) tasks (e.g., MMLU-Pro). To reproduce the results accurately, could you please point me to the code where this normalization logic is implemented? The documentation outlines the setup and usage of the lm-eval-harness, but it doesn’t describe how the scores are normalized.
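For context, here is a minimal sketch of the kind of normalization in question, assuming it rescales a raw accuracy so that the random-guess baseline maps to 0 and a perfect score maps to 100, with anything below the baseline clipped to 0. The function name `normalize_within_range`, its parameters, and the 1/10 baseline used for MMLU-Pro (assuming 10 answer options per question) are illustrative assumptions, not the leaderboard's confirmed implementation.

```python
def normalize_within_range(raw_score: float,
                           lower_bound: float,
                           upper_bound: float = 1.0) -> float:
    """Rescale raw_score from [lower_bound, upper_bound] to [0, 100].

    Hypothetical helper: lower_bound is the random-guess accuracy and
    upper_bound the best achievable accuracy. Scores below the baseline
    are clipped to 0 rather than reported as negative.
    """
    if upper_bound <= lower_bound:
        raise ValueError("upper_bound must be greater than lower_bound")
    rescaled = (raw_score - lower_bound) / (upper_bound - lower_bound)
    return max(rescaled, 0.0) * 100.0


# Assumed baseline for MMLU-Pro: 10 answer options, so random guessing gives 1/10.
MMLU_PRO_BASELINE = 1 / 10

# A raw accuracy of 0.55 lands roughly halfway between the baseline and 1.0.
print(normalize_within_range(0.55, MMLU_PRO_BASELINE))
# A raw accuracy below the baseline is clipped to 0.
print(normalize_within_range(0.05, MMLU_PRO_BASELINE))
```

If the actual code differs (for example in how it sets the baseline for tasks whose questions have varying numbers of options), the reproduced scores would differ too, which is why a pointer to the real implementation would help.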