---
title: README
emoji: ⚖️
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---
Hi! Welcome to the org page of the Evaluation team at HuggingFace.

We want to support the community in building and sharing quality evaluations for reproducible and fair model comparisons, to cut through the hype of releases and better understand actual model capabilities.

We're behind:
- the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/) (over 11K models evaluated since 2023)
- [lighteval](https://github.com/huggingface/lighteval), a fast LLM evaluation suite filled with the SOTA benchmarks you might want
- the [evaluation guidebook](https://github.com/huggingface/evaluation-guidebook), your reference for LLM evals
- the [leaderboards on the hub](https://huggingface.co/blog?tag=leaderboard) initiative, which encourages people to build leaderboards in the open for more reproducible evaluation. You'll find documentation on building your own [here](https://huggingface.co/docs/leaderboards/index), and you can look for the best leaderboard for your use case [here](https://huggingface.co/spaces/OpenEvals/find-a-leaderboard)!

We're not behind the [evaluate metrics guide](https://huggingface.co/evaluate-metric), but if you want to understand metrics better, we really recommend checking it out!