---
title: LLM-Perf Leaderboard
emoji: 🏆🏋️
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.1.0
app_file: app.py
pinned: true
license: apache-2.0
tags: [llm perf leaderboard, llm performance leaderboard, llm, performance, leaderboard]
---

# LLM-perf leaderboard

## 📝 About

The 🤗 LLM-Perf Leaderboard 🏋️ is a leaderboard at the intersection of quality and performance.
Its aim is to benchmark the performance (latency, throughput, memory & energy)
of Large Language Models (LLMs) on different hardware, backends, and optimizations
using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark).

Anyone from the community can request a new base model or hardware/backend/optimization
configuration for automated benchmarking:

- Model evaluation requests should be made in the
  [🤗 Open LLM Leaderboard 🏅](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard);
  we scrape the [list of canonical base models](https://github.com/huggingface/optimum-benchmark/blob/main/llm_perf/utils.py) from there.
- Hardware/Backend/Optimization configuration requests should be made in the
  [🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) or the
  [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) repository (where the benchmarking code is hosted).

## ✍️ Details

- To avoid communication-dependent results, only one GPU is used.
- Score is the average evaluation score obtained from the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- LLMs run on a singleton batch (batch size 1) with a prompt size of 256 tokens, generating 64 tokens, for at least 10 iterations and 10 seconds.
- Energy consumption is measured in kWh using CodeCarbon, taking into account the GPU, CPU, RAM, and location of the machine.
- We measure three types of memory: Max Allocated Memory, Max Reserved Memory, and Max Used Memory. The first two are reported by PyTorch and the last one is observed using PyNVML (see the sketch below).
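
For illustration, here is a minimal sketch (not the leaderboard's actual code) of how these memory and energy readings can be collected with PyTorch, PyNVML, and CodeCarbon:

```python
# Minimal sketch, not the leaderboard's actual code: collects the three memory
# metrics and an energy reading around a dummy workload.
import pynvml
import torch
from codecarbon import EmissionsTracker

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # single-GPU setup

torch.cuda.reset_peak_memory_stats()
tracker = EmissionsTracker()  # accounts for GPU, CPU, RAM and machine location
tracker.start()

# ... run the prefill + decode workload here ...

tracker.stop()  # energy (kWh) and emissions are written to emissions.csv

max_allocated = torch.cuda.max_memory_allocated()  # peak tensor allocations
max_reserved = torch.cuda.max_memory_reserved()    # peak held by the caching allocator
# PyNVML's reading is instantaneous, so a real harness polls it during the run
# to obtain a maximum.
used = pynvml.nvmlDeviceGetMemoryInfo(handle).used

print(f"Max Allocated: {max_allocated / 1e9:.2f} GB")
print(f"Max Reserved:  {max_reserved / 1e9:.2f} GB")
print(f"Used (PyNVML): {used / 1e9:.2f} GB")
pynvml.nvmlShutdown()
```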

All of our benchmarks are run by a single script,
[benchmark_cuda_pytorch.py](https://github.com/huggingface/optimum-benchmark/blob/llm-perf/llm-perf/benchmark_cuda_pytorch.py),
using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) to guarantee reproducibility and consistency.
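
For reference, a comparable benchmark can be launched through Optimum-Benchmark's Python API. The sketch below follows the API shown in the Optimum-Benchmark README; the model name is only an example, and argument names may differ across versions:

```python
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    ProcessConfig,
    PyTorchConfig,
)

if __name__ == "__main__":
    # Backend: PyTorch on a single CUDA device
    backend_config = PyTorchConfig(model="gpt2", device="cuda", device_ids="0", no_weights=True)
    # Scenario: singleton batch, 256-token prompt, 64 generated tokens,
    # with latency, memory and energy tracking enabled
    scenario_config = InferenceConfig(
        latency=True,
        memory=True,
        energy=True,
        input_shapes={"batch_size": 1, "sequence_length": 256},
        generate_kwargs={"max_new_tokens": 64, "min_new_tokens": 64},
    )
    # Launcher: run the benchmark in an isolated process
    launcher_config = ProcessConfig(device_isolation=True)
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        backend=backend_config,
        scenario=scenario_config,
        launcher=launcher_config,
    )
    # Runs in a subprocess and returns a report with the measured metrics
    benchmark_report = Benchmark.launch(benchmark_config)
```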

## 🚀 How to run locally

To run the LLM-Perf Leaderboard locally on your machine, follow these steps:

### 1. Clone the Repository

First, clone the repository to your local machine:

```bash
git clone https://huggingface.co/spaces/optimum/llm-perf-leaderboard
cd llm-perf-leaderboard
```

### 2. Install the Required Dependencies

Install the necessary Python packages listed in the `requirements.txt` file:

`pip install -r requirements.txt`

### 3. Run the Application

You can run the Gradio application in one of the following ways:

- Option 1: Using Python: `python app.py`
- Option 2: Using the Gradio CLI (includes hot-reloading): `gradio app.py`

### 4. Access the Application

Once the application is running, you can access it locally in your web browser at http://127.0.0.1:7860/.
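
If you need a different host or port, Gradio's `launch()` accepts them directly. A hypothetical one-liner, assuming the Gradio app object in `app.py` is named `demo`:

```python
# Hypothetical: assumes the Gradio app object in app.py is named `demo`.
demo.launch(server_name="0.0.0.0", server_port=7861)  # listen on all interfaces, port 7861
```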