Gregor Betz committed: update about

src/display/about.py CHANGED (+11 -0)
@@ -54,6 +54,17 @@ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingfac
 Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
 
 
+### 🤗 Open LLM Leaderboard vs. `/\/` Open CoT Leaderboard
+* 🤗: Can `model` solve `task`?
+  `/\/`: Can `model` do CoT to improve in `task`?
+* 🤗: Metric: absolute accuracy.
+  `/\/`: Metric: relative accuracy gain.
+* 🤗: Measures `task` performance.
+  `/\/`: Measures ability to reason (about `task`).
+* 🤗: Covers broad spectrum of `tasks`.
+  `/\/`: Focuses on critical thinking `tasks`.
+
+
 ### 🤗 Open LLM Leaderboard
 * a. Can `model` solve `task`?
 * b. Metric: absolute accuracy.
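The added text contrasts absolute accuracy with relative accuracy gain. A minimal sketch of what such a gain metric could look like, assuming the simplest definition (CoT accuracy minus baseline accuracy); the function name and definition are illustrative, not taken from the leaderboard's code:

```python
def relative_accuracy_gain(baseline_accuracy: float, cot_accuracy: float) -> float:
    """Difference between a model's accuracy when prompted with
    chain-of-thought (CoT) and its baseline accuracy on the same task.

    NOTE: hypothetical helper; the leaderboard's exact definition may differ.
    """
    return cot_accuracy - baseline_accuracy


# Example: a model scoring 0.55 without CoT and 0.63 with CoT
# shows a relative accuracy gain of 0.08 on that task.
gain = relative_accuracy_gain(0.55, 0.63)
print(round(gain, 2))
```

Under this reading, a model can rank well even with modest absolute scores, as long as chain-of-thought prompting reliably improves its answers.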