Commit d07c7f2
1 Parent(s): faff0de
Adding Evaluation Results (#1)
- Adding Evaluation Results (5ed5b7a75d353b37c602ff44a7a465255677586c)
Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>
README.md CHANGED
@@ -40,4 +40,17 @@ Or, alternatively, change `model_type` in `config.json` from `mistral` to `llama
 >
 >In summary, these five technologies have deeply altered our lives and created new possibilities, yet they come with tradeoffs and ethical dilemmas. Continued innovation and collaboration between industry, academia and governments will be needed to ensure these technologies fulfill their promises while minimizing unforeseen consequences.
 >
->I hope this provides some helpful insights! Let me know if you would like me to expand on any specific points.
+>I hope this provides some helpful insights! Let me know if you would like me to expand on any specific points.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Norquinal__Mistral-7B-claude-instruct)
+
+| Metric | Value |
+|-----------------------|---------------------------|
+| Avg. | 51.71 |
+| ARC (25-shot) | 63.23 |
+| HellaSwag (10-shot) | 84.99 |
+| MMLU (5-shot) | 63.84 |
+| TruthfulQA (0-shot) | 47.47 |
+| Winogrande (5-shot) | 78.14 |
+| GSM8K (5-shot) | 17.97 |
+| DROP (3-shot) | 6.35 |
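The `Avg.` row in the table added by this commit is consistent with an unweighted mean of the seven benchmark scores listed beneath it. The commit itself does not state how the average is computed, so the sketch below is only a sanity check under that assumption; the `scores` dictionary and the `average` variable are illustrative names, not part of the leaderboard tooling.

```python
# Benchmark scores copied from the table added to README.md in this commit.
scores = {
    "ARC (25-shot)": 63.23,
    "HellaSwag (10-shot)": 84.99,
    "MMLU (5-shot)": 63.84,
    "TruthfulQA (0-shot)": 47.47,
    "Winogrande (5-shot)": 78.14,
    "GSM8K (5-shot)": 17.97,
    "DROP (3-shot)": 6.35,
}

# Assumption: "Avg." is the plain arithmetic mean of all seven scores.
average = sum(scores.values()) / len(scores)

print(f"Mean of {len(scores)} benchmarks: {average:.2f}")
```

Rounded to two decimals, the result matches the reported `Avg.` value of 51.71, which suggests the low GSM8K and DROP scores are counted with the same weight as the others.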