Commit 11919de
Parent(s): 8194e3d

Update README.md

README.md CHANGED
@@ -51,8 +51,6 @@ Notus 7B v1 was trained along November, 2023. And the data as generated by GPT-4
 
 ## Evaluation
 
-Even though LM Eval Harness is a nice benchmark, we have seen that both Alpaca Eval and MT Bench results are usually more meaningful towards explaining how the models will perform in real scenarios and when interacting with humans via chat applications, so the results shown below are just for reporting some metrics and for comparing with existing and similar LLMs.
-
 ### LM Eval Harness
 
 We ran the evaluation using [`EleutherAI/lm-eval-harness`](https://github.com/EleutherAI/lm-evaluation-harness/tree/big-refactor) from the `big-refactor` branch, aiming to mimic the [Open LLM Leaderboard by HuggingFace H4](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), but running everything on our VMs instead, as we're still experimenting.
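The README text above mentions running `EleutherAI/lm-eval-harness` from the `big-refactor` branch. A minimal sketch of such a run is shown below; the model id and task list are illustrative assumptions, not taken from the commit, and the exact flags may differ between harness versions.

```shell
# Sketch only: install the harness from the big-refactor branch,
# then evaluate a Hugging Face model on a few leaderboard-style tasks.
# Model id and tasks below are assumptions for illustration.
pip install "lm-eval @ git+https://github.com/EleutherAI/lm-evaluation-harness@big-refactor"

lm_eval \
  --model hf \
  --model_args pretrained=argilla/notus-7b-v1,dtype=bfloat16 \
  --tasks arc_challenge,hellaswag,winogrande \
  --batch_size 8 \
  --output_path results/
```

Running this requires a GPU and downloads the model weights, so it is a template to adapt rather than something to execute as-is.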