Update README.md
README.md
CHANGED
@@ -54,10 +54,10 @@ We relied on the popular MTBench benchmark to evaluate multi-turn performance.
 
 Since MTBench is an English only benchmark, we also release this fork of [MTBench Finnish](https://github.com/LumiOpen/FastChat/tree/main/fastchat/llm_judge) with multilingual support and machine translated Finnish prompts. Our scores for both benchmarks follow.
 
-| Eval |
-|
-| MTBench
-| MTBench Finnish | 5.
+| Eval | Overall | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
+| :---- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | ----: |
+| MTBench English | 6.16 | 3.65 | 6.55 | 9.6 | 2.25 | 4.25 | 7.25 | 7.42 | 8.37 |
+| MTBench Finnish | 5.73 | 3.05 | 6.05 | 9.6 | 1.25 | 3.65 | 7.0 | 7.65 | 7.6 |
 
 
 ## License