Update README.md
README.md
CHANGED
@@ -43,7 +43,7 @@ It achieves superb reasoning performance as well as exellent chat & instruction-
 ## Evaluation
 We conducted overall coding, math, reasoning, knowledge, instruction-following and chat benchmarking. Results are shown below:
 
-| Models
+| Models&Tasks | Coding | | | Math | | | Reasoning | Knowledge | Ins-Following | Chat |
 |-----------------|:---------:|:-----:|:--------:|:-------:|:-----:|:---------:|:---------:|:---------:|:-------------:|:--------:|
 | Datasets | HumanEval | MBPP | LeetCode | GSMPLUS | MATH | TheoremQA | BBH (CoT) | MMLU | IFEval | MT-Bench |
 | GPT-3.5-Turbo | 76.8 | 82.5 | 23.3 | 61.2 | 37.8 | 35.6 | 70.1 | 70.0 | 56.6 | 7.94 |