lievan committed on
Commit 2734286 · verified
1 Parent(s): 661d7f9

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -43,9 +43,9 @@ It achieves superb reasoning performance as well as exellent chat & instruction-
  ## Evaluation
  We conducted overall coding, math, reasoning, knowledge, instruction-following and chat benchmarking. Results are shown below:
  
- | Models&Tasks | Coding | | | Math | | | Reasoning | Knowledge | Ins-Following | Chat |
+ | Models/Benchmarks| Coding | | | Math | | | Reasoning | Knowledge | Ins-Following | Chat |
  |-----------------|:---------:|:-----:|:--------:|:-------:|:-----:|:---------:|:---------:|:---------:|:-------------:|:--------:|
- | Datasets | HumanEval | MBPP | LeetCode | GSMPLUS | MATH | TheoremQA | BBH (CoT) | MMLU | IFEval | MT-Bench |
+ | | HumanEval | MBPP | LeetCode | GSMPLUS | MATH | TheoremQA | BBH (CoT) | MMLU | IFEval | MT-Bench |
  | GPT-3.5-Turbo | 76.8 | 82.5 | 23.3 | 61.2 | 37.8 | 35.6 | 70.1 | 70.0 | 56.6 | 7.94 |
  | GPT-4 | 85.4 | 83.5 | 41.8 | 85.6 | 69.7 | 52.4 | 86.7 | 86.4 | 79.7 | 8.96 |
  | Eurus-70b-NCA | 79.3 | 71.9 | 33.3 | 62.8 | 41.7 | 32.6 | 80.0 | 59.4 | 49.2 | 7.54 |