openbmb/Eurux-8x22b-nca

lievan committed on Apr 15, 2024

Commit 6ce8ea6 · verified · 1 Parent(s): 87aa00d

Update README.md

Files changed (1)

README.md +10 -8

@@ -43,14 +43,16 @@ It achieves superb reasoning performance as well as exellent chat & instruction-
 ## Evaluation
 We conducted overall coding, math, reasoning, knowledge, instruction-following and chat benchmarking. Results are shown below:
-| Model           |   |   Coding  |       |          |   Math  |       |           | Reasoning | Knowledge | Ins-Following |   Chat   |
-|-----------------|---|:---------:|:-----:|:--------:|:-------:|:-----:|:---------:|:---------:|:---------:|:-------------:|:--------:|
-|                 |   | HumanEval |  MBPP | LeetCode | GSMPLUS |  MATH | TheoremQA | BBH (CoT) |    MMLU   |     IFEval    | MT-Bench |
-| GPT-3.5-Turbo   |   |   76.8    | 82.5  |   23.3   |  61.2   | 37.8  |   35.6    |   70.1    |   70.0    |     56.6      |   7.94   |
-| GPT-4           |   |   85.4    | 83.5  |   41.8   |  85.6   | 69.7  |   52.4    |   86.7    |   86.4    |     79.7      |   8.96   |
-| Eurus-70b-NCA   |   |   79.3    | 71.9  |   33.3   |  62.8   | 41.7  |   32.6    |   80.0    |   59.4    |     49.2      |   7.54   |
-| Eurux-8x22b-KTO |   |   71.3    | 68.9  |   29.4   |  68.3   | 48.4  |   35.3    |   83.6    |   75.9    |     67.1      |   8.58   |
-| Eurux-8x22b-NCA |   |   75.0    | 69.7  |   35.0   |  68.1   | 49.0  |   35.5    |   83.5    |   75.6    |     67.1      |   8.46   |
+| Models | Tasks  |   Coding  |       |          |   Math  |       |           | Reasoning | Knowledge | Ins-Following |   Chat   |
+|-----------------|:---------:|:-----:|:--------:|:-------:|:-----:|:---------:|:---------:|:---------:|:-------------:|:--------:|
+| Datasets        | HumanEval |  MBPP | LeetCode | GSMPLUS |  MATH | TheoremQA | BBH (CoT) |    MMLU   |     IFEval    | MT-Bench |
+|-----------------|:---------:|:-----:|:--------:|:-------:|:-----:|:---------:|:---------:|:---------:|:-------------:|:--------:|
+| GPT-3.5-Turbo   |   76.8    | 82.5  |   23.3   |  61.2   | 37.8  |   35.6    |   70.1    |   70.0    |     56.6      |   7.94   |
+| GPT-4           |   85.4    | 83.5  |   41.8   |  85.6   | 69.7  |   52.4    |   86.7    |   86.4    |     79.7      |   8.96   |
+| Eurus-70b-NCA   |   79.3    | 71.9  |   33.3   |  62.8   | 41.7  |   32.6    |   80.0    |   59.4    |     49.2      |   7.54   |
+| Eurux-8x22b-KTO |   71.3    | 68.9  |   29.4   |  68.3   | 48.4  |   35.3    |   83.6    |   75.9    |     67.1      |   8.58   |
+| Eurux-8x22b-NCA |   75.0    | 69.7  |   35.0   |  68.1   | 49.0  |   35.5    |   83.5    |   75.6    |     67.1      |   8.46   |
 ## Usage
 ```python