Update README.md
README.md
CHANGED
@@ -30,7 +30,7 @@ Key features of Llama-3-SynE include:
 ## Model List
 
 | Model | Type | Seq Length | Download |
-
+|:-----------------|:-------|:------------|:----------------------------------------------------------------|
 | Llama-3-SynE | Base | 8K | [🤗 Huggingface](https://huggingface.co/survivi/Llama-3-SynE) |
 
 ## BenchMark
@@ -45,7 +45,7 @@ For HumanEval and ARC, we report the zero-shot evaluation performance. The best
 ### Major Benchmarks
 
 | **Models** | **MMLU** | **C-Eval** | **CMMLU** | **MATH** | **GSM8K** | **ASDiv** | **MAWPS** | **SAT-Math** | **HumanEval** | **MBPP** |
-
+|:---------------------------|:---------------|:----------|:---------|:---------------|:---------|:---------|:---------|:-----------|:----------------|:--------|
 | Llama-3-8B | **66.60** | 49.43 | 51.03 | 16.20 | 54.40 | 72.10 | 89.30 | 38.64 | <ins>36.59</ins> | **47.00** |
 | DCLM-7B | 64.01 | 41.24 | 40.89 | 14.10 | 39.20 | 67.10 | 83.40 | <ins>41.36</ins> | 21.95 | 32.60 |
 | Mistral-7B-v0.3 | 63.54 | 42.74 | 43.72 | 12.30 | 40.50 | 67.50 | 87.50 | 40.45 | 25.61 | 36.00 |
@@ -63,7 +63,7 @@ For HumanEval and ARC, we report the zero-shot evaluation performance. The best
 "PHY", "CHE", and "BIO" denote the physics, chemistry, and biology sub-tasks of the corresponding benchmarks.
 
 | **Models** | **SciEval PHY** | **SciEval CHE** | **SciEval BIO** | **SciEval Avg.** | **SciQ** | **GaoKao MathQA** | **GaoKao CHE** | **GaoKao BIO** | **ARC Easy** | **ARC Challenge** | **ARC Avg.** | **AQUA-RAT** |
-
+|:--------------------|:-----------------|:-----------------|:-----------------|:------------------|:---------------|:-------------------|:----------------|:----------------|:---------------|:-------------------|:--------------|:-------------------|
 | Llama-3-8B | 46.95 | 63.45 | 74.53 | 65.47 | 90.90 | 27.92 | 32.85 | 43.81 | 91.37 | 77.73 | 84.51 | <ins>27.95</ins> |
 | DCLM-7B | **56.71** | 64.39 | 72.03 | 66.25 | **92.50** | 29.06 | 31.40 | 37.14 | 89.52 | 76.37 | 82.94 | 20.08 |
 | Mistral-7B-v0.3 | 48.17 | 59.41 | 68.89 | 61.51 | 89.40 | 30.48 | 30.92 | 41.43 | 87.33 | 74.74 | 81.04 | 23.23 |
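
Each hunk in this change adds a left-aligned delimiter row (`|:---|`) beneath a pipe-table header. As a rough illustration only, the sketch below derives such a row from a header line. The helper name `delimiter_row` is hypothetical and is not part of the repository, and the column widths it produces are an arbitrary choice that will not match the widths used in the commit. It also assumes header cells contain no escaped pipes.

```python
def delimiter_row(header: str) -> str:
    """Build a left-aligned delimiter row for a pipe-table header line.

    Hypothetical helper for illustration; assumes no escaped '|' in cells.
    """
    # Drop the outer pipes, then split the remainder into trimmed cell texts.
    cells = [c.strip() for c in header.strip().strip("|").split("|")]
    # One ':' (left alignment) plus dashes per column; three dashes is used
    # here as a conservative minimum that common Markdown renderers accept.
    return "|" + "|".join(":" + "-" * max(3, len(c) + 1) for c in cells) + "|"


header = "| Model | Type | Seq Length | Download |"
print(header)
print(delimiter_row(header))
```

The delimiter row must have the same number of cells as the header, which is why the sketch derives it from the header rather than hard-coding it.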