survivi committed
Commit a9c3cae • 1 Parent(s): 961cd9e

Update README.md

Files changed (1)
  1. README.md +58 -27
README.md CHANGED
@@ -2,36 +2,67 @@
  language:
  - en
  - zh
  ---

- # Llama-3-SynE

  <p align="center">
- 📄<a href="https://arxiv.org/abs/2407.18743" target="_blank"> Report </a> • 💻 <a href="https://github.com/RUC-GSAI/Llama-3-SynE" target="_blank">GitHub Repo</a>
  </p>

  <p align="center">
- 🔍<a href="https://huggingface.co/survivi/Llama-3-SynE/blob/main/README_zh.md" target="_blank">中文</a>
  </p>

  ## News
- - ✨✨ ``2024/08/10``: We released the [Llama-3-SynE model](https://huggingface.co/survivi/Llama-3-SynE).
- - ✨ ``2024/07/26``: We released the [technical report](https://arxiv.org/abs/2407.18743); feel free to check it out!

  ## Model Introduction

  **Llama-3-SynE** (<ins>Syn</ins>thetic data <ins>E</ins>nhanced Llama-3) is a significantly enhanced version of [Llama-3 (8B)](https://github.com/meta-llama/llama3), achieved through continual pre-training (CPT) to improve its **Chinese language ability and scientific reasoning capability**. By employing a meticulously designed data mixture and curriculum strategy, Llama-3-SynE successfully enhances new abilities while maintaining the original model’s performance. This enhancement process involves utilizing existing datasets and synthesizing high-quality datasets specifically designed for targeted tasks.

  Key features of Llama-3-SynE include:
  - **Enhanced Chinese Language Capabilities**: Achieved through topic-based data mixture and perplexity-based data curriculum.
  - **Improved Scientific Reasoning**: Utilizing synthetic datasets to enhance multi-disciplinary scientific knowledge.
  - **Efficient CPT**: Only consuming around 100 billion tokens, making it a cost-effective solution.

  ## Model List

- | Model | Type | Seq Length | Download |
- |:-----------------|:-------|:------------|:----------------------------------------------------------------|
- | Llama-3-SynE | Base | 8K | [🤗 Huggingface](https://huggingface.co/survivi/Llama-3-SynE) |

  ## Benchmark
 
@@ -44,15 +75,15 @@ For HumanEval and ARC, we report the zero-shot evaluation performance. The best

  ### Major Benchmarks

- | **Models** | **MMLU** | **C-Eval** | **CMMLU** | **MATH** | **GSM8K** | **ASDiv** | **MAWPS** | **SAT-Math** | **HumanEval** | **MBPP** |
- |:---------------------------|:---------------|:----------|:---------|:---------------|:---------|:---------|:---------|:-----------|:----------------|:--------|
- | Llama-3-8B | **66.60** | 49.43 | 51.03 | 16.20 | 54.40 | 72.10 | 89.30 | 38.64 | <ins>36.59</ins> | **47.00** |
- | DCLM-7B | 64.01 | 41.24 | 40.89 | 14.10 | 39.20 | 67.10 | 83.40 | <ins>41.36</ins> | 21.95 | 32.60 |
- | Mistral-7B-v0.3 | 63.54 | 42.74 | 43.72 | 12.30 | 40.50 | 67.50 | 87.50 | 40.45 | 25.61 | 36.00 |
- | Llama-3-Chinese-8B | 64.10 | <ins>50.14</ins> | <ins>51.20</ins> | 3.60 | 0.80 | 1.90 | 0.60 | 36.82 | 9.76 | 14.80 |
- | MAmmoTH2-8B | 64.89 | 46.56 | 45.90 | **34.10** | **61.70**| **82.80**| <ins>91.50</ins> | <ins>41.36</ins> | 17.68 | 38.80 |
- | Galactica-6.7B | 37.13 | 26.72 | 25.53 | 5.30 | 9.60 | 40.90 | 51.70 | 23.18 | 7.31 | 2.00 |
- | **Llama-3-SynE (ours)** | <ins>65.19</ins> | **58.24**| **57.34**| <ins>28.20</ins> | <ins>60.80</ins> | <ins>81.00</ins> | **94.10**| **43.64**| **42.07** | <ins>45.60</ins>|

  > On **Chinese evaluation benchmarks** (such as C-Eval and CMMLU), Llama-3-SynE significantly outperforms the base model Llama-3 (8B), indicating that our method is very effective in improving Chinese language capabilities.
 
@@ -62,15 +93,15 @@ For HumanEval and ARC, we report the zero-shot evaluation performance. The best

  "PHY", "CHE", and "BIO" denote the physics, chemistry, and biology sub-tasks of the corresponding benchmarks.

- | **Models** | **SciEval PHY** | **SciEval CHE** | **SciEval BIO** | **SciEval Avg.** | **SciQ** | **GaoKao MathQA** | **GaoKao CHE** | **GaoKao BIO** | **ARC Easy** | **ARC Challenge** | **ARC Avg.** | **AQUA-RAT** |
- |:--------------------|:-----------------|:-----------------|:-----------------|:------------------|:---------------|:-------------------|:----------------|:----------------|:---------------|:-------------------|:--------------|:-------------------|
- | Llama-3-8B | 46.95 | 63.45 | 74.53 | 65.47 | 90.90 | 27.92 | 32.85 | 43.81 | 91.37 | 77.73 | 84.51 | <ins>27.95</ins> |
- | DCLM-7B | **56.71** | 64.39 | 72.03 | 66.25 | **92.50** | 29.06 | 31.40 | 37.14 | 89.52 | 76.37 | 82.94 | 20.08 |
- | Mistral-7B-v0.3 | 48.17 | 59.41 | 68.89 | 61.51 | 89.40 | 30.48 | 30.92 | 41.43 | 87.33 | 74.74 | 81.04 | 23.23 |
- | Llama-3-Chinese-8B | 48.17 | 67.34 | 73.90 | <ins>67.34</ins> | 89.20 | 27.64 | 30.43 | 38.57 | 88.22 | 70.48 | 79.35 | 27.56 |
- | MAmmoTH2-8B | 49.39 | **69.36** | <ins>76.83</ins> | **69.60** | 90.20 | **32.19** | <ins>36.23</ins> | <ins>49.05</ins> | **92.85** | **84.30** | **88.57** | 27.17 |
- | Galactica-6.7B | 34.76 | 43.39 | 54.07 | 46.27 | 71.50 | 23.65 | 27.05 | 24.76 | 65.91 | 46.76 | 56.33 | 20.87 |
- | **Llama-3-SynE (ours)** | <ins>53.66</ins> | <ins>67.81</ins> | **77.45** | **69.60** | <ins>91.20</ins> | <ins>31.05</ins> | **51.21** | **69.52** | <ins>91.58</ins> | <ins>80.97</ins> | <ins>86.28</ins> | **28.74** |

  > On **scientific evaluation benchmarks** (such as SciEval, GaoKao, and ARC), Llama-3-SynE significantly outperforms the base model, particularly showing remarkable improvement in Chinese scientific benchmarks (for example, a 25.71% improvement in the GaoKao biology subtest).
 
@@ -134,7 +165,7 @@ print(output)

  ## License

- This project is built upon Meta's Llama-3 model. The use of Llama-3-SynE model weights must follow the Llama-3 [license agreement](https://github.com/meta-llama/llama3/blob/main/LICENSE).

  ## Citation
 
README.md (after)

  language:
  - en
  - zh
+ datasets:
+ - survivi/Llama-3-SynE-Dataset
+ library_name: transformers
+ pipeline_tag: text-generation
  ---
 
+ <p align="center">
+ <img src="https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/assets/llama-3-syne-logo.png" width="400"/>
+ </p>
+
+ <!-- <p align="center">
+ 📄 <a href="https://arxiv.org/abs/2407.18743"> Report </a>&nbsp | &nbsp 🤗 <a href="https://huggingface.co/survivi/Llama-3-SynE">Model on Hugging Face</a>&nbsp | &nbsp 📊 <a href="https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset">CPT Dataset</a>
+ </p>

  <p align="center">
+ 🔍 <a href="https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/README.md">English</a>&nbsp | &nbsp<a href="https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/README_zh.md">简体中文</a>
+ </p> -->
+
+ <p align="center">
+ 📄 <a href="https://arxiv.org/abs/2407.18743"> Report </a>&nbsp | &nbsp 💻 <a href="https://github.com/RUC-GSAI/Llama-3-SynE">GitHub Repo</a>
  </p>

  <p align="center">
+ 🔍 <a href="https://huggingface.co/survivi/Llama-3-SynE/blob/main/README.md">English</a>&nbsp | &nbsp<a href="https://huggingface.co/survivi/Llama-3-SynE/blob/main/README_zh.md">简体中文</a>
+ </p>
+
+ > Here is the Llama-3-SynE model. The continual pre-training dataset is also available [here](https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset); a loading sketch follows below.
+
+ <!-- <p align="center">
+ 📄 <a href="https://arxiv.org/abs/2407.18743"> Report </a>&nbsp | &nbsp 💻 <a href="https://github.com/RUC-GSAI/Llama-3-SynE">GitHub Repo</a>
  </p>

+ <p align="center">
+ 🔍 <a href="https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset/blob/main/README.md">English</a>&nbsp | &nbsp<a href="https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset/blob/main/README_zh.md">简体中文</a>
+ </p>
+
+ > Here is the continual pre-training dataset. The Llama-3-SynE model is available [here](https://huggingface.co/survivi/Llama-3-SynE). -->
+
+ ---
+
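
The CPT dataset pointed to above is hosted on the Hub as `survivi/Llama-3-SynE-Dataset`. A minimal loading sketch with the `datasets` library; the `train` split name and the streaming access pattern are assumptions, not facts from the dataset card:

```python
from datasets import load_dataset

# Stream the corpus so the full CPT dataset is not downloaded at once;
# the split name "train" is an assumption -- check the dataset card.
dataset = load_dataset("survivi/Llama-3-SynE-Dataset", split="train", streaming=True)
for example in dataset.take(3):
    print(example)  # field names depend on the dataset card
```
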
  ## News
+
+ - ✨✨ `2024/08/12`: We released the [continual pre-training dataset](https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset).
+ - ✨✨ `2024/08/10`: We released the [Llama-3-SynE model](https://huggingface.co/survivi/Llama-3-SynE).
+ - ✨ `2024/07/26`: We released the [technical report](https://arxiv.org/abs/2407.18743); feel free to check it out!

  ## Model Introduction

  **Llama-3-SynE** (<ins>Syn</ins>thetic data <ins>E</ins>nhanced Llama-3) is a significantly enhanced version of [Llama-3 (8B)](https://github.com/meta-llama/llama3), achieved through continual pre-training (CPT) to improve its **Chinese language ability and scientific reasoning capability**. By employing a meticulously designed data mixture and curriculum strategy, Llama-3-SynE successfully enhances new abilities while maintaining the original model’s performance. This enhancement process involves utilizing existing datasets and synthesizing high-quality datasets specifically designed for targeted tasks.

  Key features of Llama-3-SynE include:
+
  - **Enhanced Chinese Language Capabilities**: Achieved through topic-based data mixture and perplexity-based data curriculum (see the sketch below).
  - **Improved Scientific Reasoning**: Utilizing synthetic datasets to enhance multi-disciplinary scientific knowledge.
  - **Efficient CPT**: Only consuming around 100 billion tokens, making it a cost-effective solution.
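
The perplexity-based data curriculum can be pictured with a small sketch: each candidate document is scored with a reference model's perplexity, and training data is then scheduled from low to high perplexity. This is only an illustrative sketch; the reference model choice and the helpers `perplexity` and `build_curriculum` are assumptions, not the exact pipeline from the report.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity of `text` under a reference causal LM."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())

def build_curriculum(model, tokenizer, documents: list[str]) -> list[str]:
    """Order documents from low (easy) to high (hard) perplexity."""
    return sorted(documents, key=lambda doc: perplexity(model, tokenizer, doc))

# Using the base model as the reference scorer is an assumption; the card
# only states that the curriculum is perplexity-based.
ref = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(ref)
model = AutoModelForCausalLM.from_pretrained(ref)
ordered = build_curriculum(model, tokenizer, ["你好，世界。", "浅谈量子场论的重整化。"])
```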
 
  ## Model List

+ | Model        | Type | Seq Length | Download                                                      |
+ | :----------- | :--- | :--------- | :------------------------------------------------------------ |
+ | Llama-3-SynE | Base | 8K         | [🤗 Huggingface](https://huggingface.co/survivi/Llama-3-SynE) |
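
Since the metadata above declares `library_name: transformers` and `pipeline_tag: text-generation`, the model can be loaded with the standard `transformers` API. A minimal sketch follows; the prompt, dtype, and generation settings here are illustrative assumptions rather than official recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "survivi/Llama-3-SynE"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama-3-SynE is a base (non-chat) model, so plain text completion is used.
prompt = "继续预训练（CPT）的主要优点是"  # "The main advantages of continual pre-training (CPT) are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
output = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(output)
```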

  ## Benchmark

  ### Major Benchmarks

+ | **Models**              | **MMLU**         | **C-Eval**       | **CMMLU**        | **MATH**         | **GSM8K**        | **ASDiv**        | **MAWPS**        | **SAT-Math**     | **HumanEval**    | **MBPP**         |
+ | :---------------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- |
+ | Llama-3-8B              | **66.60**        | 49.43            | 51.03            | 16.20            | 54.40            | 72.10            | 89.30            | 38.64            | <ins>36.59</ins> | **47.00**        |
+ | DCLM-7B                 | 64.01            | 41.24            | 40.89            | 14.10            | 39.20            | 67.10            | 83.40            | <ins>41.36</ins> | 21.95            | 32.60            |
+ | Mistral-7B-v0.3         | 63.54            | 42.74            | 43.72            | 12.30            | 40.50            | 67.50            | 87.50            | 40.45            | 25.61            | 36.00            |
+ | Llama-3-Chinese-8B      | 64.10            | <ins>50.14</ins> | <ins>51.20</ins> | 3.60             | 0.80             | 1.90             | 0.60             | 36.82            | 9.76             | 14.80            |
+ | MAmmoTH2-8B             | 64.89            | 46.56            | 45.90            | **34.10**        | **61.70**        | **82.80**        | <ins>91.50</ins> | <ins>41.36</ins> | 17.68            | 38.80            |
+ | Galactica-6.7B          | 37.13            | 26.72            | 25.53            | 5.30             | 9.60             | 40.90            | 51.70            | 23.18            | 7.31             | 2.00             |
+ | **Llama-3-SynE (ours)** | <ins>65.19</ins> | **58.24**        | **57.34**        | <ins>28.20</ins> | <ins>60.80</ins> | <ins>81.00</ins> | **94.10**        | **43.64**        | **42.07**        | <ins>45.60</ins> |

  > On **Chinese evaluation benchmarks** (such as C-Eval and CMMLU), Llama-3-SynE significantly outperforms the base model Llama-3 (8B), indicating that our method is very effective in improving Chinese language capabilities.

  "PHY", "CHE", and "BIO" denote the physics, chemistry, and biology sub-tasks of the corresponding benchmarks.

+ | **Models**              | **SciEval PHY**  | **SciEval CHE**  | **SciEval BIO**  | **SciEval Avg.** | **SciQ**         | **GaoKao MathQA** | **GaoKao CHE**   | **GaoKao BIO**   | **ARC Easy**     | **ARC Challenge** | **ARC Avg.**     | **AQUA-RAT**     |
+ | :---------------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :---------------- | :--------------- | :--------------- | :--------------- | :---------------- | :--------------- | :--------------- |
+ | Llama-3-8B              | 46.95            | 63.45            | 74.53            | 65.47            | 90.90            | 27.92             | 32.85            | 43.81            | 91.37            | 77.73             | 84.51            | <ins>27.95</ins> |
+ | DCLM-7B                 | **56.71**        | 64.39            | 72.03            | 66.25            | **92.50**        | 29.06             | 31.40            | 37.14            | 89.52            | 76.37             | 82.94            | 20.08            |
+ | Mistral-7B-v0.3         | 48.17            | 59.41            | 68.89            | 61.51            | 89.40            | 30.48             | 30.92            | 41.43            | 87.33            | 74.74             | 81.04            | 23.23            |
+ | Llama-3-Chinese-8B      | 48.17            | 67.34            | 73.90            | <ins>67.34</ins> | 89.20            | 27.64             | 30.43            | 38.57            | 88.22            | 70.48             | 79.35            | 27.56            |
+ | MAmmoTH2-8B             | 49.39            | **69.36**        | <ins>76.83</ins> | **69.60**        | 90.20            | **32.19**         | <ins>36.23</ins> | <ins>49.05</ins> | **92.85**        | **84.30**         | **88.57**        | 27.17            |
+ | Galactica-6.7B          | 34.76            | 43.39            | 54.07            | 46.27            | 71.50            | 23.65             | 27.05            | 24.76            | 65.91            | 46.76             | 56.33            | 20.87            |
+ | **Llama-3-SynE (ours)** | <ins>53.66</ins> | <ins>67.81</ins> | **77.45**        | **69.60**        | <ins>91.20</ins> | <ins>31.05</ins>  | **51.21**        | **69.52**        | <ins>91.58</ins> | <ins>80.97</ins>  | <ins>86.28</ins> | **28.74**        |

  > On **scientific evaluation benchmarks** (such as SciEval, GaoKao, and ARC), Llama-3-SynE significantly outperforms the base model, particularly showing remarkable improvement in Chinese scientific benchmarks (for example, a 25.71% improvement in the GaoKao biology subtest).

  ## License

+ This project is built upon Meta's Llama-3 model. The use of Llama-3-SynE model weights must follow the Llama-3 [license agreement](https://github.com/meta-llama/llama3/blob/main/LICENSE). The code in this open-source repository follows the [Apache 2.0](LICENSE) license.

  ## Citation