JosephusCheung committed
Commit dd009b6 · Parent(s): 0605834
Update README.md
README.md CHANGED
@@ -88,6 +88,13 @@ Hard ACC:54.71
 
 **Zero-shot ACC 0.7012888551933283** (Outperforms MetaMath-13B, Qwen-14B)
 
+## AlpacaEval Leaderboard
+| | win_rate | standard_error | n_wins | n_wins_base | n_draws | n_total | mode | avg_length |
+| ------------ | -------- | -------------- | ------ | ----------- | ------- | ------- | --------- | ---------- |
+| causallm-14b | **88.26087** | 1.116333 | 705 | 89 | 11 | 805 | community | 1391 |
+
+
+Win rate **88.26%** on [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) [view raw](https://github.com/tatsu-lab/alpaca_eval/blob/3a47dcd81c56f6a8e6a5711f2754013919fbe90a/results/causallm-14b/model_outputs.json)
 
 **llama.cpp has some issues with GPT2Tokenizer; a fix is coming as soon as possible...**
 
@@ -137,4 +144,11 @@ STEM accuracy: 66.71
 
 ## GSM8K
 
-**Zero-shot accuracy 0.7012888551933283** (outperforms MetaMath-13B and Qwen-14B)
+**Zero-shot accuracy 0.7012888551933283** (outperforms MetaMath-13B and Qwen-14B)
+
+## AlpacaEval Leaderboard
+| | win_rate | standard_error | n_wins | n_wins_base | n_draws | n_total | mode | avg_length |
+| ------------ | -------- | -------------- | ------ | ----------- | ------- | ------- | --------- | ---------- |
+| causallm-14b | **88.26087** | 1.116333 | 705 | 89 | 11 | 805 | community | 1391 |
+
+Win rate **88.26%** on [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) [view raw](https://github.com/tatsu-lab/alpaca_eval/blob/3a47dcd81c56f6a8e6a5711f2754013919fbe90a/results/causallm-14b/model_outputs.json)
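The counts in the AlpacaEval row added by this commit are internally consistent: n_wins + n_wins_base + n_draws = 705 + 89 + 11 = 805 = n_total. The sketch below reproduces win_rate and standard_error from those counts. It rests on an assumption that is not stated in the README: each comparison is scored 1.0 for a win, 0.5 for a draw and 0.0 for a loss, win_rate is the mean score, and standard_error is the sample standard error of the mean (ddof=1), both expressed in percent. The function name `alpaca_eval_stats` is hypothetical and not part of the AlpacaEval package.

```python
import math

# Counts copied from the causallm-14b row of the AlpacaEval table.
N_WINS = 705       # comparisons where the model's output was preferred
N_WINS_BASE = 89   # comparisons where the baseline's output was preferred
N_DRAWS = 11
N_TOTAL = 805


def alpaca_eval_stats(n_wins: int, n_wins_base: int, n_draws: int) -> tuple[float, float]:
    """Hypothetical helper returning (win_rate, standard_error) in percent.

    Assumption (checked only against the numbers in this table, not against the
    AlpacaEval source): win = 1.0, draw = 0.5, loss = 0.0; the standard error is
    the sample standard deviation (ddof=1) divided by sqrt(n).
    """
    n = n_wins + n_wins_base + n_draws
    scores = [1.0] * n_wins + [0.5] * n_draws + [0.0] * n_wins_base
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    sem = math.sqrt(variance / n)
    return 100 * mean, 100 * sem


assert N_WINS + N_WINS_BASE + N_DRAWS == N_TOTAL
win_rate, standard_error = alpaca_eval_stats(N_WINS, N_WINS_BASE, N_DRAWS)
print(f"win_rate={win_rate:.5f} standard_error={standard_error:.6f}")
```

Running it prints `win_rate=88.26087 standard_error=1.116333`, matching the table. Under this scoring, the 11 draws contribute 5.5 "wins", which is why the win rate is 710.5 / 805 rather than 705 / 805.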