JosephusCheung committed on
Commit dd009b6 · 1 Parent(s): 0605834

Update README.md

Files changed (1)
  1. README.md +15 -1
README.md CHANGED
@@ -88,6 +88,13 @@ Hard ACC:54.71
 
 **Zero-shot ACC 0.7012888551933283** (Outperforms MetaMath-13B, Qwen-14B)
 
+## AlpacaEval Leaderboard
+| | win_rate | standard_error | n_wins | n_wins_base | n_draws | n_total | mode | avg_length |
+| ------------ | -------- | -------------- | ------ | ----------- | ------- | ------- | --------- | ---------- |
+| causallm-14b | **88.26087** | 1.116333 | 705 | 89 | 11 | 805 | community | 1391 |
+
+
+Win rate **88.26%** on [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) [view raw](https://github.com/tatsu-lab/alpaca_eval/blob/3a47dcd81c56f6a8e6a5711f2754013919fbe90a/results/causallm-14b/model_outputs.json)
 
 **GPT2Tokenizer 上的 llama.cpp 存在一些问题,会尽快修复...**
 
@@ -137,4 +144,11 @@ STEM准确率:66.71
 
 ## GSM8K
 
-**零样本准确率0.7012888551933283**(超过MetaMath-13B和Qwen-14B)
+**零样本准确率0.7012888551933283**(超过MetaMath-13B和Qwen-14B)
+
+## AlpacaEval Leaderboard
+| | win_rate | standard_error | n_wins | n_wins_base | n_draws | n_total | mode | avg_length |
+| ------------ | -------- | -------------- | ------ | ----------- | ------- | ------- | --------- | ---------- |
+| causallm-14b | **88.26087** | 1.116333 | 705 | 89 | 11 | 805 | community | 1391 |
+
+在 [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) 胜率 **88.26%** [view raw](https://github.com/tatsu-lab/alpaca_eval/blob/3a47dcd81c56f6a8e6a5711f2754013919fbe90a/results/causallm-14b/model_outputs.json)
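
Reviewer note: the `win_rate` and `standard_error` columns in the added table can be sanity-checked from the raw counts. The sketch below is not taken from the AlpacaEval code base. It assumes each comparison is scored 1.0 for a win, 0.5 for a draw and 0.0 for a loss, and that the standard error is the sample standard deviation (n - 1 denominator) divided by the square root of `n_total`. The function name `alpaca_eval_summary` is hypothetical.

```python
import math


def alpaca_eval_summary(n_wins, n_draws, n_total):
    """Recompute win rate and standard error (both in percent) from counts.

    Assumptions (not confirmed by the commit): win = 1.0, draw = 0.5,
    loss = 0.0, and the standard error uses the sample standard deviation.
    """
    n_losses = n_total - n_wins - n_draws
    scores = [1.0] * n_wins + [0.5] * n_draws + [0.0] * n_losses
    mean = sum(scores) / n_total
    # Sample variance with an (n - 1) denominator.
    variance = sum((s - mean) ** 2 for s in scores) / (n_total - 1)
    standard_error = math.sqrt(variance / n_total)
    return 100 * mean, 100 * standard_error, n_losses


# Counts from the causallm-14b row of the table.
win_rate, standard_error, n_losses = alpaca_eval_summary(705, 11, 805)
print(f"win_rate={win_rate:.5f}")
print(f"standard_error={standard_error:.4f}")
print(f"n_losses={n_losses}")
```

Under these assumptions the recomputed values should agree with the table row up to rounding, and the implied loss count equals the `n_wins_base` column (89), which suggests the three counts partition the 805 comparisons.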