amezasor committed on
Commit 408b6e9 · verified
1 Parent(s): 96ef4b5

evaluation results

Files changed (1)
  1. README.md +228 -17
README.md CHANGED
@@ -58,27 +58,238 @@ output = tokenizer.batch_decode(output)
58
  # print output
59
  print(output)
60
  ```
61
 
62
  **Model Architecture:**
63
  Granite-3.1-1B-A400M-Base is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss.
64
 
65
- | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
66
- | :-------- | :--------| :--------| :-------- | :--------|
67
- | Embedding size | 2048 | 4096 | **1024** | 1536 |
68
- | Number of layers | 40 | 40 | **24** | 32 |
69
- | Attention head size | 64 | 128 | **64** | 64 |
70
- | Number of attention heads | 32 | 32 | **16** | 24 |
71
- | Number of KV heads | 8 | 8 | **8** | 8 |
72
- | MLP hidden size | 8192 | 12800 | **512** | 512 |
73
- | MLP activation | SwiGLU | SwiGLU | **SwiGLU** | SwiGLU |
74
- | Number of experts | — | — | **32** | 40 |
75
- | MoE TopK | — | — | **8** | 8 |
76
- | Initialization std | 0.1 | 0.1 | **0.1** | 0.1 |
77
- | Sequence length | 128K | 128K | **128K** | 128K |
78
- | Position embedding | RoPE | RoPE | **RoPE** | RoPE |
79
- | # Parameters | 2.5B | 8.1B | **1.3B** | 3.3B |
80
- | # Active parameters | 2.5B | 8.1B | **400M** | 800M |
81
- | # Training tokens | 12T | 12T | **10T** | 10T |
82
 
83
  **Training Data:**
84
  This model is trained on a mix of open source and proprietary data following a two-stage training strategy.
 
58
  # print output
59
  print(output)
60
  ```
61
+ **Evaluation Results:**
62
+ <table>
63
+ <caption><b>HuggingFace Open LLM Leaderboard V1</b></caption>
64
+ <thead>
65
+ <tr>
66
+ <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
67
+ <th style="text-align:center; background-color: #001d6c; color: white;">ARC-Challenge</th>
68
+ <th style="text-align:center; background-color: #001d6c; color: white;">Hellaswag</th>
69
+ <th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
70
+ <th style="text-align:center; background-color: #001d6c; color: white;">TruthfulQA</th>
71
+ <th style="text-align:center; background-color: #001d6c; color: white;">Winogrande</th>
72
+ <th style="text-align:center; background-color: #001d6c; color: white;">GSM8K</th>
73
+ <th style="text-align:center; background-color: #001d6c; color: white;">Avg</th>
74
+ </tr></thead>
75
+ <tbody>
76
+ <tr>
77
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Granite-3.1-8B-Base</td>
78
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">63.99</td>
79
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">83.27</td>
80
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">63.45</td>
81
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">51.29</td>
82
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">78.92</td>
83
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">60.19</td>
84
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">66.85</td>
85
+ </tr>
86
+ <tr>
87
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-2B-Base</td>
88
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">53.58</td>
89
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">77.67</td>
90
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">52.86</td>
91
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">39.02</td>
92
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">72.84</td>
93
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">47.99</td>
94
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">57.32</td>
95
+ </tr>
96
+ <tr>
97
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-3B-A800M-Base</td>
98
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">50.76</td>
99
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">74.45</td>
100
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">48.31</td>
101
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">39.91</td>
102
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">69.29</td>
103
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">40.56</td>
104
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">53.88</td>
105
+ </tr>
106
+ <tr>
107
+ <td style="text-align:left; background-color: #DAE8FF; color: #2D2D2D;">Granite-3.1-1B-A400M-Base</td>
108
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">39.42</td>
109
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">66.13</td>
110
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">26.53</td>
111
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">37.67</td>
112
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">2.03</td>
113
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">18.87</td>
114
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">31.78</td>
115
+ </tr>
116
+ </tbody></table>
117
+
118
+ <table>
119
+ <caption><b>HuggingFace Open LLM Leaderboard V2</b></caption>
120
+ <thead>
121
+ <tr>
122
+ <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
123
+ <th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
124
+ <th style="text-align:center; background-color: #001d6c; color: white;">BBH</th>
125
+ <th style="text-align:center; background-color: #001d6c; color: white;">MATH Lvl 5</th>
126
+ <th style="text-align:center; background-color: #001d6c; color: white;">GPQA</th>
127
+ <th style="text-align:center; background-color: #001d6c; color: white;">MUSR</th>
128
+ <th style="text-align:center; background-color: #001d6c; color: white;">MMLU-Pro</th>
129
+ <th style="text-align:center; background-color: #001d6c; color: white;">Avg</th>
130
+ </tr></thead>
131
+ <tbody>
132
+ <tr>
133
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Granite-3.1-8B-Base</td>
134
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">42.21</td>
135
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">26.02</td>
136
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">9.52</td>
137
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">9.51</td>
138
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8.36</td>
139
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">24.8</td>
140
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">20.07</td>
141
+ </tr>
142
+ <tr>
143
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-2B-Base</td>
144
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">35.22</td>
145
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">16.84</td>
146
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">5.59</td>
147
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">3.69</td>
148
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">3.9</td>
149
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">13.9</td>
150
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">13.19</td>
151
+ </tr>
152
+ <tr>
153
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-3B-A800M-Base</td>
154
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">29.96</td>
155
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">11.91</td>
156
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">4</td>
157
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">3.69</td>
158
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">1.11</td>
159
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">8.81</td>
160
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">9.91</td>
161
+ </tr>
162
+ <tr>
163
+ <td style="text-align:left; background-color: #DAE8FF; color: #2D2D2D;">Granite-3.1-1B-A400M-Base</td>
164
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">25.19</td>
165
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">6.43</td>
166
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">2.19</td>
167
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">0.22</td>
168
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">1.76</td>
169
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">1.55</td>
170
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">6.22</td>
171
+ </tr>
172
+ </tbody></table>
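
The Avg column in both leaderboard tables appears to be the unweighted mean of the six benchmark scores in each row. The README does not state this; it is inferred from the numbers. A minimal Python sketch of that check, with the scores copied from the tables and hypothetical names (`V1`, `V2`, `max_gap`):

```python
# Each entry: (six benchmark scores in table order, reported Avg).
# Values are copied from the tables; the names V1, V2, max_gap are illustrative only.
V1 = {
    "Granite-3.1-8B-Base": ([63.99, 83.27, 63.45, 51.29, 78.92, 60.19], 66.85),
    "Granite-3.1-2B-Base": ([53.58, 77.67, 52.86, 39.02, 72.84, 47.99], 57.32),
    "Granite-3.1-3B-A800M-Base": ([50.76, 74.45, 48.31, 39.91, 69.29, 40.56], 53.88),
    "Granite-3.1-1B-A400M-Base": ([39.42, 66.13, 26.53, 37.67, 2.03, 18.87], 31.78),
}
V2 = {
    "Granite-3.1-8B-Base": ([42.21, 26.02, 9.52, 9.51, 8.36, 24.8], 20.07),
    "Granite-3.1-2B-Base": ([35.22, 16.84, 5.59, 3.69, 3.9, 13.9], 13.19),
    "Granite-3.1-3B-A800M-Base": ([29.96, 11.91, 4, 3.69, 1.11, 8.81], 9.91),
    "Granite-3.1-1B-A400M-Base": ([25.19, 6.43, 2.19, 0.22, 1.76, 1.55], 6.22),
}


def max_gap(table):
    """Largest absolute difference between the row mean and the reported Avg."""
    return max(abs(sum(scores) / len(scores) - avg) for scores, avg in table.values())


for label, table in (("V1", V1), ("V2", V2)):
    # Assumption: Avg is a plain mean; a gap below 0.01 is treated as rounding noise.
    print(label, "largest gap between mean and reported Avg:", max_gap(table))
```

If the assumption holds, every gap stays below 0.01, which would be consistent with the averages having been computed from unrounded scores.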
173
 
174
  **Model Architecture:**
175
  Granite-3.1-1B-A400M-Base is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss.
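
To make the three components named above concrete, here is a minimal pure-Python sketch of top-k expert routing. It is not Granite's actual implementation. It assumes a softmax router, gate weights renormalized over the selected experts, no per-expert capacity limit (one reading of "dropless"), and a Switch-Transformer-style auxiliary loss; all function names are hypothetical. The expert count and top-k are taken from the 1B MoE column of the architecture table.

```python
import math
import random

NUM_EXPERTS = 32  # "Number of experts", 1B MoE column
TOP_K = 8         # "MoE TopK", 1B MoE column


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Select the top_k experts for one token.

    Returns (gates, probs): gates maps expert index -> renormalized weight,
    probs is the full router distribution. No capacity limit is applied,
    so a token is never dropped (the assumed meaning of "dropless").
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}, probs


def load_balancing_loss(all_probs, all_gates, num_experts=NUM_EXPERTS):
    """Auxiliary loss N * sum_i f_i * P_i (Switch-Transformer style, an assumption).

    f_i: fraction of routing slots assigned to expert i.
    P_i: mean router probability for expert i across tokens.
    """
    num_tokens = len(all_probs)
    slots = sum(len(g) for g in all_gates)
    f = [0.0] * num_experts
    for gates in all_gates:
        for i in gates:
            f[i] += 1.0 / slots
    p = [sum(tp[i] for tp in all_probs) / num_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


# Route 16 tokens with random router logits (stand-ins for a learned router).
rng = random.Random(0)
TOKENS = [[rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)] for _ in range(16)]
ROUTED = [route_token(t) for t in TOKENS]
GATES = [g for g, _ in ROUTED]
PROBS = [p for _, p in ROUTED]
print("experts per token:", len(GATES[0]))
print("auxiliary loss:", load_balancing_loss(PROBS, GATES))
```

In this formulation the loss equals 1 when the router distribution is uniform, and it grows as probability mass and assignments concentrate on the same few experts.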
176
 
177
+ <table>
178
+ <thead>
179
+ <tr>
180
+ <th style="text-align:left; background-color: #001d6c; color: white;">Model</th>
181
+ <th style="text-align:center; background-color: #001d6c; color: white;">2B Dense</th>
182
+ <th style="text-align:center; background-color: #001d6c; color: white;">8B Dense</th>
183
+ <th style="text-align:center; background-color: #001d6c; color: white;">1B MoE</th>
184
+ <th style="text-align:center; background-color: #001d6c; color: white;">3B MoE</th>
185
+ </tr></thead>
186
+ <tbody>
187
+ <tr>
188
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Embedding size</td>
189
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2048</td>
190
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4096</td>
191
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">1024</td>
192
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1536</td>
193
+ </tr>
194
+ <tr>
195
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of layers</td>
196
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">40</td>
197
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">40</td>
198
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">24</td>
199
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
200
+ </tr>
201
+ <tr>
202
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Attention head size</td>
203
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">64</td>
204
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128</td>
205
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">64</td>
206
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">64</td>
207
+ </tr>
208
+ <tr>
209
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of attention heads</td>
210
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
211
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
212
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">16</td>
213
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">24</td>
214
+ </tr>
215
+ <tr>
216
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of KV heads</td>
217
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
218
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
219
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8</td>
220
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
221
+ </tr>
222
+ <tr>
223
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MLP hidden size</td>
224
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8192</td>
225
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">12800</td>
226
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">512</td>
227
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">512</td>
228
+ </tr>
229
+ <tr>
230
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MLP activation</td>
231
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
232
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
233
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">SwiGLU</td>
234
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
235
+ </tr>
236
+ <tr>
237
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of experts</td>
238
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">—</td>
239
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">—</td>
240
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">32</td>
241
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">40</td>
242
+ </tr>
243
+ <tr>
244
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MoE TopK</td>
245
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">—</td>
246
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">—</td>
247
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8</td>
248
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
249
+ </tr>
250
+ <tr>
251
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Initialization std</td>
252
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
253
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
254
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">0.1</td>
255
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
256
+ </tr>
257
+ <tr>
258
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Sequence length</td>
259
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
260
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
261
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">128K</td>
262
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
263
+ </tr>
264
+ <tr>
265
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Position embedding</td>
266
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
267
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
268
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">RoPE</td>
269
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
270
+ </tr>
271
+ <tr>
272
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Parameters</td>
273
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2.5B</td>
274
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8.1B</td>
275
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">1.3B</td>
276
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">3.3B</td>
277
+ </tr>
278
+ <tr>
279
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Active parameters</td>
280
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2.5B</td>
281
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8.1B</td>
282
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">400M</td>
283
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">800M</td>
284
+ </tr>
285
+ <tr>
286
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Training tokens</td>
287
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">12T</td>
288
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">12T</td>
289
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">10T</td>
290
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10T</td>
291
+ </tr>
292
+ </tbody></table>
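
A few relationships can be read off the table. The sketch below copies the table values into a hypothetical `ARCH` dict, checks that the number of attention heads times the head size equals the embedding size, and derives the query heads per KV head and the share of experts used per token. The interpretation of these ratios is an assumption; the README does not state them.

```python
# Values copied from the architecture table; ARCH and summarize are illustrative names.
ARCH = {
    "2B Dense": {"embedding": 2048, "heads": 32, "head_size": 64, "kv_heads": 8},
    "8B Dense": {"embedding": 4096, "heads": 32, "head_size": 128, "kv_heads": 8},
    "1B MoE": {"embedding": 1024, "heads": 16, "head_size": 64, "kv_heads": 8,
               "experts": 32, "top_k": 8},
    "3B MoE": {"embedding": 1536, "heads": 24, "head_size": 64, "kv_heads": 8,
               "experts": 40, "top_k": 8},
}


def summarize(cfg):
    """Derive simple ratios from one column of the table."""
    out = {
        # heads * head_size should reproduce the embedding size
        "width_matches": cfg["heads"] * cfg["head_size"] == cfg["embedding"],
        # query heads sharing each KV head
        "queries_per_kv_head": cfg["heads"] // cfg["kv_heads"],
    }
    if "experts" in cfg:
        # fraction of experts evaluated for each token
        out["expert_share"] = cfg["top_k"] / cfg["experts"]
    return out


SUMMARY = {name: summarize(cfg) for name, cfg in ARCH.items()}
for name, row in SUMMARY.items():
    print(name, row)
```

For the highlighted 1B MoE column this gives 8 of 32 experts per token, which lines up with the table listing 400M active parameters against 1.3B total.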
293
 
294
  **Training Data:**
295
  This model is trained on a mix of open source and proprietary data following a two-stage training strategy.