alexmarques committed
Commit 7e4ac27
1 Parent(s): 44b6153

Update README.md

Files changed (1)
  1. README.md +81 -20
README.md CHANGED
@@ -20,15 +20,15 @@ license: llama3.1
  - **Output:** Text
  - **Model Optimizations:**
    - **Weight quantization:** INT8
- - **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), this models is intended for assistant-like chat.
- - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
+ - **Intended Use Cases:** Intended for commercial and research use in multiple languages. Similarly to [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), this model is intended for assistant-like chat.
+ - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws).
  - **Release Date:** 7/23/2024
  - **Version:** 1.0
  - **License(s):** [Llama3](https://llama.meta.com/llama3/license/)
  - **Model Developers:** Neural Magic
 
  Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
- It achieves an average score of 69.48 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 69.33.
+ It achieves scores within 1% of the unquantized model's scores on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande and TruthfulQA.
 
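For orientation, a minimal usage sketch of the checkpoint named in this card; it assumes only that vLLM loads the INT8 weight-quantized model like any other supported checkpoint, and the prompt and sampling settings are illustrative:

```
from vllm import LLM, SamplingParams

# Minimal sketch: serve the INT8 weight-quantized checkpoint with vLLM.
# Prompt and sampling settings are illustrative assumptions.
llm = LLM(model="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",
          max_model_len=4096)
params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=128)
outputs = llm.generate(["Explain INT8 weight quantization briefly."], params)
print(outputs[0].outputs[0].text)
```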
  ### Model Optimizations
 
@@ -123,14 +123,9 @@ model.save_pretrained("Meta-Llama-3.1-8B-Instruct-quantized.w8a16")
 
  ## Evaluation
 
- The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/383bbd54bc621086e05aa1b030d8d4d5635b25e6) (commit 383bbd54bc621086e05aa1b030d8d4d5635b25e6) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command:
- ```
- lm_eval \
-   --model vllm \
-   --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",dtype=auto,gpu_memory_utilization=0.4,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
-   --tasks openllm \
-   --batch_size auto
- ```
+ The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande and TruthfulQA.
+ Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
+ This version of the lm-evaluation-harness includes versions of ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-8B-Instruct-evals).
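The CLI commands listed under Reproduction below can also be driven from Python; here is a hedged sketch using the harness's `lm_eval.simple_evaluate` entry point, shown for MMLU and assuming the Neural Magic fork keeps the upstream Python API:

```
import lm_eval

# Sketch of the documented MMLU run via the harness's Python API instead
# of the CLI. Assumes the fork keeps upstream's simple_evaluate signature;
# the arguments mirror the MMLU command in the "Reproduction" section.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16,"
        "dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1"
    ),
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"]["mmlu"])
```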
 
  ### Accuracy
 
@@ -157,23 +152,23 @@ lm_eval \
   </td>
   </tr>
   <tr>
-  <td>ARC Challenge (25-shot)
+  <td>ARC Challenge (0-shot)
   </td>
-  <td>60.41
+  <td>83.19
   </td>
-  <td>61.09
+  <td>82.68
   </td>
-  <td>101.1%
+  <td>99.4%
   </td>
   </tr>
   <tr>
-  <td>GSM-8K (5-shot, strict-match)
+  <td>GSM-8K (CoT, 8-shot, strict-match)
   </td>
-  <td>75.66
+  <td>82.79
   </td>
-  <td>76.04
+  <td>82.64
   </td>
-  <td>100.5%
+  <td>99.8%
   </td>
   </tr>
   <tr>
@@ -216,4 +211,70 @@ lm_eval \
   <td><strong>100.2%</strong>
   </td>
   </tr>
- </table>
+ </table>
+
+ ### Reproduction
+
+ The results were obtained using the following commands:
+
+ #### MMLU
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
+   --tasks mmlu \
+   --num_fewshot 5 \
+   --batch_size auto
+ ```
+
+ #### ARC-Challenge
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
+   --tasks arc_challenge_llama_3.1_instruct \
+   --apply_chat_template \
+   --num_fewshot 0 \
+   --batch_size auto
+ ```
+
+ #### GSM-8K
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
+   --tasks gsm8k_cot_llama_3.1_instruct \
+   --apply_chat_template \
+   --num_fewshot 8 \
+   --batch_size auto
+ ```
+
+ #### Hellaswag
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
+   --tasks hellaswag \
+   --num_fewshot 10 \
+   --batch_size auto
+ ```
+
+ #### Winogrande
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
+   --tasks winogrande \
+   --num_fewshot 5 \
+   --batch_size auto
+ ```
+
+ #### TruthfulQA
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
+   --tasks truthfulqa_mc \
+   --num_fewshot 0 \
+   --batch_size auto
+ ```
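Since the six commands differ only in task name, shot count, and chat-template flag, the full sweep can be scripted; a sketch that simply shells out to `lm_eval`, with every setting taken verbatim from the commands above:

```
import subprocess

# Run every benchmark from the "Reproduction" section with its documented
# settings. Task names, few-shot counts, and chat-template usage are taken
# from the commands above.
MODEL_ARGS = (
    "pretrained=neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16,"
    "dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1"
)
TASKS = [
    ("mmlu", 5, False),
    ("arc_challenge_llama_3.1_instruct", 0, True),
    ("gsm8k_cot_llama_3.1_instruct", 8, True),
    ("hellaswag", 10, False),
    ("winogrande", 5, False),
    ("truthfulqa_mc", 0, False),
]

for task, num_fewshot, use_chat_template in TASKS:
    cmd = [
        "lm_eval",
        "--model", "vllm",
        "--model_args", MODEL_ARGS,
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", "auto",
    ]
    if use_chat_template:
        cmd.append("--apply_chat_template")
    subprocess.run(cmd, check=True)
```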