appoose committed
Commit
956a729
1 Parent(s): 49119b7

adding initial metrics

Files changed (1): README.md (+10 -1)
README.md CHANGED
@@ -37,10 +37,19 @@ outputs = model.generate(**(inputs.to('cuda')), max_new_tokens=1000)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
+
+## Performance
+| Benchmark           | Mixtral Original | HQQ quantized |
+|---------------------|------------------|---------------|
+| ARC (25-shot)       | 70.22            | 66.47         |
+| TruthfulQA-MC2      | 64.57            | 62.85         |
+| Winogrande (5-shot) | 81.36            | 79.40         |
+
 ----------------------------------------------------------------------------------------------------------------------------------
 </p>
 
 ### Quantization
+
 You can reproduce the model using the following quant configs:
 
 ``` Python
@@ -70,4 +79,4 @@ quant_config['block_sparse_moe.experts.w3'] = experts_params
 model.quantize_model(quant_config=quant_config, compute_dtype=torch.float16);
 model.eval();
 ```
-
+The code is available on GitHub at https://github.com/mobiusml/hqq/blob/master/examples/hf/mixtral_13GB_example.py
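As context for the quantization snippet in the diff above, here is a minimal sketch of how such a per-layer `quant_config` dict might be assembled. The layer names mirror those in the diff (`block_sparse_moe.experts.w3` etc.), but the `make_params` helper and the specific `nbits`/`group_size` values are illustrative assumptions, not the exact settings from this commit; see the linked example script for the real configuration.

```python
# Sketch of a per-layer quantization config in the style used above.
# The bit widths and group sizes below are illustrative assumptions.

def make_params(nbits, group_size):
    """Bundle quantization settings for a group of layers."""
    return {"nbits": nbits, "group_size": group_size}

# Attention projections kept at moderate precision (assumed values)...
attn_params = make_params(nbits=4, group_size=64)
# ...while MoE expert weights are quantized more aggressively (assumed values).
experts_params = make_params(nbits=2, group_size=16)

quant_config = {}
# Attention projection layers
quant_config["self_attn.q_proj"] = attn_params
quant_config["self_attn.k_proj"] = attn_params
quant_config["self_attn.v_proj"] = attn_params
quant_config["self_attn.o_proj"] = attn_params
# Mixture-of-Experts expert layers (w1/w2/w3, as referenced in the diff)
quant_config["block_sparse_moe.experts.w1"] = experts_params
quant_config["block_sparse_moe.experts.w2"] = experts_params
quant_config["block_sparse_moe.experts.w3"] = experts_params
```

A config shaped like this is then passed to the model's quantization entry point (in the diff, `model.quantize_model(quant_config=quant_config, compute_dtype=torch.float16)`), which looks up each layer's settings by name.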