Vasily Alexeev committed
Commit c27917d
Parent(s): e1b1ab8

add quality stats

README.md CHANGED
@@ -25,6 +25,38 @@ Quantized with [OmniQuant](https://github.com/OpenGVLab/OmniQuant).
 
 ## Evaluation
 
+### PPL (↓)
+
+|               | wiki  |
+| ------------- | ----- |
+| FP            | 7,862 |
+| **Quantized** | 8,615 |
+
+
+### Accuracy on English Benchmarks, % (↑)
+
+|               | piqa | arc_easy | arc_challenge | boolq | hellaswag | winogrande | mmlu_humanities | mmlu_social_sciences | mmlu_stem |
+| ------------- | ---- | -------- | ------------- | ----- | --------- | ---------- | --------------- | -------------------- | --------- |
+| FP            | 78,5 | 82,2     | 50,4          | 82,7  | 58,1      | 72,4       | 65,5            | 72,6                 | 53,8      |
+| **Quantized** | 78,5 | 80,8     | 47,6          | 81,7  | 56,9      | 71,2       | 62,3            | 68,9                 | 49,7      |
+
+
+### Accuracy on Russian Benchmarks, % (↑)
+
+|               | danetqa | terra | rwsd | muserc | rucos | lidirus | parus | rcb  | russe | rucola |
+| ------------- | ------- | ----- | ---- | ------ | ----- | ------- | ----- | ---- | ----- | ------ |
+| FP            | 74,9    | 52,1  | 51,5 | 55,9   | 58,1  | 59,5    | 69,0  | 34,1 | 38,8  | 67,5   |
+| **Quantized** | 65,4    | 50,5  | 49,5 | 60,7   | 53,7  | 50,9    | 71,0  | 33,6 | 40,8  | 67,5   |
+
+
+### Summary
+
+|               | Average Quality Difference, Eng, % (↑) | Average Quality Difference, Rus, % (↑) | Occupied Memory, % (↓) |
+| ------------- | -------------------------------------- | -------------------------------------- | ---------------------- |
+| FP            | 0                                      | 0                                      | 100                    |
+| **Quantized** | \-2,4                                  | \-1,8                                  | 35,7                   |
+
+
 ## Examples
 
 ### Imports and Model Loading
@@ -143,8 +175,7 @@ Quantized with [OmniQuant](https://github.com/OpenGVLab/OmniQuant).
 ```python
 model_path = "compressa-ai/Saiga-Llama-3-8B-OmniQuant"
 
-model = load_model(model_path)
-model.cuda()
+model = load_model(model_path).cuda()
 tokenizer = AutoTokenizer.from_pretrained(
     model_path, use_fast=False, trust_remote_code=True
 )
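
Note on the Summary table added in this commit: the README does not state how "Average Quality Difference" is computed. The sketch below assumes it is the mean of per-benchmark differences in percentage points (Quantized minus FP). The helper names `parse_cell` and `mean_difference` are hypothetical and not part of the repository. Under that assumption the Russian columns give about -1.8, which agrees with the table. The nine English columns shown give about -2.1 rather than -2.4, so the English figure may cover benchmarks not listed here or use a different formula. The cells use decimal commas, so they are parsed before any arithmetic.

```python
from statistics import mean


def parse_cell(cell):
    """Parse a table cell written with a decimal comma into a float."""
    # Drop the Markdown escape backslash (as in the "\-2,4" cell), then swap
    # the decimal comma for a decimal point.
    return float(cell.strip().replace("\\", "").replace(",", "."))


def mean_difference(fp_row, quantized_row):
    """Mean of (quantized - FP) over paired cells, in percentage points.

    Assumption: this is how the summary column is derived; the README
    does not confirm it.
    """
    if len(fp_row) != len(quantized_row):
        raise ValueError("rows must have the same number of cells")
    fp = [parse_cell(c) for c in fp_row]
    quantized = [parse_cell(c) for c in quantized_row]
    return mean(q - f for q, f in zip(quantized, fp))


# Values copied from the tables in the diff.
ru_fp = "74,9 52,1 51,5 55,9 58,1 59,5 69,0 34,1 38,8 67,5".split()
ru_q = "65,4 50,5 49,5 60,7 53,7 50,9 71,0 33,6 40,8 67,5".split()
en_fp = "78,5 82,2 50,4 82,7 58,1 72,4 65,5 72,6 53,8".split()
en_q = "78,5 80,8 47,6 81,7 56,9 71,2 62,3 68,9 49,7".split()

print(round(mean_difference(ru_fp, ru_q), 1))  # -1.8
print(round(mean_difference(en_fp, en_q), 1))  # -2.1

# Relative perplexity increase on wiki, in percent.
ppl_fp, ppl_q = parse_cell("7,862"), parse_cell("8,615")
print(round(100 * (ppl_q - ppl_fp) / ppl_fp, 1))  # 9.6
```

Read the same way, an occupied memory of 35,7 % corresponds to a roughly 2.8x smaller footprint than the FP model.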