---
license: gemma
---

Experimental .GGUF quants for https://huggingface.co/google/gemma-2-9b-it, made according to LCPP PR
(based on b_3529, and now b_3565 for the newer ones): https://github.com/ggerganov/llama.cpp/pull/8836
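
If you want to grab one of these quants and reproduce a PPL-512 run yourself, a minimal Python sketch could look like the one below. The repo id and filename are placeholders, and the `llama-perplexity` binary is assumed to be built from the PR branch and available on your PATH.

```python
# Minimal sketch: fetch a quant and reproduce a PPL-512 measurement.
# repo_id / filename are placeholders, not the actual file names in this repo.
import subprocess
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="your-username/gemma-2-9b-it-experimental-GGUF",  # placeholder repo id
    filename="gemma-2-9b-it-IQ3_M.gguf",                      # placeholder filename
)

# PPL-512 = perplexity over wikitext measured with a 512-token context (-c 512).
subprocess.run(
    ["llama-perplexity",
     "-m", gguf_path,
     "-f", "wikitext-2-raw/wiki.test.raw",  # local copy of the wikitext-2 test set
     "-c", "512"],
    check=True,
)
```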

These experimental quant strategies, which revisit Ikawrakow's work, show a slight decrease in perplexity,
including per BPW (from 10%+ for the lowest quants down to a fraction of a percent for the highest ones).
This is significant enough to encourage you folks to test them and to provide feedback where relevant.
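
As a small illustration (not part of the original measurements) of what the "per BPW" comparison means, here is the arithmetic for IQ3_XXS using the numbers from the table below:

```python
# IQ3_XXS, master vs. PR, taken from the table below.
master_ppl, master_bpw = 8.4985, 3.25
pr_ppl, pr_bpw = 8.3274, 3.32

ppl_drop = (master_ppl - pr_ppl) / master_ppl * 100    # raw PPL improvement, in %
bpw_cost = (pr_bpw - master_bpw) / master_bpw * 100    # size increase, in %

# Normalizing PPL by BPW gives a rough "quality per bit" figure, so the
# improvement can be judged net of the small size increase.
per_bpw_drop = (master_ppl / master_bpw - pr_ppl / pr_bpw) / (master_ppl / master_bpw) * 100

print(f"PPL drop: {ppl_drop:.2f}% | size cost: {bpw_cost:.2f}% | PPL/BPW drop: {per_bpw_drop:.2f}%")
```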

The iMatrix I use is based on Group Merged V3, enriched with a bit of French, Serbian, and Croatian.
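
For reference, the usual workflow for building such an iMatrix and applying it at quantization time with the llama.cpp tools looks roughly like the sketch below. File names are placeholders, and `llama-imatrix` / `llama-quantize` are assumed to be built from the PR branch and on your PATH.

```python
# Rough sketch of the iMatrix -> quantization workflow (placeholder file names).
import subprocess

# 1. Compute importance statistics from the calibration corpus described above.
subprocess.run(
    ["llama-imatrix",
     "-m", "gemma-2-9b-it-F16.gguf",
     "-f", "calibration-groupmerged-v3-plus.txt",  # placeholder calibration file
     "-o", "imatrix.dat"],
    check=True,
)

# 2. Quantize the F16 model, feeding the imatrix so the low-bit quant types
#    can preserve the most important weights.
subprocess.run(
    ["llama-quantize",
     "--imatrix", "imatrix.dat",
     "gemma-2-9b-it-F16.gguf",
     "gemma-2-9b-it-IQ3_M.gguf",
     "IQ3_M"],
    check=True,
)
```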


ARC and PPL-512 data (the latest figures are in the main post of the PR thread):

```

IQ3_XXS

Master
Size : 3.04 GiB (3.25 BPW)
PPL 512 wikitext : 8.4985 +/- 0.05402

PR (so so)
Size : 3.11 GiB (3.32 BPW)
PPL 512 wikitext : 8.3274 +/- 0.05334

IQ3_XS

Master
Size : 3.27 GiB (3.50 BPW)
PPL 512 wikitext : 8.2019 +/- 0.05167

PR (ok)
Size : 3.24 GiB (3.47 BPW)
PPL 512 wikitext : 8.1762 +/- 0.05176

IQ3_S

Master
Size : 3.42 GiB (3.66 BPW)
PPL 512 wikitext : 7.9894 +/- 0.05020

PR (good)
Size : 3.41 GiB (3.64 BPW)
PPL 512 wikitext : 7.9067 +/- 0.05022

IQ3_M

Master
Size : 3.52 GiB (3.76 BPW)  
PPL 512 wikitext : 7.9263 +/- 0.04943

PR (good)
Size : 3.49 GiB (3.73 BPW)
PPL 512 wikitext : 7.8704 +/- 0.04951

IQ3_XL

PR (good)
Size : 3.71 GiB (3.97 BPW)
PPL 512 wikitext : 7.7225 +/- 0.04946

IQ3_XXL

PR (good; the benefit seems meager, but the token embeddings pushed from IQ3_S to IQ4_XS explain +0.05 BPW of it,
and that tensor runs in RAM rather than VRAM)
Size : 3.83 GiB (4.09 BPW)
PPL 512 wikitext : 7.6720 +/- 0.04892

IQ3_XXL

PR (good)
Size : 3.97 GiB (4.24 BPW)
PPL 512 wikitext : 7.5920 +/- 0.04839

IQ4_XS

Master
Size : 4.13 GiB (4.42 BPW)
Arc-C 299     49.16387960    
Arc-E 570     72.10526316     
PPL 512 wikitext : 7.5226 +/- 0.04820

IQ4_XSR

PR (good)
Size : 4.16 GiB (4.45 BPW)
Arc-C 299    
Arc-E 570      
PPL 512 wikitext : 7.5072 +/- 0.04814

FP16

MASTER : Gemma 2 9b It F16.
Size : 14.96 GiB (16.00 BPW)
Arc-C 299     49.49832776
Arc-E 570     73.85964912
PPL 512 wikitext : 7.3224 +/- 0.04674

```