Update README.md

README.md CHANGED
@@ -26,12 +26,34 @@ You're probably better off using Q8_0, but I thought I'll share these – maybe
Higher bits per weight (bpw) numbers result in slower computation:
```
20 s    Q8_0
23 s    11.024bpw-txt16.gguf
30 s    fp16
37 s    16.422bpw-txt32.gguf
310 s   fp32
```

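The bpw figures in the filenames are size-weighted averages over all tensors: Q8_0 works out to 8.5 bits per weight (32 int8 values plus one fp16 scale per 32-weight block), fp16 to 16 and fp32 to 32, so a mixed file lands somewhere in between. A minimal sketch of that arithmetic, with made-up tensor sizes rather than the real flux1-dev ones:

```python
# Illustrative only: effective bpw is the size-weighted average of the
# per-tensor bit widths.  Q8_0 = 34 bytes per 32-weight block = 8.5 bpw.
BPW = {"Q8_0": 8.5, "F16": 16.0, "F32": 32.0}

def effective_bpw(tensors):
    """tensors: iterable of (n_elements, quant_type) pairs."""
    total_bits = sum(n * BPW[qtype] for n, qtype in tensors)
    total_elements = sum(n for n, _ in tensors)
    return total_bits / total_elements

# Hypothetical split, NOT the real flux1-dev tensor sizes:
print(f"{effective_bpw([(9_000_000_000, 'Q8_0'), (3_000_000_000, 'F32')]):.3f} bpw")
```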

### Update 2024-08-26

Two new files. This time the only tensors in Q8_0 are some or all of:
```
double_blocks.*.img_mlp.0.weight
double_blocks.*.img_mlp.2.weight
double_blocks.*.txt_mlp.0.weight
double_blocks.*.txt_mlp.2.weight

double_blocks.*.img_mod.lin.weight
double_blocks.*.txt_mod.lin.weight
single_blocks.*.linear1.weight
single_blocks.*.linear2.weight
single_blocks.*.modulation.lin.weight
```

- `flux1-dev-Q8_0-fp32-11.763bpw.gguf`
  This version has all of the above layers in Q8_0.
- `flux1-dev-Q8_0-fp32-13.962bpw.gguf`
  This version keeps the first **2** layers of every kind and the first **4** MLP layers in fp32.
- `flux1-dev-Q8_0-fp32-16.161bpw.gguf`
  This one keeps the first **4** layers of every kind and the first **8** MLP layers in fp32 (see the sketch after this list).
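The rule, as I read it, is applied per tensor: anything matching one of the patterns above goes to Q8_0 unless its block index is below the fp32 threshold. The Python sketch below only illustrates that reading; the helper names (`choose_qtype`, `block_index`) and the mapping of thresholds to files are my assumptions, not the script that produced these GGUFs.

```python
import re

# Tensor-name patterns that are candidates for Q8_0 (taken from the list above).
MLP_PATTERNS = (
    r"double_blocks\.\d+\.(img|txt)_mlp\.(0|2)\.weight",
)
OTHER_PATTERNS = (
    r"double_blocks\.\d+\.(img|txt)_mod\.lin\.weight",
    r"single_blocks\.\d+\.linear(1|2)\.weight",
    r"single_blocks\.\d+\.modulation\.lin\.weight",
)

def block_index(name: str) -> int:
    """Parse the block number out of e.g. 'double_blocks.3.img_mlp.0.weight'."""
    m = re.search(r"\.(\d+)\.", name)
    return int(m.group(1)) if m else -1

def choose_qtype(name: str, keep_any: int = 0, keep_mlp: int = 0) -> str:
    """Return 'Q8_0' or 'F32' for one weight tensor.

    keep_any / keep_mlp = how many leading blocks stay in fp32.
    My reading: 0/0 -> 11.763bpw, 2/4 -> 13.962bpw, 4/8 -> 16.161bpw.
    """
    idx = block_index(name)
    if any(re.fullmatch(p, name) for p in MLP_PATTERNS):
        return "F32" if idx < keep_mlp else "Q8_0"
    if any(re.fullmatch(p, name) for p in OTHER_PATTERNS):
        return "F32" if idx < keep_any else "Q8_0"
    return "F32"  # everything else (incl. one-dimensional tensors) stays full precision

print(choose_qtype("double_blocks.1.img_mlp.0.weight", keep_any=2, keep_mlp=4))  # F32
print(choose_qtype("double_blocks.5.img_mlp.0.weight", keep_any=2, keep_mlp=4))  # Q8_0
print(choose_qtype("single_blocks.3.linear1.weight",   keep_any=2, keep_mlp=4))  # Q8_0
```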

In the txt16/32 files, I quantized only these layers to Q8_0, unless they were one-dimensional:
```
img_mlp.0