Update README.md

README.md CHANGED
@@ -26,12 +26,34 @@ You're probably better off using Q8_0, but I thought I'll share these – maybe
Higher bits per weight (bpw) numbers result in slower computation:
```
20 s    Q8_0
23 s    11.024bpw-txt16.gguf
30 s    fp16
37 s    16.422bpw-txt32.gguf
310 s   fp32
```

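The bpw figures in the filenames are size-weighted averages over all tensors: Q8_0 works out to 8.5 bits per weight (32 int8 values plus one fp16 scale per 32-weight block), fp16 to 16 and fp32 to 32, so a mixed file lands somewhere in between. A minimal sketch of that arithmetic, with made-up tensor sizes rather than the real flux1-dev ones:

```python
# Illustrative only: effective bpw is the size-weighted average of the
# per-tensor bit widths.  Q8_0 = 34 bytes per 32-weight block = 8.5 bpw.
BPW = {"Q8_0": 8.5, "F16": 16.0, "F32": 32.0}

def effective_bpw(tensors):
    """tensors: iterable of (n_elements, quant_type) pairs."""
    total_bits = sum(n * BPW[qtype] for n, qtype in tensors)
    total_elements = sum(n for n, _ in tensors)
    return total_bits / total_elements

# Hypothetical split, NOT the real flux1-dev tensor sizes:
print(f"{effective_bpw([(9_000_000_000, 'Q8_0'), (3_000_000_000, 'F32')]):.3f} bpw")
```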

### Update 2024-08-26

Two new files. This time the only tensors in Q8_0 are some or all of:
```
double_blocks.*.img_mlp.0.weight
double_blocks.*.img_mlp.2.weight
double_blocks.*.txt_mlp.0.weight
double_blocks.*.txt_mlp.2.weight

double_blocks.*.img_mod.lin.weight
double_blocks.*.txt_mod.lin.weight
single_blocks.*.linear1.weight
single_blocks.*.linear2.weight
single_blocks.*.modulation.lin.weight
```

- `flux1-dev-Q8_0-fp32-11.763bpw.gguf`
  This version has all of the above layers in Q8_0.
- `flux1-dev-Q8_0-fp32-13.962bpw.gguf`
  This version keeps the first **2** layers of every kind and the first **4** MLP layers in fp32.
- `flux1-dev-Q8_0-fp32-16.161bpw.gguf`
  This one keeps the first **4** layers of every kind and the first **8** MLP layers in fp32 (see the sketch after this list).
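The rule, as I read it, is applied per tensor: anything matching one of the patterns above goes to Q8_0 unless its block index is below the fp32 threshold. The Python sketch below only illustrates that reading; the helper names (`choose_qtype`, `block_index`) and the mapping of thresholds to files are my assumptions, not the script that produced these GGUFs.

```python
import re

# Tensor-name patterns that are candidates for Q8_0 (taken from the list above).
MLP_PATTERNS = (
    r"double_blocks\.\d+\.(img|txt)_mlp\.(0|2)\.weight",
)
OTHER_PATTERNS = (
    r"double_blocks\.\d+\.(img|txt)_mod\.lin\.weight",
    r"single_blocks\.\d+\.linear(1|2)\.weight",
    r"single_blocks\.\d+\.modulation\.lin\.weight",
)

def block_index(name: str) -> int:
    """Parse the block number out of e.g. 'double_blocks.3.img_mlp.0.weight'."""
    m = re.search(r"\.(\d+)\.", name)
    return int(m.group(1)) if m else -1

def choose_qtype(name: str, keep_any: int = 0, keep_mlp: int = 0) -> str:
    """Return 'Q8_0' or 'F32' for one weight tensor.

    keep_any / keep_mlp = how many leading blocks stay in fp32.
    My reading: 0/0 -> 11.763bpw, 2/4 -> 13.962bpw, 4/8 -> 16.161bpw.
    """
    idx = block_index(name)
    if any(re.fullmatch(p, name) for p in MLP_PATTERNS):
        return "F32" if idx < keep_mlp else "Q8_0"
    if any(re.fullmatch(p, name) for p in OTHER_PATTERNS):
        return "F32" if idx < keep_any else "Q8_0"
    return "F32"  # everything else (incl. one-dimensional tensors) stays full precision

print(choose_qtype("double_blocks.1.img_mlp.0.weight", keep_any=2, keep_mlp=4))  # F32
print(choose_qtype("double_blocks.5.img_mlp.0.weight", keep_any=2, keep_mlp=4))  # Q8_0
print(choose_qtype("single_blocks.3.linear1.weight",   keep_any=2, keep_mlp=4))  # Q8_0
```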

In the txt16/32 files, I quantized only these layers to Q8_0, unless they were one-dimensional:
```
img_mlp.0