---
base_model: black-forest-labs/FLUX.1-dev
library_name: gguf
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
quantized_by: mo137
tags:
- text-to-image
- image-generation
- flux
---

Flux.1-dev in a few experimental custom formats, mixing tensors in **Q8_0**, **fp16**, and **fp32**.
Converted from black-forest-labs' original bf16 weights.

### Motivation
Flux's weights were published in bf16.
Converting bf16 to fp16 is slightly lossy, whereas converting bf16 to fp32 is lossless.
I experimented with mixed tensor formats to see whether keeping some tensors at higher precision would improve quality.
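
The lossy/lossless distinction can be checked directly. The sketch below is only an illustration, not part of the conversion tooling: it assumes NumPy is available and emulates bf16 by treating each 16-bit pattern as the upper half of a float32. It round-trips every finite bf16 value through fp32 and through fp16 and counts how many values change.

```python
import numpy as np

# Enumerate every 16-bit pattern; bf16 is the upper half of an IEEE float32,
# so widening to fp32 is a plain 16-bit left shift of the raw bits.
bits = np.arange(2**16, dtype=np.uint32)
as_fp32 = (bits << np.uint32(16)).view(np.float32)

# Ignore inf/NaN patterns, which are not meaningful weights.
finite = np.isfinite(as_fp32)
vals = as_fp32[finite]

# bf16 -> fp32 -> bf16: the low 16 bits are all zero, so nothing is lost.
fp32_lossless = np.array_equal(vals.view(np.uint32) >> np.uint32(16), bits[finite])

# bf16 -> fp16 -> fp32: fp16 has a narrower exponent range, so very large
# values overflow to inf and very small ones round to zero or lose bits.
with np.errstate(over="ignore"):
    via_fp16 = vals.astype(np.float16).astype(np.float32)
changed = via_fp16 != vals

print("fp32 round trip exact:", fp32_lossless)
print("bf16 values altered by fp16:", int(changed.sum()), "of", vals.size)
```

Within fp16's normal range, every bf16 value is exactly representable because bf16 carries fewer mantissa bits than fp16. The loss comes only from magnitudes outside that range, which is why the effect on real weights is expected to be small.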

### Evaluation
I compared the outputs, but I can't say with any certainty whether these models are significantly better than pure Q8_0.
You're probably better off using Q8_0, but I thought I'd share these anyway; maybe someone will find them useful.

Higher bits per weight (bpw) numbers result in slower computation:
```
 20 s  Q8_0
 23 s  11.024bpw-txt16.gguf
 30 s  fp16
 37 s  16.422bpw-txt32.gguf
310 s  fp32
```

### Update 2024-08-26
New files. This time, the only tensors in Q8_0 are some or all of:
```
double_blocks.*.img_mlp.0.weight
double_blocks.*.img_mlp.2.weight
double_blocks.*.txt_mlp.0.weight
double_blocks.*.txt_mlp.2.weight

double_blocks.*.img_mod.lin.weight
double_blocks.*.txt_mod.lin.weight
single_blocks.*.linear1.weight
single_blocks.*.linear2.weight
single_blocks.*.modulation.lin.weight
```

- `flux1-dev-Q8_0-fp32-11.763bpw.gguf`  
  This version has all of the above layers in Q8_0.
- `flux1-dev-Q8_0-fp32-13.962bpw.gguf`  
  This version keeps the first **2** layers of every kind, and the first **4** MLP layers, in fp32.
- `flux1-dev-Q8_0-fp32-16.161bpw.gguf`  
  This version keeps the first **4** layers of every kind, and the first **8** MLP layers, in fp32.
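
To make the wildcard patterns concrete, here is a minimal sketch of how a tensor name could be matched against the list above, using Python's standard `fnmatch` module. The function `target_format`, the fp32 fallback, and the sample tensor names are assumptions made for illustration. This is not the script used to produce these files, and it only models the variant where every listed tensor is quantized.

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above; "*" stands for the block index.
Q8_0_PATTERNS = [
    "double_blocks.*.img_mlp.0.weight",
    "double_blocks.*.img_mlp.2.weight",
    "double_blocks.*.txt_mlp.0.weight",
    "double_blocks.*.txt_mlp.2.weight",
    "double_blocks.*.img_mod.lin.weight",
    "double_blocks.*.txt_mod.lin.weight",
    "single_blocks.*.linear1.weight",
    "single_blocks.*.linear2.weight",
    "single_blocks.*.modulation.lin.weight",
]


def target_format(name: str, fallback: str = "fp32") -> str:
    """Return "Q8_0" if the tensor name matches a pattern, else the fallback."""
    if any(fnmatchcase(name, pattern) for pattern in Q8_0_PATTERNS):
        return "Q8_0"
    return fallback


# Illustrative tensor names (assumed for the example, not read from a file).
for name in [
    "double_blocks.3.img_mlp.0.weight",
    "double_blocks.3.img_attn.qkv.weight",
    "single_blocks.10.linear2.weight",
    "single_blocks.10.linear2.bias",
]:
    print(f"{name:40s} -> {target_format(name)}")
```

The larger files would additionally exempt the earliest blocks from quantization; that rule is left out here because the exact selection logic is not spelled out above.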

In the txt16/32 files, I quantized only these layers to Q8_0, unless they were one-dimensional:
```
img_mlp.0
img_mlp.2
img_mod.lin
linear1
linear2
modulation.lin
```
But I left all of these at fp16 (in the txt16 file) or fp32 (in the txt32 file):
```
txt_mlp.0
txt_mlp.2
txt_mod.lin
```
The resulting bpw number is just an approximation from file size.
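
As a rough illustration of that approximation, the sketch below divides a file size in bits by a weight count. The helper names and the 50/50 mix are hypothetical; the only fixed fact used is that GGML's Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, which works out to 8.5 bpw.

```python
def bits_per_weight(file_size_bytes: int, n_weights: int) -> float:
    """Approximate bpw from file size; metadata overhead is counted as weights."""
    return file_size_bytes * 8 / n_weights


# Q8_0 block layout: one fp16 scale (2 bytes) + 32 int8 weights (32 bytes).
Q8_0_BPW = bits_per_weight(2 + 32, 32)  # 8.5

FORMAT_BPW = {"Q8_0": Q8_0_BPW, "fp16": 16.0, "fp32": 32.0}


def mixed_bpw(fractions: dict[str, float]) -> float:
    """Weighted average bpw for a mix of formats; fractions should sum to 1."""
    return sum(FORMAT_BPW[fmt] * share for fmt, share in fractions.items())


# Hypothetical mix: half of the weights in Q8_0, half in fp16.
print(Q8_0_BPW)                                   # 8.5
print(mixed_bpw({"Q8_0": 0.5, "fp16": 0.5}))      # 12.25
```

This is why the file names carry non-integer bpw values: each is an average over tensors stored in different formats, plus a small amount of header overhead.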

---

This is a direct GGUF conversion of [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main).

As this is a quantized model, not a finetune, all of the original restrictions and license terms still apply.

The model files can be used with the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node.

Place the model files in `ComfyUI/models/unet`; see the GitHub readme for further installation instructions.

Please refer to [this chart](https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md#llama-3-8b-scoreboard) for a basic overview of quantization types.

(Model card mostly copied from [city96/FLUX.1-dev-gguf](https://huggingface.co/city96/FLUX.1-dev-gguf) - which contains conventional and useful GGUF files.)