---
base_model: black-forest-labs/FLUX.1-dev
library_name: gguf
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
quantized_by: mo137
tags:
- text-to-image
- image-generation
- flux
---

Flux.1-dev in a few experimental custom formats, mixing tensors in **Q8_0**, **fp16**, and **fp32**.
Converted from black-forest-labs' original bf16 weights.

### Motivation
Flux's weights were published in bf16.
Conversion to fp16 is slightly lossy, but fp32 is lossless.
I experimented with mixed tensor formats to see if it would improve quality.

### Evaluation
I tried comparing the outputs, but I can't say with any certainty whether these models are significantly better than pure Q8_0.
You're probably better off using Q8_0, but I thought I'd share these – maybe someone will find them useful.

Higher bits per weight (bpw) numbers result in slower computation:
```
 20 s  Q8_0
 23 s  11.0bpw-txt16
 30 s  fp16
 37 s  16.4bpw-txt32
310 s  fp32
```
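For context on where these numbers sit: in ggml, Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, which works out to about 8.5 bpw, while fp16 and fp32 are simply 16 and 32 bpw. The check below is plain arithmetic, not a value read from the files:

```python
# ggml Q8_0 layout: per block of 32 weights, 32 int8 values (32 bytes)
# plus one fp16 scale (2 bytes).
block_bytes = 32 + 2
weights_per_block = 32
print(block_bytes * 8 / weights_per_block)  # 8.5 bits per weight
```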
In the txt16/32 files, I quantized only these layers to Q8_0, unless they were one-dimensional:
```
img_mlp.0
img_mlp.2
img_mod.lin
linear1
linear2
modulation.lin
```
But left all these at fp16 or fp32, respectively:
```
txt_mlp.0
txt_mlp.2
txt_mod.lin
```
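The selection rule can be written down as a small helper. This is only an illustrative sketch of the description above, not the script actually used to build these files; tensor names follow the FLUX state dict, and the type strings are just labels:

```python
# Illustrative sketch of the per-tensor format choice described above -
# not the actual conversion script used for these files.
Q8_0_LAYERS = (
    "img_mlp.0", "img_mlp.2", "img_mod.lin",
    "linear1", "linear2", "modulation.lin",
)

def pick_format(name: str, shape: tuple, fallback: str = "F16") -> str:
    """Choose the storage type for one tensor.

    `fallback` is "F16" for the txt16 file and "F32" for the txt32 file.
    One-dimensional tensors (biases, norm weights) always keep the fallback
    precision; only multi-dimensional weights of the listed layers get Q8_0.
    """
    if len(shape) < 2:
        return fallback
    if any(layer in name for layer in Q8_0_LAYERS):
        return "Q8_0"
    return fallback  # txt_mlp.0 / txt_mlp.2 / txt_mod.lin and everything else

# Examples:
print(pick_format("double_blocks.0.img_mlp.0.weight", (12288, 3072)))  # Q8_0
print(pick_format("double_blocks.0.txt_mlp.0.weight", (12288, 3072)))  # F16
print(pick_format("single_blocks.0.modulation.lin.bias", (9216,)))     # F16
```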
The resulting bpw number is just an approximation from file size.
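Concretely, it is roughly file size × 8 / parameter count, taking the FLUX.1-dev transformer as about 11.9 billion parameters (an assumed figure, not read from the files):

```python
# Rough bpw estimate from file size; ~11.9e9 is an assumed parameter count
# for the FLUX.1-dev transformer.
def approx_bpw(file_size_bytes: float, n_params: float = 11.9e9) -> float:
    return file_size_bytes * 8 / n_params

print(round(approx_bpw(16.4e9), 1))  # a ~16.4 GB file comes out at ~11.0 bpw
```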
---

This is a direct GGUF conversion of [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main).

As this is a quantized model, not a finetune, all the same restrictions and original license terms still apply.

The model files can be used with the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node.

Place model files in `ComfyUI/models/unet` - see the GitHub readme for further install instructions.
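For example, a file can be fetched straight into that folder with `huggingface_hub`; the repo id and filename below are placeholders, so take the real names from this repository's file list:

```python
# Placeholder example - substitute the actual repo id and .gguf filename.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="mo137/FLUX.1-dev-gguf",          # placeholder repo id
    filename="flux1-dev-11.0bpw-txt16.gguf",  # placeholder filename
    local_dir="ComfyUI/models/unet",          # ComfyUI's unet model folder
)
```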
Please refer to [this chart](https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md#llama-3-8b-scoreboard) for a basic overview of quantization types.

(Model card mostly copied from [city96/FLUX.1-dev-gguf](https://huggingface.co/city96/FLUX.1-dev-gguf) - which contains conventional and useful GGUF files.)