---
tags:
- text-to-image
- stable-diffusion
- lora
- diffusers
- template:sd-lora
base_model: stabilityai/stable-diffusion-xl-base-1.0
license: cc-by-nc-nd-4.0
inference: False
---
# ⚡ Flash Diffusion: FlashSDXL ⚡


Flash Diffusion is a diffusion distillation method proposed in [Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation](http://arxiv.org/abs/2406.02347) *by Clément Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin* from Jasper Research.
This model is a **108M LoRA** distilled version of the [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model that can generate images in **4 steps**. Its main purpose is to reproduce the main results of the paper.
See our [live demo](https://huggingface.co/spaces/jasperai/FlashPixart) and the official [GitHub repo](https://github.com/gojasper/flash-diffusion).


<p align="center">
   <img style="width:700px;" src="images/flash_sdxl.jpg">
</p>

# How to use?

The model can be used directly with the `DiffusionPipeline` from the `diffusers` library. It reduces the number of required sampling steps to **4**.

```python
from diffusers import DiffusionPipeline, LCMScheduler

adapter_id = "jasperai/flash-sdxl"

pipe = DiffusionPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  use_safetensors=True,
)

pipe.scheduler = LCMScheduler.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  subfolder="scheduler",
  timestep_spacing="trailing",
)
pipe.to("cuda")

# Load the LoRA weights, then fuse them into the base model
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A raccoon reading a book in a lush forest."

image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```
<p align="center">
   <img style="width:400px;" src="images/raccoon.png">
</p>

# How to use in Comfy?

To use FlashSDXL locally with ComfyUI:

1. Make sure your ComfyUI install is up to date.
2. Download the checkpoint from [Hugging Face](https://huggingface.co/jasperai/flash-sdxl).
   To do so, open "Files and versions", go to the `comfy/` folder, and hit the download button next to `FlashSDXL.safetensors`.
3. Move the new checkpoint file to your local `comfyUI/models/loras/` folder.
4. Use it as a LoRA on top of `sd_xl_base_1.0_0.9vae.safetensors`. A simple ComfyUI `workflow.json` is provided in this repo (in the same `comfy/` folder).

> Disclaimer: The model has been trained to work with a CFG scale of 1 and an LCM scheduler, but these parameters can be tweaked a bit.
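
To see why a CFG scale of 1 is a natural setting here, recall the standard classifier-free guidance formula, which mixes an unconditional and a prompt-conditioned noise prediction. The minimal sketch below uses toy lists in place of real model outputs, and `cfg_combine` is a hypothetical helper written for illustration, not part of `diffusers` or ComfyUI. At a scale of 1 the result is exactly the conditional prediction, so no guidance is applied.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the unconditional and prompt-conditioned noise predictions.
uncond = [0.25, -0.5, 1.0]
cond = [0.75, 0.5, 0.0]

print(cfg_combine(uncond, cond, 1.0))  # identical to `cond`: no guidance applied
print(cfg_combine(uncond, cond, 2.0))  # pushed further away from `uncond`
```

The `diffusers` snippets in this card pass `guidance_scale=0` instead. Assuming the SDXL pipelines only enable classifier-free guidance when `guidance_scale` is greater than 1, both settings amount to using the conditional prediction alone.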

# Combining Flash Diffusion with Existing LoRAs 🎨

FlashSDXL can also be combined with existing LoRAs to unlock few-step generation in a **training-free** manner. It can be integrated directly into Hugging Face pipelines. See the example below.

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch

user_lora_id = "TheLastBen/Papercut_SDXL"
trigger_word = "papercut"

flash_lora_id = "jasperai/flash-sdxl"

# Load Pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16"
)

# Set scheduler
pipe.scheduler = LCMScheduler.from_config(
    pipe.scheduler.config
)

# Load LoRAs
pipe.load_lora_weights(flash_lora_id, adapter_name="flash")
pipe.load_lora_weights(user_lora_id, adapter_name="lora")

pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 1.0])
pipe.to(device="cuda", dtype=torch.float16)

prompt = f"{trigger_word} a cute corgi"

image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0
).images[0]
```
<p align="center">
   <img style="width:400px;" src="images/corgi.jpg">
</p>

> Hint 💡: You can also use additional LoRAs with the provided Comfy workflow and test them on your machine.

# Combining Flash Diffusion with Existing ControlNets 🎨

FlashSDXL can also be combined with existing ControlNets to unlock few-step generation in a **training-free** manner. It can be integrated directly into Hugging Face pipelines. See the example below.

```python
import torch
import cv2
import numpy as np
from PIL import Image

from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

flash_lora_id = "jasperai/flash-sdxl"

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((1024, 1024))

image = np.array(image)

image = cv2.Canny(image, 100, 200)
image = image[:, :, None].repeat(3, 2)
canny_image = Image.fromarray(image)

# Load ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    variant="fp16"
).to("cuda")

# Set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load LoRA
pipe.load_lora_weights(flash_lora_id)
pipe.fuse_lora()

image = pipe(
    "picture of the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    controlnet_conditioning_scale=0.5,
    cross_attention_kwargs={"scale": 1},
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)
```
<p align="center">
   <img style="width:400px;" src="images/controlnet.jpg">
</p>


# Training Details
The model was trained for 20k iterations on 4 H100 GPUs (approximately 176 GPU hours of training in total). Please refer to the [paper](http://arxiv.org/abs/2406.02347) for further parameter details.
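
As a rough sanity check on these figures, the short calculation below converts the quoted GPU hours into wall-clock time and an average time per iteration. It assumes all 4 GPUs ran concurrently for the whole run, which the card does not state explicitly, so treat the results as order-of-magnitude estimates only.

```python
# Figures quoted in the training details above.
total_gpu_hours = 176  # approximate total across all GPUs
num_gpus = 4
iterations = 20_000

# Assumption: all GPUs were busy in parallel for the entire run.
wall_clock_hours = total_gpu_hours / num_gpus
seconds_per_iteration = wall_clock_hours * 3600 / iterations

print(f"wall-clock time: ~{wall_clock_hours:.0f} h")
print(f"time per iteration: ~{seconds_per_iteration:.1f} s")
```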

**Metrics on COCO 2014 validation (Table 3)**
  - FID-10k: 21.62 (4 NFE)
  - CLIP Score: 0.327 (4 NFE)

## Citation
If you find this work useful or use it in your research, please consider citing us:

```bibtex
@misc{chadebec2024flash,
      title={Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation}, 
      author={Clement Chadebec and Onur Tasar and Eyal Benaroche and Benjamin Aubin},
      year={2024},
      eprint={2406.02347},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
   
## License
This model is released under the Creative Commons BY-NC license.