---
tags:
- text-to-image
- stable-diffusion
- lora
- diffusers
- template:sd-lora
base_model: stabilityai/stable-diffusion-xl-base-1.0
license: cc-by-nc-nd-4.0
inference: False
---
# ⚡ Flash Diffusion: FlashSDXL ⚡
Flash Diffusion is a diffusion distillation method proposed in [Flash Diffusion: Accelerating Any Conditional
Diffusion Model for Few Steps Image Generation](http://arxiv.org/abs/2406.02347) *by Clément Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin* from Jasper Research.
This model is a **108M LoRA** distilled version of the [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model, able to generate images in **4 steps**. The main purpose of this model is to reproduce the main results of the paper.
See our [live demo](https://huggingface.co/spaces/jasperai/FlashSDXL) and official [GitHub repo](https://github.com/gojasper/flash-diffusion).
<p align="center">
<img style="width:700px;" src="images/flash_sdxl.jpg">
</p>
# How to use?
The model can be used directly with the `DiffusionPipeline` from the `diffusers` library, reducing the number of required sampling steps to **4**.
```python
from diffusers import DiffusionPipeline, LCMScheduler

adapter_id = "jasperai/flash-sdxl"

# Load the SDXL base model
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_safetensors=True,
)

# Use an LCM scheduler with trailing timestep spacing
pipe.scheduler = LCMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="scheduler",
    timestep_spacing="trailing",
)

pipe.to("cuda")

# Load and fuse the LoRA weights
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A raccoon reading a book in a lush forest."

# The distilled model is CFG-free, so guidance_scale is set to 0
image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```
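Since `fuse_lora` merges the adapter into the UNet weights, you can restore the plain SDXL model afterwards; a minimal sketch using the standard `diffusers` LoRA calls:

```python
# Undo the fusion and drop the adapter to recover the base SDXL weights
pipe.unfuse_lora()
pipe.unload_lora_weights()
```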
<p align="center">
<img style="width:400px;" src="images/raccoon.png">
</p>
# How to use in Comfy?
To use FlashSDXL locally with ComfyUI you need to:
1. Make sure your ComfyUI install is up to date.
2. Download the checkpoint from [huggingface](https://huggingface.co/jasperai/flash-sdxl): in "Files and versions", open the `comfy/` folder and hit the download button next to `FlashSDXL.safetensors` (or script the download, see the sketch after this list).
3. Move the new checkpoint file to your local `ComfyUI/models/loras/` folder.
4. Use it as a LoRA on top of `sd_xl_base_1.0_0.9vae.safetensors`; a simple ComfyUI `workflow.json` is provided in this repo (in the same `comfy/` folder).
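If you prefer to fetch the checkpoint programmatically, here is a minimal sketch using `huggingface_hub` (assuming the file sits under `comfy/` in the repo, as described above; adjust the destination path to your install):

```python
import shutil
from huggingface_hub import hf_hub_download

# Download the Comfy-ready LoRA checkpoint from the Hub
lora_path = hf_hub_download(
    repo_id="jasperai/flash-sdxl",
    filename="comfy/FlashSDXL.safetensors",
)

# Copy it into your local ComfyUI LoRA folder
shutil.copy(lora_path, "ComfyUI/models/loras/FlashSDXL.safetensors")
```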
> Disclaimer: The model has been trained to work with a CFG scale of 1 and an LCM scheduler, but these parameters can be tweaked a bit.
# Combining Flash Diffusion with Existing LoRAs 🎨
FlashSDXL can also be combined with existing LoRAs to unlock **training-free** few-step generation. It integrates straight into Hugging Face pipelines; see the example below.
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

user_lora_id = "TheLastBen/Papercut_SDXL"
trigger_word = "papercut"

flash_lora_id = "jasperai/flash-sdxl"

# Load the base pipeline in half precision
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Set the LCM scheduler
pipe.scheduler = LCMScheduler.from_config(
    pipe.scheduler.config
)

# Load both LoRAs and activate them
pipe.load_lora_weights(flash_lora_id, adapter_name="flash")
pipe.load_lora_weights(user_lora_id, adapter_name="lora")
pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 1.0])

pipe.to(device="cuda", dtype=torch.float16)

prompt = f"{trigger_word} a cute corgi"

image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0,
).images[0]
```
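The `adapter_weights` control the relative strength of each adapter, so you can trade distillation fidelity against style strength. For instance (an illustrative setting, not a tuned one):

```python
# Keep the Flash adapter at full strength and soften the style LoRA
pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 0.8])
```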
<p align="center">
<img style="width:400px;" src="images/corgi.jpg">
</p>
> Hint 💡: You can also load an additional LoRA with the provided Comfy workflow and test it on your machine.
# Combining Flash Diffusion with Existing ControlNets 🎨
FlashSDXL can also be combined with existing ControlNets to unlock **training-free** few-step generation. It integrates straight into Hugging Face pipelines; see the example below.
```python
import torch
import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

flash_lora_id = "jasperai/flash-sdxl"

# Build a Canny edge map to condition the generation
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((1024, 1024))
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None].repeat(3, axis=2)
canny_image = Image.fromarray(image)

# Load ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Set the LCM scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load and fuse the LoRA
pipe.load_lora_weights(flash_lora_id)
pipe.fuse_lora()

image = pipe(
    "picture of the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    controlnet_conditioning_scale=0.5,
    cross_attention_kwargs={"scale": 1},
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)
```
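Note that `controlnet_conditioning_scale` balances how strictly the output follows the Canny edges against prompt freedom, while `guidance_scale=0` is kept because the distilled model does not use classifier-free guidance; the values above are starting points you can tweak.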
<p align="center">
<img style="width:400px;" src="images/controlnet.jpg">
</p>
# Training Details
The model was trained for 20k iterations on 4 H100 GPUs (approximately 176 GPU hours in total). Please refer to the [paper](http://arxiv.org/abs/2406.02347) for further details on the training parameters.
**Metrics on COCO 2014 validation (Table 3)**
- FID-10k: 21.62 (4 NFE)
- CLIP Score: 0.327 (4 NFE)
## Citation
If you find this work useful or use it in your research, please consider citing us:
```bibtex
@misc{chadebec2024flash,
title={Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation},
author={Clement Chadebec and Onur Tasar and Eyal Benaroche and Benjamin Aubin},
year={2024},
eprint={2406.02347},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## License
This model is released under the Creative Commons BY-NC-ND 4.0 license.