
	---
	tags:
	- text-to-image
	- stable-diffusion
	- lora
	- diffusers
	- template:sd-lora
	base_model: stabilityai/stable-diffusion-xl-base-1.0
	license: cc-by-nc-nd-4.0
	inference: False
	---
	# ⚡ Flash Diffusion: FlashSDXL ⚡


	Flash Diffusion is a diffusion distillation method proposed in [Flash Diffusion: Accelerating Any Conditional
	Diffusion Model for Few Steps Image Generation](http://arxiv.org/abs/2406.02347) by Clément Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin from Jasper Research.
	This model is a 108M-parameter LoRA distilled from [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) that can generate images in 4 steps. Its main purpose is to reproduce the main results of the paper.
	See our [live demo](https://huggingface.co/spaces/jasperai/FlashPixart) and the official [GitHub repo](https://github.com/gojasper/flash-diffusion).


	<p align="center">
	<img style="width:700px;" src="images/flash_sdxl.jpg">
	</p>

	# How to use?

	The model can be used directly with the `DiffusionPipeline` from the `diffusers` library. It reduces the number of required sampling steps to 4.

	```python
	from diffusers import DiffusionPipeline, LCMScheduler

	adapter_id = "jasperai/flash-sdxl"

	pipe = DiffusionPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    use_safetensors=True,
	)

	pipe.scheduler = LCMScheduler.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    subfolder="scheduler",
	    timestep_spacing="trailing",
	)
	pipe.to("cuda")

	# Load the Flash LoRA weights, then fuse them into the base model
	pipe.load_lora_weights(adapter_id)
	pipe.fuse_lora()

	prompt = "A raccoon reading a book in a lush forest."

	image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
	```
	<p align="center">
	<img style="width:400px;" src="images/raccoon.png">
	</p>
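
As a small extension of the snippet above, the following sketch wraps the 4-step call in a helper that renders several prompts and saves each result as a PNG. The helper name `generate_and_save`, the `outputs` directory, and the file naming are hypothetical; it assumes `pipe` was prepared as above (LCM scheduler set, Flash LoRA loaded and fused).

```python
from pathlib import Path

# Sampling settings used throughout this README for FlashSDXL.
FLASH_KWARGS = {"num_inference_steps": 4, "guidance_scale": 0}


def generate_and_save(pipe, prompts, out_dir="outputs"):
    """Render one image per prompt and save it as a PNG (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, prompt in enumerate(prompts):
        # The pipeline output exposes the generated PIL images in `.images`.
        image = pipe(prompt, **FLASH_KWARGS).images[0]
        path = out / f"flash_{index:03d}.png"
        image.save(path)
        paths.append(path)
    return paths
```

For example, `generate_and_save(pipe, ["A raccoon reading a book in a lush forest."])` would write a single file into `outputs/`.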

	# How to use in Comfy?

	To use FlashSDXL locally with ComfyUI, you need to:

	1. Make sure your ComfyUI install is up to date.
	2. Download the checkpoint from [Hugging Face](https://huggingface.co/jasperai/flash-sdxl).
	In case you wonder how: go to "Files and versions", open the `comfy/` folder, and hit the download button next to `FlashSDXL.safetensors`.
	3. Move the new checkpoint file to your local `comfyUI/models/loras/.` folder.
	4. Use it as a LoRA on top of `sd_xl_base_1.0_0.9vae.safetensors`. A simple ComfyUI `workflow.json` is provided in this repo (available in the same `comfy/` folder).

	> Disclaimer: The model has been trained to work with a CFG scale of 1 and an LCM scheduler, but the parameters can be tweaked a bit.
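
Steps 2 and 3 above can also be scripted. The sketch below is one possible way to do it with `hf_hub_download` from the `huggingface_hub` package. The file path `comfy/FlashSDXL.safetensors` is inferred from step 2, while the helper names and the `comfy_root` argument are hypothetical and should be adapted to your install.

```python
import shutil
from pathlib import Path

REPO_ID = "jasperai/flash-sdxl"
# Assumed location of the checkpoint inside the repo, as described in step 2.
CHECKPOINT = "comfy/FlashSDXL.safetensors"


def lora_destination(comfy_root):
    """Return the path where step 3 expects the checkpoint to live."""
    return Path(comfy_root) / "models" / "loras" / Path(CHECKPOINT).name


def install_flash_lora(comfy_root):
    """Download the checkpoint and copy it into the ComfyUI LoRA folder."""
    # Imported lazily so that `lora_destination` works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    cached_file = hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINT)
    destination = lora_destination(comfy_root)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached_file, destination)
    return destination
```

Calling `install_flash_lora("path/to/comfyUI")` should leave `FlashSDXL.safetensors` in the `models/loras` folder, ready for step 4.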

	# Combining Flash Diffusion with Existing LoRAs 🎨

	FlashSDXL can also be combined with existing LoRAs to unlock few-step generation in a training-free manner. It can be integrated directly into Hugging Face pipelines. See an example below.

	```python
	from diffusers import DiffusionPipeline, LCMScheduler
	import torch

	user_lora_id = "TheLastBen/Papercut_SDXL"
	trigger_word = "papercut"

	flash_lora_id = "jasperai/flash-sdxl"

	# Load Pipeline
	pipe = DiffusionPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    variant="fp16"
	)

	# Set scheduler
	pipe.scheduler = LCMScheduler.from_config(
	    pipe.scheduler.config
	)

	# Load LoRAs
	pipe.load_lora_weights(flash_lora_id, adapter_name="flash")
	pipe.load_lora_weights(user_lora_id, adapter_name="lora")

	pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 1.0])
	pipe.to(device="cuda", dtype=torch.float16)

	prompt = f"{trigger_word} a cute corgi"

	image = pipe(
	    prompt,
	    num_inference_steps=4,
	    guidance_scale=0
	).images[0]
	```
	<p align="center">
	<img style="width:400px;" src="images/corgi.jpg">
	</p>

	> Hint 💡: You can also use additional LoRAs with the provided Comfy workflow and test them on your machine.
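
The example above uses a weight of 1.0 for both adapters. If the style LoRA turns out too strong or too weak, one option is to compare a few weights side by side. The sketch below is a hypothetical helper (`sweep_user_lora` is not part of `diffusers`) that keeps the Flash adapter fixed and varies only the user LoRA through `set_adapters`; it assumes the adapters were loaded under the names `"flash"` and `"lora"` as above.

```python
def sweep_user_lora(pipe, prompt, user_weights, flash_weight=1.0):
    """Render one image per user-LoRA weight, keeping the Flash adapter fixed."""
    images = {}
    for weight in user_weights:
        # Re-balance the two adapters before each generation.
        pipe.set_adapters(["flash", "lora"], adapter_weights=[flash_weight, weight])
        result = pipe(prompt, num_inference_steps=4, guidance_scale=0)
        images[weight] = result.images[0]
    return images
```

For instance, `sweep_user_lora(pipe, "papercut a cute corgi", [0.6, 0.8, 1.0])` returns a dictionary mapping each weight to the image it produced.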

	# Combining Flash Diffusion with Existing ControlNets 🎨

	FlashSDXL can also be combined with existing ControlNets to unlock few-step generation in a training-free manner. It can be integrated directly into Hugging Face pipelines. See an example below.

	```python
	import torch
	import cv2
	import numpy as np
	from PIL import Image

	from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, LCMScheduler
	from diffusers.utils import load_image, make_image_grid

	flash_lora_id = "jasperai/flash-sdxl"

	image = load_image(
	    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
	).resize((1024, 1024))

	image = np.array(image)

	image = cv2.Canny(image, 100, 200)
	image = image[:, :, None].repeat(3, 2)
	canny_image = Image.fromarray(image)

	# Load ControlNet
	controlnet = ControlNetModel.from_pretrained(
	    "diffusers/controlnet-canny-sdxl-1.0",
	    torch_dtype=torch.float16,
	    variant="fp16"
	)
	pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    controlnet=controlnet,
	    torch_dtype=torch.float16,
	    safety_checker=None,
	    variant="fp16"
	).to("cuda")

	# Set scheduler
	pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

	# Load LoRA
	pipe.load_lora_weights(flash_lora_id)
	pipe.fuse_lora()

	image = pipe(
	    "picture of the mona lisa",
	    image=canny_image,
	    num_inference_steps=4,
	    guidance_scale=0,
	    controlnet_conditioning_scale=0.5,
	    cross_attention_kwargs={"scale": 1},
	).images[0]
	make_image_grid([canny_image, image], rows=1, cols=2)
	```
	<p align="center">
	<img style="width:400px;" src="images/controlnet.jpg">
	</p>


	# Training Details
	The model was trained for 20k iterations on 4 H100 GPUs (approximately 176 GPU hours in total). Please refer to the [paper](http://arxiv.org/abs/2406.02347) for further details on the training parameters.
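
As a rough sanity check on these figures, the arithmetic below converts the reported GPU hours into wall-clock time and time per iteration. It assumes that all 4 GPUs ran in parallel for the whole run and that the 176 GPU hours cover training only. Neither is stated explicitly here, so treat the results as estimates.

```python
gpu_hours = 176        # approximate total reported above
num_gpus = 4           # H100 GPUs, assumed to run in parallel throughout
iterations = 20_000

# Wall-clock duration if every GPU was busy for the entire run.
wall_clock_hours = gpu_hours / num_gpus

# Average wall-clock time spent on each training iteration.
seconds_per_iteration = wall_clock_hours * 3600 / iterations

print(f"{wall_clock_hours:.0f} h wall-clock, {seconds_per_iteration:.2f} s per iteration")
```

Under these assumptions the run corresponds to about 44 hours of wall-clock time, or roughly 7.9 seconds per iteration.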

	Metrics on the COCO 2014 validation set (Table 3):
	- FID-10k: 21.62 (4 NFE)
	- CLIP Score: 0.327 (4 NFE)

	## Citation
	If you find this work useful or use it in your research, please consider citing us.

	```bibtex
	@misc{chadebec2024flash,
	  title={Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation},
	  author={Clement Chadebec and Onur Tasar and Eyal Benaroche and Benjamin Aubin},
	  year={2024},
	  eprint={2406.02347},
	  archivePrefix={arXiv},
	  primaryClass={cs.CV}
	}
	```

	## License
	This model is released under the Creative Commons BY-NC license.