---
tags:
- text-to-image
- stable-diffusion
- lora
- diffusers
- template:sd-lora
base_model: stabilityai/stable-diffusion-xl-base-1.0
license: cc-by-nc-nd-4.0
---
# ⚡ FlashDiffusion: FlashSDXL ⚡


Flash Diffusion is a diffusion distillation method proposed in [ADD ARXIV]() *by Clément Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin.*
This model is a **108M**-parameter LoRA distilled from [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) that can generate images in **4 steps**. Its main purpose is to reproduce the main results of the paper.


<p align="center">
   <img style="width:700px;" src="images/flash_sdxl.jpg">
</p>

# How to use?

The model can be used directly with the `DiffusionPipeline` from the `diffusers` library. It reduces the number of required sampling steps to **2-4**.

```python
from diffusers import DiffusionPipeline, LCMScheduler

adapter_id = "jasperai/flash-sdxl"

pipe = DiffusionPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  use_safetensors=True,
)

pipe.scheduler = LCMScheduler.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  subfolder="scheduler",
  timestep_spacing="trailing",
)
pipe.to("cuda")

# Load the LoRA weights, then fuse them into the base model
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A raccoon reading a book in a lush forest."

image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```
<p align="center">
   <img style="width:400px;" src="images/raccoon.png">
</p>

# Training Details
The model was trained for 20k iterations on 4 H100 GPUs (approximately 176 GPU hours in total). Please refer to the [paper]() for further details on the training parameters.
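
As a rough sanity check on these figures, the sketch below converts the stated budget into wall-clock time and time per iteration. It assumes the 4 GPUs ran in parallel for the whole run, which this card does not state, and the variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic from the figures quoted above.
# Assumption (not stated in this card): all 4 GPUs ran in parallel for the full run.
total_gpu_hours = 176      # approximate total, as reported
num_gpus = 4
num_iterations = 20_000

wall_clock_hours = total_gpu_hours / num_gpus
seconds_per_iteration = wall_clock_hours * 3600 / num_iterations

print(f"wall-clock time: ~{wall_clock_hours:.0f} h")
print(f"time per iteration: ~{seconds_per_iteration:.2f} s")
```

Under that assumption, the run corresponds to roughly 44 hours of wall-clock time, or about 7.9 seconds per iteration.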

**Metrics on COCO 2014 validation (Table 3)**
  - FID-10k: 21.62 (4 NFE)
  - CLIP Score: 0.327 (4 NFE)
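
Here NFE stands for the number of function evaluations, i.e. denoiser calls per image. A CLIP score is commonly computed as the cosine similarity between the CLIP embedding of a generated image and the embedding of its prompt, averaged over all pairs. The exact protocol behind the numbers above (CLIP variant, scaling, sample selection) is not given in this card, so the following is only a minimal sketch of that general idea. The embeddings are hypothetical toy vectors, not real CLIP outputs, and `mean_clip_style_score` is an illustrative name.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_clip_style_score(image_embeddings, text_embeddings):
    """Average cosine similarity over paired image/text embeddings."""
    scores = [
        cosine_similarity(img, txt)
        for img, txt in zip(image_embeddings, text_embeddings)
    ]
    return sum(scores) / len(scores)

# Hypothetical toy embeddings standing in for CLIP image/text features.
image_embeddings = [[1.0, 0.0], [0.0, 2.0]]
text_embeddings = [[1.0, 0.0], [3.0, 0.0]]

score = mean_clip_style_score(image_embeddings, text_embeddings)
print(score)
```

In practice the embeddings would come from a CLIP image encoder and text encoder applied to the generated images and their prompts.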
   
## License
This model is released under the Creative Commons BY-NC license.