- text-to-image
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
---

# Stable-fast-xl Model Card

Stable-fast is an ultra-lightweight inference optimization framework for HuggingFace Diffusers on NVIDIA GPUs. It delivers very fast inference by combining several key optimization techniques.

This repository contains a compact installation of the [stable-fast](https://github.com/chengzeyi/stable-fast) compiler, along with inference examples for [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [stable-diffusion-xl-1.0-inpainting-0.1](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1).

![image.png](https://cdn-uploads.huggingface.co/production/uploads/670503434c094132b2282e63/Xib4SHo9PX7-oSWP3Or3Y.png)

![image.png](https://cdn-uploads.huggingface.co/production/uploads/670503434c094132b2282e63/-a7V70NkS09TeMSZAKgVB.png)

# Inference with the SDXL model is 30%+ faster!

## Differences With Other Acceleration Libraries

#### Fast:

stable-fast is specially optimized for HuggingFace Diffusers and achieves high performance across many libraries. It also compiles very quickly, within only a few seconds, which makes it significantly faster than **torch.compile**, **TensorRT**, and **AITemplate** in compilation time.
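
Speedup claims like these are easy to verify on your own hardware. Below is a minimal, hypothetical timing helper (the `benchmark` name and its arguments are our own, not part of stable-fast) that separates warm-up calls, which pay the one-time compilation and CUDA-graph capture cost, from the measured calls:

```python
import time

def benchmark(fn, warmup=1, runs=5):
    """Average wall-clock seconds per call of fn(), after `warmup`
    unmeasured calls that absorb one-time compilation overhead."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```

For example, `benchmark(lambda: pipe(prompt=prompt))` run once before and once after `compile(pipe, config)` gives a direct before/after comparison on your GPU.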

#### Minimal:

stable-fast works as a plugin framework for **PyTorch**. It utilizes existing PyTorch functionality and infrastructure, and it is compatible with other acceleration techniques, as well as popular fine-tuning techniques and deployment solutions.

# How to use

### Install dependencies

```bash
pip install diffusers transformers safetensors accelerate sentencepiece
```

### Download repository and run script for stable-fast installation

```bash
git clone https://huggingface.co/artemtumch/stable-fast-xl
cd stable-fast-xl
sh install_stable-fast.sh
```

## Generate image

```py
import torch
from diffusers import DiffusionPipeline

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

import xformers
import triton

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# enable to reduce GPU VRAM usage (~30%)
# from diffusers import AutoencoderTiny
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

config = CompilationConfig.Default()

config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True

pipe = compile(pipe, config)

prompt = "An astronaut riding a green horse"

image = pipe(prompt=prompt).images[0]
```

## Inpainting

```py
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

import xformers
import triton

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
)

# enable to reduce GPU VRAM usage (~30%)
# from diffusers import AutoencoderTiny
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

config = CompilationConfig.Default()

config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True

pipe = compile(pipe, config)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well
    strength=0.99,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]
```
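
The `strength` parameter interacts with `num_inference_steps`: Diffusers inpainting pipelines skip roughly the first `(1 - strength)` fraction of the denoising schedule. The sketch below mirrors that calculation (a simplified reimplementation for illustration, not the library's exact code):

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps that actually run for a
    given strength (simplified from diffusers' timestep-truncation logic)."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

print(effective_steps(20, 0.99))  # 19: almost the full schedule runs
print(effective_steps(20, 0.5))   # 10: only the second half of the schedule
```

This is why the example above uses `strength=0.99` with 20 steps: nearly the whole schedule runs, while staying below 1.0 keeps some of the original image content.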