artemtumch committed on
Commit
722c0cd
·
verified ·
1 Parent(s): 6219d37

Update README.md

Files changed (1): README.md +126 -1
README.md CHANGED
@@ -9,4 +9,129 @@ tags:
- text-to-image
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
---
# Stable-fast-xl Model Card

stable-fast is an ultra-lightweight inference optimization framework for HuggingFace Diffusers on NVIDIA GPUs. It achieves very fast inference by combining several key optimization techniques.

This repository contains a compact installation of the [stable-fast](https://github.com/chengzeyi/stable-fast) compiler and inference examples for [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [stable-diffusion-xl-1.0-inpainting-0.1](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1).

![image.png](https://cdn-uploads.huggingface.co/production/uploads/670503434c094132b2282e63/Xib4SHo9PX7-oSWP3Or3Y.png)

![image.png](https://cdn-uploads.huggingface.co/production/uploads/670503434c094132b2282e63/-a7V70NkS09TeMSZAKgVB.png)

# Inference with SDXL models is 30%+ faster

## Differences With Other Acceleration Libraries

#### Fast
stable-fast is specially optimized for HuggingFace Diffusers. It achieves high performance across many models, and it compiles in only a few seconds — significantly faster than **torch.compile**, **TensorRT**, and **AITemplate**.

#### Minimal
stable-fast works as a plugin framework for **PyTorch**. It builds on existing PyTorch functionality and infrastructure, and it is compatible with other acceleration techniques, as well as popular fine-tuning techniques and deployment solutions.

# How to use

### Install dependencies
```bash
pip install diffusers transformers safetensors accelerate sentencepiece
```
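Before running the snippets below, you can sanity-check that the dependencies are importable. This is a minimal helper, not part of this repo, using only the standard library:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Packages the snippets in this README rely on.
required = ["torch", "diffusers", "transformers", "safetensors",
            "accelerate", "sentencepiece", "xformers", "triton"]
print("missing:", missing_packages(required))
```

If the printed list is non-empty, install the missing packages before continuing.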

### Download the repository and run the stable-fast install script
```bash
git clone https://huggingface.co/artemtumch/stable-fast-xl
cd stable-fast-xl
sh install_stable-fast.sh
```

## Generate image
```py
import torch
from diffusers import DiffusionPipeline  # add AutoencoderTiny here if using the tiny VAE below

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

import xformers
import triton

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# enable to reduce GPU VRAM usage (~30%)
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

config = CompilationConfig.Default()

config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True

pipe = compile(pipe, config)

prompt = "An astronaut riding a green horse"

image = pipe(prompt=prompt).images[0]
```
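Compiled pipelines do their heavy lifting on the first few calls, so any speed comparison should discard warmup runs before timing. A minimal timing sketch (not part of this repo; the `pipe` and `prompt` names in the commented usage refer to the snippet above):

```python
import time

def mean_latency(fn, warmup=3, iters=10):
    """Average wall-clock latency of fn(), measured after warmup calls.

    stable-fast (like torch.compile) compiles lazily, so the first
    calls are much slower than steady state and must be excluded.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical usage with the compiled pipeline from above:
# print(f"{mean_latency(lambda: pipe(prompt=prompt)):.2f} s/image")
```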

## Inpainting
```py
import torch
from diffusers import StableDiffusionXLInpaintPipeline  # add AutoencoderTiny here if using the tiny VAE below
from diffusers.utils import load_image

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

import xformers
import triton

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
)

# enable to reduce GPU VRAM usage (~30%)
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

config = CompilationConfig.Default()

config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True

pipe = compile(pipe, config)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well
    strength=0.99,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]
```
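
`load_image` above fetches a ready-made mask from a URL. If you need to build a mask programmatically instead, here is a minimal sketch assuming Pillow is available (it is pulled in as a diffusers dependency); white pixels mark the region the pipeline will repaint:

```python
from PIL import Image, ImageDraw

def box_mask(size, box):
    """Black canvas of the given (width, height) with a white rectangle.

    White (255) = repaint this area; black (0) = keep the original pixels.
    """
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    return mask

# Hypothetical usage in place of the downloaded mask above:
mask_image = box_mask((1024, 1024), (256, 256, 768, 768))
```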