Commit 12ddde0 · 1 parent: 0987186 · committed by tlwu

update doc about Olive

Files changed (1):
  1. README.md +12 -36
README.md CHANGED
@@ -2,7 +2,7 @@
  pipeline_tag: text-to-image
  license: other
  license_name: sai-nc-community
- license_link: https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.TXT
+ license_link: https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.TXT
  base_model: stabilityai/sdxl-turbo
  language:
  - en
@@ -18,7 +18,10 @@ tags:
 
  ## Introduction
 
- This repository hosts the optimized versions of **SDXL Turbo** to accelerate inference with ONNX Runtime CUDA execution provider.
+ This repository hosts the optimized versions of **SDXL Turbo** to accelerate inference with ONNX Runtime CUDA execution provider. The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with command like the following:
+ ```
+ python stable_diffusion_xl.py --provider cuda --model_id stabilityai/sdxl-turbo --optimize --use_fp16_fixed_vae
+ ```
 
  See the [usage instructions](#usage-example) for how to run the SDXL pipeline with the ONNX files hosted in this repository.
 
@@ -39,28 +42,17 @@ The Canny control net is converted from [diffusers/controlnet-canny-sdxl-1.0](ht
 
  Below is average latency of generating an image of size 512x512 using NVIDIA A100-SXM4-80GB GPU:
 
- | Engine | Batch Size | Steps | PyTorch 2.1 | ONNX Runtime CUDA |
+ | Engine | Batch Size | Steps | PyTorch 2.1 + Diffusers | ONNX Runtime Demo |
  |-------------|------------|------ | ----------------|-------------------|
- | Static | 1 | 1 | 109.4 ms | 43.9 ms |
- | Static | 4 | 1 | 247.0 ms | 121.1 ms |
- | Static | 1 | 4 | 171.1 ms | 97.5 ms |
- | Static | 4 | 4 | 390.5 ms | 248.0 ms |
+ | Static | 1 | 1 | 109.4 ms | 49.5 ms |
+ | Static | 4 | 1 | 247.0 ms | 143.1 ms |
+ | Static | 1 | 4 | 171.1 ms | 104.1 ms |
+ | Static | 4 | 4 | 390.5 ms | 271.69 ms |
 
 
  Static means the engine is built for the given batch size and image size combination, and CUDA graph is used to speed up.
- For PyTorch 2.1, the UNet use channel last (NHWC) format, and compile the UNet with mode `reduce-overhead`. See [benchmark script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark_controlnet.py) for detail.
-
- #### Latency for SDXL-Turbo with Canny Control Net
-
- Below is average latency of generating an image of size 512x512 with canny control net using NVIDIA A100-SXM4-80GB GPU:
-
- | Engine | Batch Size | Steps | PyTorch 2.1 | ONNX Runtime CUDA |
- |-------------|------------|------ | ----------------|-------------------|
- | Static | 1 | 1 | 160.0 ms | 55.3 ms |
- | Static | 4 | 1 | 314.9 ms | 144.4 ms |
- | Static | 1 | 4 | 251.9 ms | 134.9 ms |
- | Static | 4 | 4 | 514.2 ms | 332.6 ms |
 
+ For PyTorch 2.1, the UNet use channel last (NHWC) format, and compile the UNet with mode `reduce-overhead`. See [benchmark script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark_controlnet.py) for detail.
 
  ## Usage Example
 
@@ -80,11 +72,6 @@ git lfs install
  git clone https://huggingface.co/tlwu/sdxl-turbo-onnxruntime
  ```
 
- If you want to try canny control net, get model from a branch:
- ```shell
- git checkout canny_control_net
- ```
-
  3. Launch the docker
  ```shell
  docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.10-py3 /bin/bash
@@ -118,16 +105,5 @@ python3 -m pip install --upgrade polygraphy onnx-graphsurgeon --extra-index-url
  python3 demo_txt2img_xl.py \
  "starry night over Golden Gate Bridge by van gogh" \
  --version xl-turbo \
- --work-dir /workspace/sdxl-turbo-onnxruntime
- ```
-
- Generate an image using the canny control net:
-
- ```shell
- wget https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png
-
- python3 demo_txt2img_xl.py --controlnet-type canny --controlnet-scale 0.5 --controlnet-image input_image_vermeer.png \
- --version xl-turbo --height 1024 --width 1024 \
- --work-dir /workspace/sdxl-turbo-onnxruntime \
- "portrait of Mona Lisa with mysterious mysterious smile and mountain, river and forest in the background"
+ --engine-dir /workspace/sdxl-turbo-onnxruntime
  ```
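For reference, the average latencies in the updated README table can be read as speedups of the ONNX Runtime demo over PyTorch 2.1 + Diffusers, where speedup = PyTorch latency / ONNX Runtime latency for the same batch size and step count. The shell sketch below simply re-types the four table rows by hand and divides the two latency columns with `awk`; the function name `speedup_table` is a hypothetical helper for illustration, not something provided by this repository, Olive, or ONNX Runtime.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): prints
# speedup = PyTorch 2.1 + Diffusers latency / ONNX Runtime Demo latency
# for each row of the updated latency table (512x512, A100-SXM4-80GB, values in ms).
# Input columns: batch size, steps, PyTorch latency, ONNX Runtime latency.
speedup_table() {
  awk '{ printf "batch=%s steps=%s speedup=%.2fx\n", $1, $2, $3 / $4 }' <<'EOF'
1 1 109.4 49.5
4 1 247.0 143.1
1 4 171.1 104.1
4 4 390.5 271.69
EOF
}

speedup_table
```

Running it prints one line per table row, starting with `batch=1 steps=1 speedup=2.21x`; the relative gain shrinks as batch size and step count grow, down to about 1.44x at batch size 4 with 4 steps.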