---
license: apache-2.0
library_name: diffusers
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- controlnet
- diffusers-training
---

# SDXL ControlNet: DWPose

Here are the ControlNet weights trained on [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) with [DWPose](https://github.com/IDEA-Research/DWPose) conditioning.

## Using in 🧨 diffusers

First, install the required libraries:

```bash
pip install -q easy-dwpose transformers accelerate
pip install -q git+https://github.com/huggingface/diffusers
```

### Example 1

To generate a realistic DJ with the following image driving the pose:

![Pose Image 1](./images/pose_image_1.png)

Run the following code:

```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
import torch
from diffusers.utils import load_image

from easy_dwpose import DWposeDetector


pose_image = load_image("./images/pose_image_1.png")

# Load detector
device = "cuda:0" if torch.cuda.is_available() else "cpu"
dwpose = DWposeDetector(device=device)

# Compute DWpose conditioning image.
skeleton = dwpose(
	pose_image,
	detect_resolution=pose_image.width,
	output_type="pil",
	include_hands=True,
	include_face=True,
)

# Initialize ControlNet pipeline.
controlnet = ControlNetModel.from_pretrained(
	"dimitribarbot/controlnet-dwpose-sdxl-1.0",
	torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
	"stabilityai/stable-diffusion-xl-base-1.0",
	controlnet=controlnet,
	torch_dtype=torch.float16,
	variant="fp16",
).to(device)

# Infer.
prompt = "DJ in a party, shallow depth of field, highly detailed, high budget, gorgeous"
negative_prompt = "bad quality, blur, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
image = pipe(
	prompt,
	negative_prompt=negative_prompt,
	num_inference_steps=50,
	guidance_scale=5,
	image=skeleton,
	generator=torch.manual_seed(97),
).images[0]

skeleton.save("./images/dwpose_1.png")
image.save("./images/dwpose_image_1.png")
```

The generated pose is:

![Pose 1](./images/dwpose_1.png)

The image generated by SDXL is:

![Pose 1](./images/dwpose_image_1.png)

### Example 2

To generate an anime version of a woman sitting on a bench with the following image driving the pose:

![Pose Image 2](./images/pose_image_2.png)

Run the following code: 

```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
import torch
from diffusers.utils import load_image

from easy_dwpose import DWposeDetector


pose_image = load_image("./images/pose_image_2.png")

# Load detector
device = "cuda:0" if torch.cuda.is_available() else "cpu"
dwpose = DWposeDetector(device=device)

# Compute DWpose conditioning image.
skeleton = dwpose(
	pose_image,
	detect_resolution=pose_image.width,
	output_type="pil",
	include_hands=True,
	include_face=True,
)

# Initialize ControlNet pipeline.
controlnet = ControlNetModel.from_pretrained(
	"dimitribarbot/controlnet-dwpose-sdxl-1.0",
	torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
	"stabilityai/stable-diffusion-xl-base-1.0",
	controlnet=controlnet,
	torch_dtype=torch.float16,
	variant="fp16",
)
if torch.cuda.is_available():
	pipe.to(torch.device("cuda"))

# Infer.
prompt = "Anime girl sitting on a bench, highly detailed, noon, ambiant light"
negative_prompt = "bad quality, blur, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
image = pipe(
	prompt,
	negative_prompt=negative_prompt,
	num_inference_steps=25,
	guidance_scale=18,
	image=skeleton,
	generator=torch.manual_seed(79),
).images[0]

skeleton.save("./images/dwpose_2.png")
image.save("./images/dwpose_image_2.png")
```

The generated pose is:

![Pose 2](./images/dwpose_2.png)

The image generated by SDXL is:

![Pose 2](./images/dwpose_image_2.png)

## Training

The official Hugging Face 🤗 [training script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md) was used, with the settings listed below (see the example launch command at the end of this section).

#### Training data
This checkpoint was trained for 15,000 steps on the [dimitribarbot/dw_pose_controlnet](https://huggingface.co/datasets/dimitribarbot/dw_pose_controlnet) dataset with a resolution of 1024.

#### Compute
One machine with a single A40 GPU (for 48 hours)

#### Batch size
Data parallel training with a per-GPU batch size of 2 and 8 gradient accumulation steps (effective batch size of 16).

#### Hyper Parameters
Constant learning rate of 8e-5

#### Mixed precision
fp16
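
#### Example launch command

A minimal sketch of how the settings above might map onto the flags of the diffusers SDXL ControlNet training script (`train_controlnet_sdxl.py`). This is not the exact command used for this checkpoint; check the script's README for the full list of arguments, and note that `--output_dir` below is only a placeholder:

```bash
accelerate launch train_controlnet_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --dataset_name="dimitribarbot/dw_pose_controlnet" \
  --output_dir="./controlnet-dwpose-sdxl-1.0" \
  --resolution=1024 \
  --learning_rate=8e-5 \
  --lr_scheduler="constant" \
  --train_batch_size=2 \
  --gradient_accumulation_steps=8 \
  --mixed_precision="fp16" \
  --max_train_steps=15000
```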

## Thanks

[StabilityAI SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0): for the SDXL model.

[IDEA Research DWPose](https://github.com/IDEA-Research/DWPose): for the DWPose model.

[Hugging Face](https://huggingface.co): for the ControlNet training script 🤗 and libraries.

[raulc0399](https://huggingface.co/raulc0399): for inspiring the creation of the [DWpose dataset](https://huggingface.co/datasets/dimitribarbot/dw_pose_controlnet), which is based on the [Openpose dataset](https://huggingface.co/datasets/raulc0399/open_pose_controlnet).

[thibaud](https://huggingface.co/thibaud): for inspiring the hyperparameters used with the HF training script, based on those of the [Openpose ControlNet](https://huggingface.co/thibaud/controlnet-openpose-sdxl-1.0).

[RedHash](https://huggingface.co/RedHash): for the [easy_dwpose](https://github.com/reallyigor/easy_dwpose) module, which greatly simplifies DWPose inference and is used in the examples above.