awoo

Signed-off-by: Balazs Horvath <[email protected]>

README.md CHANGED

- [Pony Training](#pony-training)
- [Download Pony in Diffusers Format](#download-pony-in-diffusers-format)
- [Sample Prompt File](#sample-prompt-file)
- [`--lowram`](#--lowram)
- [`--pretrained_model_name_or_path`](#--pretrained_model_name_or_path)
- [`--train_data_dir`](#--train_data_dir)
- [`--resolution`](#--resolution)
- [`--optimizer_type`](#--optimizer_type)
- [`--dataset_repeats`](#--dataset_repeats)
- [`--max_train_steps`](#--max_train_steps)
- [`--shuffle_caption`](#--shuffle_caption)
- [`--sdpa`](#--sdpa)
- [`--sample_prompts --sample_sampler --sample_every_n_steps`](#--sample_prompts---sample_sampler---sample_every_n_steps)
- [CosXL Training](#cosxl-training)
- [Embeddings for 1.5 and SDXL](#embeddings-for-15-and-sdxl)
- [ComfyUI Walkthrough any%](#comfyui-walkthrough-any)
- [AnimateDiff for Masochists](#animatediff-for-masochists)

### Installation Tips

---

Firstly, download kohya_ss' [sd-scripts](https://github.com/kohya-ss/sd-scripts). You need to set up your environment either as [this](https://github.com/kohya-ss/sd-scripts?tab=readme-ov-file#windows-installation) tells you for Windows, or, if you are using Linux or Miniconda on Windows, you are probably smart enough to figure out the installation yourself. I recommend always installing the latest [PyTorch](https://pytorch.org/get-started/locally/) in the virtual environment you are going to use, which at the time of writing is `2.2.2`. I hope future me has faster PyTorch!
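
For what it's worth, here is a minimal sketch of that setup on Linux. The venv name and the CUDA 12.1 wheel index are my assumptions, so defer to the sd-scripts README if anything disagrees:

```bash
git clone https://github.com/kohya-ss/sd-scripts
cd sd-scripts
python -m venv venv                 # the venv name is an arbitrary choice
source venv/bin/activate
# Pin the PyTorch version mentioned above; cu121 is an assumed CUDA version.
pip install torch==2.2.2 torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```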

### Dataset Preparation

---

⚠️ **TODO:** Awoo this section.
|
120 |
|
121 |
### Pony Training
|
122 |
|
123 |
+
---
|
124 |
+
|
125 |
I'm not going to lie, it is a bit complicated to explain everything. But here is my best attempt going through some "basic" stuff and almost all lines in order.
|
126 |
|
127 |
#### Download Pony in Diffusers Format
|
|
|
134 |
|
135 |

#### Sample Prompt File

A sample prompt file is used during training to sample images. A sample prompt might look like this for Pony:

```py
# anthro female kindred
score_9, score_8_up, score_7_up, score_6_up, rating_explicit, source_furry, solo
score_9, score_8_up, score_7_up, score_6_up, rating_explicit, source_furry, solo, anthro male fox, glowing yellow eyes, night, crescent moon, tibetan necklace, gold bracers, blue and gold adorned loincloth, canine genitalia, knot, amazing_background, scenery porn, white marble ruins in the background, realistic, photo, photo (medium), photography (artwork) --n low quality, worst quality --w 1024 --h 1024 --d 1 --l 6.0 --s 40
```

For reference: in these prompt files `--n` is the negative prompt, `--w` and `--h` are the width and height, `--d` is the seed, `--l` is the CFG scale, and `--s` is the number of steps.

#### `--lowram`

If you are running out of RAM, like I do with 2 GPUs and a really fat model, this option will help you save a bit of it and might get you out of OOM hell.
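
It is a plain switch, passed like every other flag in this walkthrough:

```py
--lowram \
```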

#### `--pretrained_model_name_or_path`

The directory containing the checkpoint you just downloaded. If you are using a local model, I recommend closing the path with a `/`.

```py
--pretrained_model_name_or_path="/ponydiffusers/" \
```

#### `--train_data_dir`

The directory containing the dataset. We prepared this earlier together.

```py
--train_data_dir="/training_dir" \
```
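
As a rough sketch of what that directory can look like (this layout and the folder name are assumptions based on common kohya-ss conventions, not something this guide has pinned down yet):

```bash
/training_dir
└── 1_kindred          # assumed "<repeats>_<name>" kohya-style folder
    ├── 00001.png
    ├── 00001.txt      # caption for 00001.png
    ├── 00002.png
    └── 00002.txt
```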

#### `--resolution`

Always set this to match the model's resolution, which in Pony's case is 1024x1024. If you can't fit into the VRAM, you can decrease it to `512,512` as a last resort.

```py
--resolution="512,512" \
```

#### `--optimizer_type`

The default optimizer is `AdamW`, and a bunch of new ones get added every month or so, therefore I'm not listing them all; you can find the full list if you really want. But `AdamW` is the best as of this writing, so we use that!

```py
--optimizer_type="AdamW" \
```

#### `--dataset_repeats`

Repeats the dataset when training with captions; by default it is set to `1`, so we'll set this to `0` with:
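
```py
--dataset_repeats=0 \
```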

The choice between `--xformers` and `--sdpa` will depend on your GPU. You can benchmark it by repeating a training run with both!

#### `--sample_prompts --sample_sampler --sample_every_n_steps`

You have the option of generating images during training so you can check the progress. `--sample_sampler` lets you pick between different samplers; by default it is `ddim`, so you better change it!

You can also use `--sample_every_n_epochs` instead, which will take precedence over steps. The `k_` prefix means Karras and the `_a` suffix means ancestral.

```py
--sample_prompts=/training_dir/sample-prompts.txt \
--sample_sampler="euler_a" \
--sample_every_n_steps=100
```

My recommendation for Pony is to use `euler_a` for toony styles and `k_dpm_2` for realistic ones.

Your sampler options include the following:

```bash
ddim, pndm, lms, euler, euler_a, heun, dpm_2, dpm_2_a, dpmsolver, dpmsolver++, dpmsingle, k_lms, k_euler, k_euler_a, k_dpm_2, k_dpm_2_a
```

### CosXL Training

The only difference with CosXL training is that you need to enable `--v_parameterization`, and you can't sample the images. 😹 I also don't recommend using the `block_dims` and `block_alphas` from Pony.
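
As a flag, that is simply:

```py
--v_parameterization \
```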

---

## Embeddings for 1.5 and SDXL

Embeddings in Stable Diffusion are high-dimensional representations of input data, such as images or text, that capture their essential features and relationships. These embeddings are used to guide the diffusion process, enabling the model to generate outputs that closely match the desired characteristics specified in the input.
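
To make that concrete, here is a minimal sketch of loading a textual inversion embedding with the `diffusers` library; the file name and trigger token are hypothetical placeholders, not files from this repo:

```py
# Minimal sketch: load a textual inversion embedding into an SD 1.5 pipeline.
# "my-embedding.safetensors" and the token are made-up placeholders.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_textual_inversion("my-embedding.safetensors", token="my-embedding")
image = pipe("a photo of my-embedding").images[0]
image.save("sample.png")
```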