awoo

Signed-off-by: Balazs Horvath <[email protected]>

README.md

- [`--sdpa` or `--xformers` or `--mem_eff_attn`](#--sdpa-or---xformers-or---mem_eff_attn)
- [`--multires_noise_iterations` and `--multires_noise_discount`](#--multires_noise_iterations-and---multires_noise_discount)
- [`--sample_prompts` and `--sample_sampler` and `--sample_every_n_steps`](#--sample_prompts-and---sample_sampler-and---sample_every_n_steps)
- [Embeddings for 1.5 and SDXL](#embeddings-for-15-and-sdxl)
- [ComfyUI Walkthrough any%](#comfyui-walkthrough-any)
- [AnimateDiff for Masochists](#animatediff-for-masochists)

Firstly, download kohya_ss' [sd-scripts](https://github.com/kohya-ss/sd-scripts). Set up your environment either the way [this](https://github.com/kohya-ss/sd-scripts?tab=readme-ov-file#windows-installation) tells you for Windows, or, if you are using Linux or Miniconda on Windows, you are probably smart enough to figure out the installation yourself. I recommend always installing the latest [PyTorch](https://pytorch.org/get-started/locally/) in the virtual environment you are going to use, which at the time of writing is `2.2.2`. I hope future me has faster PyTorch!

Ok, just in case you aren't smart enough to install sd-scripts under Miniconda for Windows: I actually "guided" someone through it recently, so I can tell you about it:

```bash
# Installing sd-scripts
git clone https://github.com/kohya-ss/sd-scripts
cd sd-scripts

# Creating the conda environment and installing requirements
conda create -n sdscripts python=3.10.14
conda activate sdscripts
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
python -m pip install --use-pep517 --upgrade -r requirements.txt
python -m pip install --use-pep517 lycoris_lora
accelerate config
```

`accelerate config` will ask you a bunch of questions; you need to actually read each one and reply with the truth. In most cases the truth looks like this: `This machine, No distributed training, no, no, no, all, fp16`.

You might also want to install `xformers` or `bitsandbytes`.

```bash
# Installing xformers
# Use the same command, just replace 'xformers' with any other package you may need.
python -m pip install --use-pep517 xformers

# Installing bitsandbytes for Windows
python -m pip install --use-pep517 bitsandbytes --index-url=https://jllllll.github.io/bitsandbytes-windows-webui
```

---

### Pony Training

Please note that sample prompts should not exceed 77 tokens; you can use [Count Tokens in Sample Prompts](https://huggingface.co/k4d3/yiff_toolkit/blob/main/dataset_tools/Count%20Tokens%20in%20Sample%20Prompts.ipynb) from [/dataset_tools](https://huggingface.co/k4d3/yiff_toolkit/tree/main/dataset_tools) to analyze your prompts.

If you are training with multiple GPUs, ensure that the total number of prompts is divisible by the number of GPUs without any remainder, or a card will idle. A quick way to check both is sketched below.
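
Here is that check as a minimal sketch, in case you'd rather run a script than the notebook. It assumes you have the `transformers` package installed and your prompts in `/training_dir/sample-prompts.txt`, one per line; note that CLIP's 77-token limit includes the special start and end tokens:

```python
# Count CLIP tokens per sample prompt and sanity-check GPU divisibility.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

with open("/training_dir/sample-prompts.txt", encoding="utf-8") as f:
    prompts = [line.strip() for line in f if line.strip()]

for prompt in prompts:
    n_tokens = len(tokenizer(prompt)["input_ids"])  # includes BOS/EOS
    marker = " <-- over the limit!" if n_tokens > 77 else ""
    print(f"{n_tokens:3d} tokens: {prompt[:60]}{marker}")

num_gpus = 2  # set this to the number of GPUs you train with
if len(prompts) % num_gpus != 0:
    print(f"{len(prompts)} prompts won't divide evenly across {num_gpus} GPUs!")
```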

---

For two GPUs:

```python
accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 --num_cpu_threads_per_process=2 "./sdxl_train_network.py"
```

Single GPU:

```python
accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py"
```

---

And now let's break down a bunch of arguments we can pass to `sd-scripts`.

##### `--lowram`

If you are running out of system memory like I do with 2 GPUs and a real…

##### `--pretrained_model_name_or_path`

The directory containing the checkpoint you just downloaded. I recommend closing the path with a `/` if you are using a local diffusers model. You can also specify a `.safetensors` or `.ckpt` if that is what you have!

```python
--pretrained_model_name_or_path="/ponydiffusers/"
```

---

##### `--output_dir`

This is where all the saved epochs or steps will be saved, including the last one. If y…

```python
--output_dir="/output_dir"
```

---

##### `--train_data_dir`

The directory containing the dataset. We prepared this earlier together.

```python
--train_data_dir="/training_dir"
```

---

##### `--resolution`

Always set this to match the model's resolution, which in Pony's case is 1024x1024. If you can't fit into the VRAM, you can decrease it to `512,512` as a last resort.

```python
--resolution="1024,1024"
```

---

##### `--enable_bucket`

Creates different buckets by pre-categorizing images with different aspect ratios into different buckets. This technique helps avoid issues like the unnatural crops that are common when models are trained to produce square images. It allows the creation of batches where every item has the same size, though the image size may differ between batches. A toy illustration of the idea follows.
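
To make the mechanism concrete, here is that toy version of aspect-ratio bucketing. It is deliberately simplified and is not sd-scripts' actual implementation; the real one caps bucket areas and rounds differently:

```python
# Toy aspect-ratio bucketing: build candidate resolutions, then assign each
# image to the bucket whose aspect ratio is closest to its own.
def make_buckets(base=1024, step=64, min_side=256, max_side=1024):
    area = base * base
    buckets = set()
    for w in range(min_side, max_side + 1, step):
        h = max(min_side, min(max_side, (area // w) // step * step))
        buckets.add((w, h))
        buckets.add((h, w))  # keep the rotated variant too
    return sorted(buckets)

def assign_bucket(width, height, buckets):
    ratio = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - ratio))

buckets = make_buckets()
print(assign_bucket(1920, 1080, buckets))  # a 16:9 image lands in (1024, 576)
```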

---

##### `--min_bucket_reso` and `--max_bucket_reso`

Specifies the minimum and maximum resolutions used by the buckets. These values are ignored if `--bucket_no_upscale` is set.

```python
--min_bucket_reso=256 --max_bucket_reso=1024
```

---

##### `--network_alpha`

Specifies how many of the trained Network Ranks are allowed to alter the base model. A sketch of how this value interacts with `--network_dim` follows the code block.

```python
--network_alpha=4
```
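
For what it's worth, my reading of how these two values combine, shown on a standard LoRA forward pass (illustrative, not sd-scripts' internals): the trained update is scaled by `network_alpha / network_dim`, so the `4` above together with `--network_dim=8` applies the update at half strength.

```python
# Standard LoRA math: the update B @ A is scaled by alpha / dim.
import torch

dim, alpha = 8, 4                  # --network_dim=8, --network_alpha=4
scale = alpha / dim                # 0.5: the update is applied at half strength

W = torch.randn(768, 768)          # frozen base weight
A = torch.randn(dim, 768) * 0.01   # trained down-projection
B = torch.zeros(768, dim)          # trained up-projection, starts at zero

W_effective = W + scale * (B @ A)  # what the model actually runs with
print(f"update scale = {scale}")
```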

---

##### `--save_model_as`

You can use this to specify either `ckpt` or `safetensors` for the file format.

```python
--save_model_as="safetensors"
```

---

##### `--network_module`

Specifies which network module you are going to train.

```python
--network_module="lycoris.kohya"
```

---

##### `--learning_rate` and `--unet_lr` and `--text_encoder_lr`

For AdamW the optimal LR seems to be `0.0001`, or `1e-4` if you want to impress your friends.

```py
--learning_rate=0.0001 --unet_lr=0.0001 --text_encoder_lr=0.0001
```

---

##### `--network_dim`

The Network Rank (Dimension) is responsible for how many features your LoRA will be training. It is closely related to Network Alpha, to the Unet + TE learning rates, and of course to the quality of your dataset. Personal experimentation with these values is strongly recommended.

```py
--network_dim=8
```

---

##### `--output_name`

Specify the output name excluding the file extension.

**WARNING**: If for some reason this is ever left empty, your last epoch won't be saved!

```py
--output_name="last"
```

---

##### `--scale_weight_norms`

Dropout affects the network architecture without changing the weights, while Max-Norm Regularization directly modifies the weights of the network. Both techniques are used to prevent overfitting and to improve the generalization of the model. You can learn more about both in this [research paper](https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf). A tiny sketch of the mechanics follows the code block.

```py
--scale_weight_norms=1.0
```
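
Mechanically, max-norm regularization just rescales any weight row whose norm exceeds the cap. This is a sketch of the general technique, not sd-scripts' exact implementation:

```python
# Max-norm regularization, conceptually: rows whose L2 norm exceeds the cap
# are scaled back onto the ball; rows already under the cap are untouched.
import torch

def apply_max_norm(weight: torch.Tensor, max_norm: float = 1.0) -> torch.Tensor:
    norms = weight.norm(p=2, dim=1, keepdim=True)
    factor = (max_norm / norms).clamp(max=1.0)  # only ever shrinks
    return weight * factor

w = torch.randn(4, 16) * 3
print(apply_max_norm(w).norm(dim=1))  # every row norm is now <= 1.0
```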

---

##### `--max_grad_norm`

Also known as Gradient Clipping. If you notice that gradients are exploding during training (the loss becomes NaN or very large), consider adjusting `--max_grad_norm`. It operates on the gradients during backpropagation, while `--scale_weight_norms` operates on the weights of the network, so the two complement each other and provide a more robust approach to stabilizing the learning process and improving model performance.

```py
--max_grad_norm=1.0
```
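
For reference, this is ordinary gradient clipping; in plain PyTorch it is a single call. A minimal sketch with a stand-in model and loss:

```python
# What --max_grad_norm=1.0 amounts to: after backward(), rescale all gradients
# so their global L2 norm is at most 1.0, then step the optimizer as usual.
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the real network
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```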

---

##### `--no_half_vae`

Disables mixed precision for the SDXL VAE and sets it to `float32`. Very useful if you don't like NaNs.

---

##### `--save_every_n_epochs` and `--save_last_n_epochs` or `--save_every_n_steps` and `--save_last_n_steps`

Learning will always end with what you specify in `--max_train_epochs` or `--max_train_steps`.

```py
--save_every_n_epochs=50
```

---

##### `--mixed_precision`

⚠️

```py
--mixed_precision="fp16"
```

---

##### `--save_precision`

⚠️

```py
--save_precision="fp16"
```

---

##### `--caption_extension`

⚠️

```py
--caption_extension=".txt"
```

##### `--cache_latents` and `--cache_latents_to_disk`

⚠️

```py
--cache_latents --cache_latents_to_disk
```

---

##### `--optimizer_type`

The default optimizer is `AdamW`. New ones are added every month or so, therefore I'm not listing them all; you can find the list if you really want, but `AdamW` is the best as of this writing, so we use that!

```py
--optimizer_type="AdamW"
```

---

##### `--dataset_repeats`

Repeats the dataset when training with captions. By default it is set to `1`, so we'll set it to `0` with:

```py
--dataset_repeats=0
```

---

##### `--max_train_steps`

Specify the number of steps or epochs to train. If both `--max_train_steps` and `--max_train_epochs` are specified, the number of epochs takes precedence.

```py
--max_train_steps=400
```

---

##### `--multires_noise_iterations` and `--multires_noise_discount`

⚠️

```python
--multires_noise_iterations=10 --multires_noise_discount=0.1
```

---

##### `--sample_prompts` and `--sample_sampler` and `--sample_every_n_steps`

You can also use `--sample_every_n_epochs` instead, which will take precedence over steps. The `k_` prefix means Karras and the `_a` suffix means ancestral.

```py
--sample_prompts=/training_dir/sample-prompts.txt --sample_sampler="euler_a" --sample_every_n_steps=100
```

My recommendation for Pony is to use `euler_a` for toony and `k_dpm_2` for realistic.

So, the whole thing would look something like this:

```python
accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" \
--lowram \
--pretrained_model_name_or_path="/ponydiffusers/" \
--train_data_dir="/training_dir" \
--resolution="1024,1024" \
--output_dir="/output_dir" \
--enable_bucket \
--min_bucket_reso=256 \
--max_bucket_reso=1024 \
--network_alpha=4 \
--save_model_as="safetensors" \
--network_module="lycoris.kohya" \
--network_args \
"use_reentrant=False" \
"preset=full" \
"conv_dim=256" \
"conv_alpha=4" \
"dropout=None" \
"rank_dropout=None" \
"module_dropout=None" \
"use_tucker=False" \
"use_scalar=False" \
"rank_dropout_scale=False" \
"algo=locon" \
"train_norm=False" \
"block_dims=8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8" \
"block_alphas=0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625" \
--network_dropout=0 \
--lr_scheduler="cosine" \
--learning_rate=0.0001 \
--unet_lr=0.0001 \
--text_encoder_lr=0.0001 \
--network_dim=8 \
--output_name="yifftoolkit" \
--scale_weight_norms=1 \
--no_half_vae \
--save_every_n_epochs=50 \
--mixed_precision="fp16" \
--save_precision="fp16" \
--caption_extension=".txt" \
--cache_latents \
--cache_latents_to_disk \
--optimizer_type="AdamW" \
--max_grad_norm=1 \
--keep_tokens=1 \
--max_data_loader_n_workers=8 \
--bucket_reso_steps=32 \
--multires_noise_iterations=10 \
--multires_noise_discount=0.1 \
--log_prefix=xl-locon \
--gradient_accumulation_steps=12 \
--gradient_checkpointing \
--train_batch_size=8 \
--dataset_repeats=0 \
--max_train_steps=400 \
--shuffle_caption \
--sdpa \
--sample_prompts=/training_dir/sample-prompts.txt \
--sample_sampler="euler_a" \
--sample_every_n_steps=100
```

</details>
</div>

---

## Embeddings for 1.5 and SDXL