k4d3/yiff_toolkit

k4d3 committed on Apr 24

Commit bff67d4 (1 parent: d94919e)

awoo

Signed-off-by: Balazs Horvath <[email protected]>

Files changed (1)

README.md +207 -25

@@ -37,6 +37,7 @@ The Yiff Toolkit is a comprehensive set of tools designed to enhance your creati
       - [Download Pony in Diffusers Format](#download-pony-in-diffusers-format)
       - [Sample Prompt File](#sample-prompt-file)
       - [Training Commands](#training-commands)
         - [`--lowram`](#--lowram)
         - [`--pretrained_model_name_or_path`](#--pretrained_model_name_or_path)
         - [`--output_dir`](#--output_dir)
@@ -48,6 +49,17 @@ The Yiff Toolkit is a comprehensive set of tools designed to enhance your creati
         - [`--save_model_as`](#--save_model_as)
         - [`--network_module`](#--network_module)
         - [`--network_args`](#--network_args)
         - [`--network_dropout`](#--network_dropout)
         - [`--lr_scheduler`](#--lr_scheduler)
         - [`--lr_scheduler_num_cycles`](#--lr_scheduler_num_cycles)
@@ -473,6 +485,20 @@ If you are training with multiple GPUs, ensure that the total number of prompts
 <details>
   <summary>Click to reveal training commands.</summary>
 ##### `--lowram`
 If you are running out of system memory like I do with 2 GPUs and a really fat model that gets loaded into it per GPU, this option will help you save a bit of it and might get you out of OOM hell.
@@ -481,9 +507,9 @@ If you are running running out of system memory like I do with 2 GPUs and a real
 ##### `--pretrained_model_name_or_path`
-The directory containing the checkpoint you just downloaded. I recommend closing the path if you are using a local model with a `/`.
-```py
     --pretrained_model_name_or_path="/ponydiffusers/" \
 ```
@@ -493,7 +519,7 @@ The directory containing the checkpoint you just downloaded. I recommend closing
 This is where all the saved epochs or steps will be saved, including the last one. If y
-```py
     --output_dir="/output_dir" \
 ```
@@ -503,7 +529,7 @@ This is where all the saved epochs or steps will be saved, including the last on
 The directory containing the dataset. We prepared this earlier together.
-```py
     --train_data_dir="/training_dir" \
 ```
@@ -513,7 +539,7 @@ The directory containing the dataset. We prepared this earlier together.
 Always set this to match the model's resolution, which in Pony's case is 1024x1024. If you can't fit into the VRAM, you can decrease it to `512,512` as a last resort.
-```py
     --resolution="1024,1024" \
 ```
@@ -521,9 +547,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ##### `--enable_bucket`
-⚠️
-```py
     --enable_bucket \
 ```
@@ -531,9 +557,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ##### `--min_bucket_reso` and `--max_bucket_reso`
-⚠️
-```py
     --min_bucket_reso=256 \
     --max_bucket_reso=1024 \
 ```
@@ -542,9 +568,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ##### `--network_alpha`
-⚠️
-```py
     --network_alpha=4 \
 ```
@@ -552,9 +578,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ##### `--save_model_as`
-⚠️
-```py
     --save_model_as=safetensors \
 ```
@@ -562,9 +588,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ##### `--network_module`
-⚠️
-```py
     --network_module=lycoris.kohya \
 ```
@@ -572,16 +598,17 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ##### `--network_args`
-⚠️
-```py
     --network_args \
                "use_reentrant=False" \
                "preset=full" \
                "conv_dim=256" \
                "conv_alpha=4" \
-               "rank_dropout=0" \
-               "module_dropout=0" \
                "use_tucker=False" \
                "use_scalar=False" \
                "rank_dropout_scale=False" \
@@ -591,13 +618,164 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
                "block_alphas=0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625" \
 ```
 ---
 ##### `--network_dropout`
-⚠️
-```py
     --network_dropout=0 \
 ```
@@ -607,7 +785,7 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ⚠️
-```py
     --lr_scheduler="cosine" \
 ```
@@ -655,7 +833,7 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ##### `--network_dim`
-⚠️
 ```py
     --network_dim=8 \
@@ -665,7 +843,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 ##### `--output_name`
-⚠️
 ```py
     --output_name="last" \
@@ -677,7 +857,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
 [![An AI generated image.](https://huggingface.co/k4d3/yiff_toolkit/resolve/main/static/tutorial/dropout1.png)](https://huggingface.co/k4d3/yiff_toolkit/resolve/main/static/tutorial/dropout1.png)
-Encourages the LoRA to diversify its training by randomly removing some weights, the network you are training needs to support it though! See [PR#545](https://github.com/kohya-ss/sd-scripts/pull/545) for more details.
 ```py
     --scale_weight_norms=1 \

       - [Download Pony in Diffusers Format](#download-pony-in-diffusers-format)
       - [Sample Prompt File](#sample-prompt-file)
       - [Training Commands](#training-commands)
+        - [`accelerate launch`](#accelerate-launch)
         - [`--lowram`](#--lowram)
         - [`--pretrained_model_name_or_path`](#--pretrained_model_name_or_path)
         - [`--output_dir`](#--output_dir)
         - [`--save_model_as`](#--save_model_as)
         - [`--network_module`](#--network_module)
         - [`--network_args`](#--network_args)
+          - [`use_reentrant`](#use_reentrant)
+          - [`preset`](#preset)
+          - [`conv_dim` and `conv_alpha`](#conv_dim-and-conv_alpha)
+          - [`module_dropout` and `dropout` and `rank_dropout`](#module_dropout-and-dropout-and-rank_dropout)
+          - [`use_tucker`](#use_tucker)
+          - [`use_scalar`](#use_scalar)
+          - [`rank_dropout_scale`](#rank_dropout_scale)
+          - [`algo`](#algo)
+          - [`train_norm`](#train_norm)
+          - [`block_dims`](#block_dims)
+          - [`block_alphas`](#block_alphas)
         - [`--network_dropout`](#--network_dropout)
         - [`--lr_scheduler`](#--lr_scheduler)
         - [`--lr_scheduler_num_cycles`](#--lr_scheduler_num_cycles)
 <details>
   <summary>Click to reveal training commands.</summary>
+##### `accelerate launch`
+For two GPUs:
+```python
+accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 --num_cpu_threads_per_process=2 "./sdxl_train_network.py" \
+```
+Single GPU:
+```python
+accelerate launch --num_processes=1 --num_machines=1 --gpu_ids=0 --num_cpu_threads_per_process=2 "./sdxl_train_network.py" \
+```
 ##### `--lowram`
 If you are running out of system memory like I do with 2 GPUs and a really fat model that gets loaded into it per GPU, this option will help you save a bit of it and might get you out of OOM hell.
 ##### `--pretrained_model_name_or_path`
+The directory containing the checkpoint you just downloaded. If you are using a local diffusers model, I recommend ending the path with a `/`. You can also specify a `.safetensors` or `.ckpt` file if that is what you have!
+```python
     --pretrained_model_name_or_path="/ponydiffusers/" \
 ```
 This is where all the saved epochs or steps will be saved, including the last one. If y
+```python
     --output_dir="/output_dir" \
 ```
 The directory containing the dataset. We prepared this earlier together.
+```python
     --train_data_dir="/training_dir" \
 ```
 Always set this to match the model's resolution, which in Pony's case is 1024x1024. If you can't fit into the VRAM, you can decrease it to `512,512` as a last resort.
+```python
     --resolution="1024,1024" \
 ```
 ##### `--enable_bucket`
+Pre-sorts images into buckets according to their aspect ratio. This helps to avoid issues like the unnatural crops that are common when models are trained to produce only square images. Every item in a batch has the same size, but the image size may differ from batch to batch.
+```python
     --enable_bucket \
 ```
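
The bucketing idea can be sketched in a few lines of plain Python. This is a simplified illustration and not the sd-scripts implementation: the `BUCKETS` list, the `assign_bucket` and `group_by_bucket` helpers, and the nearest-aspect-ratio rule are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical bucket resolutions (width, height); the real trainer derives its own list.
BUCKETS = [(1024, 1024), (832, 1216), (1216, 832), (768, 1344), (1344, 768)]


def assign_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to the image's aspect ratio."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


def group_by_bucket(image_sizes):
    """Group image sizes by bucket so each batch can be drawn from a single bucket."""
    grouped = defaultdict(list)
    for size in image_sizes:
        grouped[assign_bucket(*size)].append(size)
    return dict(grouped)


images = [(2000, 2000), (1000, 1500), (1920, 1080), (900, 900)]
groups = group_by_bucket(images)
for bucket, members in groups.items():
    print(bucket, members)
```

Both square images land in the same bucket, while the portrait and landscape images each get a bucket that roughly matches their shape instead of being cropped to a square.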
 ##### `--min_bucket_reso` and `--max_bucket_reso`
+Specifies the minimum and maximum resolutions used by the buckets. These values are ignored if `--bucket_no_upscale` is set.
+```python
     --min_bucket_reso=256 \
     --max_bucket_reso=1024 \
 ```
 ##### `--network_alpha`
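+Controls how strongly the trained network is allowed to alter the base model. The network's output is scaled relative to its rank, so this value should be chosen together with `--network_dim`.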
+```python
     --network_alpha=4 \
 ```
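
As a rough illustration of how alpha interacts with the rank: the `forward` code quoted in the dropout section multiplies the LoRA output by `self.scale`. The sketch below assumes the common kohya-style convention `scale = alpha / dim`. The `lora_scale` helper is hypothetical and only shows the arithmetic.

```python
def lora_scale(network_alpha, network_dim):
    """Scaling factor applied to the LoRA output, assuming scale = alpha / dim."""
    return network_alpha / network_dim


# With the values used in this guide (--network_alpha=4, --network_dim=8):
scale = lora_scale(4, 8)
print(scale)
```

Under this assumption, setting alpha equal to the rank gives a factor of `1`, and a smaller alpha dampens the network's contribution, which is why alpha is usually tuned together with the learning rate.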
 ##### `--save_model_as`
+You can use this to specify either `ckpt` or `safetensors` for the file format.
+```python
     --save_model_as=safetensors \
 ```
 ##### `--network_module`
+Specifies which network module you are going to train.
+```python
     --network_module=lycoris.kohya \
 ```
 ##### `--network_args`
+The arguments passed down to the network.
+```python
     --network_args \
                "use_reentrant=False" \
                "preset=full" \
                "conv_dim=256" \
                "conv_alpha=4" \
+               "dropout=None" \
+               "rank_dropout=None" \
+               "module_dropout=None" \
                "use_tucker=False" \
                "use_scalar=False" \
                "rank_dropout_scale=False" \
                "block_alphas=0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625" \
 ```
+**Let's break it down!**
+---
+###### `use_reentrant`
+- If `use_reentrant=False` is specified, checkpointing will use an implementation that does not require re-entrant autograd. You can learn more about checkpointing [here](https://pytorch.org/docs/stable/checkpoint.html). Future versions of PyTorch will default to `use_reentrant=False`, but today the default is still `True`, so we set it to `False` explicitly. Easy!
+---
+###### `preset`
+The [Preset](https://github.com/KohakuBlueleaf/LyCORIS/blob/HEAD/docs/Preset.md)/config system added to LyCORIS for more fine-grained control.
+- `full`
+  - The default preset; trains all the layers in the UNet and CLIP.
+- `full-lin`
+  - Like `full`, but skips the convolutional layers.
+- `attn-mlp`
+  - The "kohya preset"; trains all the transformer blocks.
+- `attn-only`
+  - Only the attention layers are trained; a lot of papers only train the attention layers.
+- `unet-transformer-only`
+  - The same as kohya_ss/sd_scripts with the TE disabled, or the `attn-mlp` preset with `train_unet_only` enabled.
+- `unet-convblock-only`
+  - Only ResBlock, UpSample, and DownSample will be trained.
+---
+###### `conv_dim` and `conv_alpha`
+`conv_dim` sets the rank of the convolution layers in the network. Adjusting this value can have a [significant impact](https://ashejunius.com/alpha-and-dimensions-two-wild-settings-of-training-lora-in-stable-diffusion-d7ad3e3a3b0a), and lowering it affected the aesthetic differences between different LoRA samples. An alpha value of `128` has been used for training a specific character's face, while Kohaku recommended setting `conv_alpha` to `1` for both LoCon and LoHa.
+```python
+conv_block_dims = [conv_dim] * num_total_blocks
+conv_block_alphas = [conv_alpha] * num_total_blocks
+```
+---
+###### `module_dropout` and `dropout` and `rank_dropout`
+`rank_dropout` is a form of dropout, a regularization technique used in neural networks to prevent overfitting and improve generalization. Unlike traditional dropout, which randomly sets individual elements to zero, `rank_dropout` zeroes whole rank dimensions of the down-projected tensor `lx`. First a binary mask of shape `(batch size, lora_dim)` is created, with each element set to `True` with probability `1 - rank_dropout` and `False` otherwise. Then the `mask` is applied to `lx`, zeroing the dropped rank dimensions. After applying the dropout, a scaling factor of `1.0 / (1.0 - self.rank_dropout)` is applied to compensate for the dropped elements, so that the expected sum of `lx` remains the same before and after dropout.
+It’s called “rank” dropout because it operates along the rank dimension of the tensor rather than on its individual elements.
+If `rank_dropout` is set to `0`, no rank dimension is ever dropped. All elements of the mask are `True`, so every element of `lx` is retained, and the scale simply equals `self.scale` because `1.0 / (1.0 - 0)` is `1`. Setting this to `0` therefore disables the dropout mechanism, but the mask is still computed, which is wasted work. Set it to `None` to skip the branch entirely.
+```python
+def forward(self, x):
+    org_forwarded = self.org_forward(x)
+    # module dropout
+    if self.module_dropout is not None and self.training:
+        if torch.rand(1) < self.module_dropout:
+            return org_forwarded
+    lx = self.lora_down(x)
+    # normal dropout
+    if self.dropout is not None and self.training:
+        lx = torch.nn.functional.dropout(lx, p=self.dropout)
+    # rank dropout
+    if self.rank_dropout is not None and self.training:
+        mask = torch.rand((lx.size(0), self.lora_dim), device=lx.device) > self.rank_dropout
+        if len(lx.size()) == 3:
+            mask = mask.unsqueeze(1)
+        elif len(lx.size()) == 4:
+            mask = mask.unsqueeze(-1).unsqueeze(-1)
+        lx = lx * mask
+        scale = self.scale * (1.0 / (1.0 - self.rank_dropout))
+    else:
+        scale = self.scale
+    lx = self.lora_up(lx)
+    return org_forwarded + lx * self.multiplier * scale
+```
+---
+###### `use_tucker`
+Can be used for all but `(IA)^3` and native fine-tuning.
+Tucker decomposition is a mathematical method that decomposes a tensor into a set of matrices and one small core tensor, reducing the computational complexity and memory requirements of the model. It is used in various LyCORIS modules on various blocks. In LoCon, for example, if `use_tucker` is `True` and the kernel size `k_size` is not `(1, 1)`, the convolution operation is decomposed into three separate operations:
+1. A 1x1 convolution that reduces the number of channels from `in_dim` to `lora_dim`.
+2. A convolution with the original kernel size `k_size`, stride `stride`, and padding `padding`, but with a reduced number of channels `lora_dim`.
+3. A 1x1 convolution that increases the number of channels back from `lora_dim` to `out_dim`.
+If `use_tucker` is `False` or not set, or if the kernel size `k_size` is `(1, 1)`, then a standard convolution operation is performed with the original kernel size, stride, and padding, and the number of channels is reduced from `in_dim` to `lora_dim`.
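
To get a feel for the savings, the two layouts described above can be compared by counting weights. This is a back-of-the-envelope sketch and not LyCORIS code: `locon_conv_params` is a hypothetical helper, biases are ignored, and the non-Tucker branch assumes the down-projection is followed by a 1x1 convolution back up to `out_dim`.

```python
def locon_conv_params(in_dim, out_dim, lora_dim, k_size, use_tucker):
    """Count the weights (biases ignored) of the low-rank convolution branch."""
    k_h, k_w = k_size
    if use_tucker and k_size != (1, 1):
        down = in_dim * lora_dim                # 1x1 conv: in_dim -> lora_dim
        mid = lora_dim * lora_dim * k_h * k_w   # k x k conv: lora_dim -> lora_dim
        up = lora_dim * out_dim                 # 1x1 conv: lora_dim -> out_dim
        return down + mid + up
    down = in_dim * lora_dim * k_h * k_w        # k x k conv: in_dim -> lora_dim
    up = lora_dim * out_dim                     # assumed 1x1 conv: lora_dim -> out_dim
    return down + up


# Example sizes chosen only for illustration.
standard = locon_conv_params(320, 320, 8, (3, 3), use_tucker=False)
tucker = locon_conv_params(320, 320, 8, (3, 3), use_tucker=True)
print(standard, tucker)
```

For these example sizes the Tucker layout needs 5,696 weights against 25,600 for the standard one, because the expensive 3x3 kernel only ever sees `lora_dim` channels.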
+---
+###### `use_scalar`
+An additional learned parameter that scales the contribution of the low-rank weights before they are added to the original weights. This scalar can control the extent to which the low-rank adaptation modifies the original weights. By training this scalar, the model can learn the optimal balance between preserving the original pre-trained weights and allowing for low-rank adaptation.
+```python
+if use_scalar:
+    self.scalar = nn.Parameter(torch.tensor(0.0))
+else:
+    self.scalar = torch.tensor(1.0)
+```
+---
+###### `rank_dropout_scale`
+A boolean flag that determines whether to scale the dropout mask to have an average value of `1` or not. This can be useful in certain situations to maintain the scale of the tensor after dropout is applied.
+```python
+def forward(self, orig_weight, org_bias, new_weight, new_bias, *args, **kwargs):
+    device = self.oft_blocks.device
+    if self.rank_dropout and self.training:
+        drop = (torch.rand(self.oft_blocks, device=device) < self.rank_dropout).to(
+            self.oft_blocks.dtype
+        )
+        if self.rank_dropout_scale:
+            drop /= drop.mean()
+    else:
+        drop = 1
+```
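
The effect of the `drop /= drop.mean()` step can be seen on a tiny hand-written mask. This is a plain-Python sketch rather than the library code: `scale_mask` is a hypothetical helper, and the guard for an all-zero mask is an addition for the example that the snippet above does not have.

```python
def scale_mask(mask):
    """Divide a 0/1 dropout mask by its mean so that the mask averages to 1."""
    mean = sum(mask) / len(mask)
    if mean == 0:
        # Everything was dropped; there is nothing to rescale (guard added for this sketch).
        return list(mask)
    return [m / mean for m in mask]


mask = [1.0, 0.0, 1.0, 1.0]      # one of four entries dropped
scaled = scale_mask(mask)        # surviving entries are boosted by 1 / 0.75
average = sum(scaled) / len(scaled)
print(scaled, average)
```

The surviving entries are scaled up so the mask still averages to `1`, which keeps the overall magnitude of the masked tensor roughly unchanged.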
+---
+###### `algo`
+The LyCORIS algorithm to use. You can find a [list](https://github.com/KohakuBlueleaf/LyCORIS/blob/HEAD/docs/Algo-List.md) of the implemented algorithms, an [explanation](https://github.com/KohakuBlueleaf/LyCORIS/blob/HEAD/docs/Algo-Details.md) of them, and a [demo](https://github.com/KohakuBlueleaf/LyCORIS/blob/HEAD/docs/Demo.md). You can also dig into the [research paper](https://arxiv.org/pdf/2309.14859.pdf).
+---
+###### `train_norm`
+Controls whether the normalization layers are trained. It is available for all algorithms except `(IA)^3`.
+---
+###### `block_dims`
+Specifies the rank of each block. It takes exactly 25 numbers, which is why this line is so long.
+---
+###### `block_alphas`
+Specifies the alpha of each block. This also takes 25 numbers. If you don't specify it, `network_alpha` will be used for every block instead.
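
Typing 25 identical values by hand is error-prone, so it can help to generate the argument string. A minimal sketch: `per_block_arg` is a hypothetical helper, and the `block_dims` value of `8` is only an example.

```python
NUM_BLOCKS = 25  # number of values block_dims / block_alphas expect, per the text above


def per_block_arg(name, value, num_blocks=NUM_BLOCKS):
    """Build a `name=v,v,...,v` network argument that repeats one value for every block."""
    return f"{name}=" + ",".join([str(value)] * num_blocks)


block_alphas = per_block_arg("block_alphas", 0.0625)
block_dims = per_block_arg("block_dims", 8)  # example value, not a recommendation
print(block_alphas)
print(block_dims)
```

The printed strings can be pasted as quoted entries under `--network_args`.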
 ---
 ##### `--network_dropout`
+Using `weight_decompose=True` will ignore `network_dropout` and only rank and module dropout will be applied.
+```python
     --network_dropout=0 \
 ```
 ⚠️
+```python
     --lr_scheduler="cosine" \
 ```
 ##### `--network_dim`
+The Network Rank (Dimension) determines how many features your LoRA can learn. It is closely related to Network Alpha, the UNet and TE learning rates, and of course the quality of your dataset. Personal experimentation with these values is strongly recommended.
 ```py
     --network_dim=8 \
 ##### `--output_name`
+Specifies the output name, excluding the file extension.
+**WARNING**: If for some reason this is ever left empty, your last epoch won't be saved!
 ```py
     --output_name="last" \
 [![An AI generated image.](https://huggingface.co/k4d3/yiff_toolkit/resolve/main/static/tutorial/dropout1.png)](https://huggingface.co/k4d3/yiff_toolkit/resolve/main/static/tutorial/dropout1.png)
+Max-norm regularization: scales down the LoRA weights whose norm exceeds the given value, which encourages the LoRA to diversify its training and helps prevent overfitting.
+The network you are training needs to support it though! See [PR#545](https://github.com/kohya-ss/sd-scripts/pull/545) for more details.
 ```py
     --scale_weight_norms=1 \