k4d3 committed on
Commit bff67d4
1 Parent(s): d94919e

Signed-off-by: Balazs Horvath <[email protected]>

Files changed (1)
  1. README.md +207 -25
README.md CHANGED
@@ -37,6 +37,7 @@ The Yiff Toolkit is a comprehensive set of tools designed to enhance your creati
  - [Download Pony in Diffusers Format](#download-pony-in-diffusers-format)
  - [Sample Prompt File](#sample-prompt-file)
  - [Training Commands](#training-commands)
  - [`--lowram`](#--lowram)
  - [`--pretrained_model_name_or_path`](#--pretrained_model_name_or_path)
  - [`--output_dir`](#--output_dir)
@@ -48,6 +49,17 @@ The Yiff Toolkit is a comprehensive set of tools designed to enhance your creati
  - [`--save_model_as`](#--save_model_as)
  - [`--network_module`](#--network_module)
  - [`--network_args`](#--network_args)
  - [`--network_dropout`](#--network_dropout)
  - [`--lr_scheduler`](#--lr_scheduler)
  - [`--lr_scheduler_num_cycles`](#--lr_scheduler_num_cycles)
@@ -473,6 +485,20 @@ If you are training with multiple GPUs, ensure that the total number of prompts
  <details>
  <summary>Click to reveal training commands.</summary>

  ##### `--lowram`

  If you are running out of system memory, like I do with 2 GPUs and a really fat model that gets loaded into RAM once per GPU, this option will help you save a bit of it and might get you out of OOM hell.
@@ -481,9 +507,9 @@ If you are running running out of system memory like I do with 2 GPUs and a real

  ##### `--pretrained_model_name_or_path`

- The directory containing the checkpoint you just downloaded. I recommend closing the path if you are using a local model with a `/`.

- ```py
  --pretrained_model_name_or_path="/ponydiffusers/" \
  ```

@@ -493,7 +519,7 @@ The directory containing the checkpoint you just downloaded. I recommend closing

  This is where all the epochs or steps will be saved, including the last one. If y

- ```py
  --output_dir="/output_dir" \
  ```

@@ -503,7 +529,7 @@ This is where all the saved epochs or steps will be saved, including the last on

  The directory containing the dataset. We prepared this earlier together.

- ```py
  --train_data_dir="/training_dir" \
  ```

@@ -513,7 +539,7 @@ The directory containing the dataset. We prepared this earlier together.

  Always set this to match the model's resolution, which in Pony's case is 1024x1024. If it doesn't fit into your VRAM, you can decrease it to `512,512` as a last resort.

- ```py
  --resolution="1024,1024" \
  ```

@@ -521,9 +547,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--enable_bucket`

- ⚠️

- ```py
  --enable_bucket \
  ```

@@ -531,9 +557,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--min_bucket_reso` and `--max_bucket_reso`

- ⚠️

- ```py
  --min_bucket_reso=256 \
  --max_bucket_reso=1024 \
  ```
@@ -542,9 +568,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--network_alpha`

- ⚠️

- ```py
  --network_alpha=4 \
  ```

@@ -552,9 +578,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--save_model_as`

- ⚠️

- ```py
  --save_model_as=safetensors \
  ```

@@ -562,9 +588,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--network_module`

- ⚠️

- ```py
  --network_module=lycoris.kohya \
  ```

@@ -572,16 +598,17 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--network_args`

- ⚠️

- ```py
  --network_args \
  "use_reentrant=False" \
  "preset=full" \
  "conv_dim=256" \
  "conv_alpha=4" \
- "rank_dropout=0" \
- "module_dropout=0" \
  "use_tucker=False" \
  "use_scalar=False" \
  "rank_dropout_scale=False" \
@@ -591,13 +618,164 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
  "block_alphas=0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625" \
  ```

  ---

  ##### `--network_dropout`

- ⚠️

- ```py
  --network_dropout=0 \
  ```

@@ -607,7 +785,7 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ⚠️

- ```py
  --lr_scheduler="cosine" \
  ```

@@ -655,7 +833,7 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--network_dim`

- ⚠️

  ```py
  --network_dim=8 \
@@ -665,7 +843,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--output_name`

- ⚠️

  ```py
  --output_name="last" \
@@ -677,7 +857,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  [![An AI generated image.](https://huggingface.co/k4d3/yiff_toolkit/resolve/main/static/tutorial/dropout1.png)](https://huggingface.co/k4d3/yiff_toolkit/resolve/main/static/tutorial/dropout1.png)

- Encourages the LoRA to diversify it's training by randomly removing some weights, the network you are training needs to support it though! See [PR#545](https://github.com/kohya-ss/sd-scripts/pull/545) for more details.

  ```py
  --scale_weight_norms=1 \
@@ -37,6 +37,7 @@ The Yiff Toolkit is a comprehensive set of tools designed to enhance your creati
  - [Download Pony in Diffusers Format](#download-pony-in-diffusers-format)
  - [Sample Prompt File](#sample-prompt-file)
  - [Training Commands](#training-commands)
+ - [`accelerate launch`](#accelerate-launch)
  - [`--lowram`](#--lowram)
  - [`--pretrained_model_name_or_path`](#--pretrained_model_name_or_path)
  - [`--output_dir`](#--output_dir)
@@ -48,6 +49,17 @@ The Yiff Toolkit is a comprehensive set of tools designed to enhance your creati
  - [`--save_model_as`](#--save_model_as)
  - [`--network_module`](#--network_module)
  - [`--network_args`](#--network_args)
+ - [`use_reentrant`](#use_reentrant)
+ - [`preset`](#preset)
+ - [`conv_dim` and `conv_alpha`](#conv_dim-and-conv_alpha)
+ - [`module_dropout` and `dropout` and `rank_dropout`](#module_dropout-and-dropout-and-rank_dropout)
+ - [`use_tucker`](#use_tucker)
+ - [`use_scalar`](#use_scalar)
+ - [`rank_dropout_scale`](#rank_dropout_scale)
+ - [`algo`](#algo)
+ - [`train_norm`](#train_norm)
+ - [`block_dims`](#block_dims)
+ - [`block_alphas`](#block_alphas)
  - [`--network_dropout`](#--network_dropout)
  - [`--lr_scheduler`](#--lr_scheduler)
  - [`--lr_scheduler_num_cycles`](#--lr_scheduler_num_cycles)
@@ -473,6 +485,20 @@ If you are training with multiple GPUs, ensure that the total number of prompts
  <details>
  <summary>Click to reveal training commands.</summary>

+ ##### `accelerate launch`
+
+ For two GPUs:
+
+ ```python
+ accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 --num_cpu_threads_per_process=2 "./sdxl_train_network.py" \
+ ```
+
+ Single GPU:
+
+ ```python
+ accelerate launch --num_processes=1 --num_machines=1 --gpu_ids=0 --num_cpu_threads_per_process=2 "./sdxl_train_network.py" \
+ ```
+
  ##### `--lowram`

  If you are running out of system memory, like I do with 2 GPUs and a really fat model that gets loaded into RAM once per GPU, this option will help you save a bit of it and might get you out of OOM hell.
@@ -481,9 +507,9 @@ If you are running running out of system memory like I do with 2 GPUs and a real

  ##### `--pretrained_model_name_or_path`

+ The directory containing the checkpoint you just downloaded. If you are using a local Diffusers model, I recommend ending the path with a `/`. You can also point it to a `.safetensors` or `.ckpt` file if that is what you have!

+ ```python
  --pretrained_model_name_or_path="/ponydiffusers/" \
  ```

@@ -493,7 +519,7 @@ The directory containing the checkpoint you just downloaded. I recommend closing

  This is where all the epochs or steps will be saved, including the last one. If y

+ ```python
  --output_dir="/output_dir" \
  ```

@@ -503,7 +529,7 @@ This is where all the saved epochs or steps will be saved, including the last on

  The directory containing the dataset. We prepared this earlier together.

+ ```python
  --train_data_dir="/training_dir" \
  ```

@@ -513,7 +539,7 @@ The directory containing the dataset. We prepared this earlier together.

  Always set this to match the model's resolution, which in Pony's case is 1024x1024. If it doesn't fit into your VRAM, you can decrease it to `512,512` as a last resort.

+ ```python
  --resolution="1024,1024" \
  ```

@@ -521,9 +547,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--enable_bucket`

+ Sorts images into buckets by aspect ratio before training. This helps avoid issues like the unnatural crops that are common when models are trained to produce only square images. Every item within a batch has the same size, but the image size can differ from batch to batch.

+ ```python
  --enable_bucket \
  ```
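A minimal sketch of the bucketing idea, assuming each image is simply assigned to the bucket whose aspect ratio is closest to its own. The `BUCKETS` list and the `assign_bucket`/`group_by_bucket` helpers are hypothetical; sd-scripts also resizes and crops each image to fit its bucket, which is left out here.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical bucket list: (width, height) pairs of roughly 1024x1024 pixels.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]


def assign_bucket(width: int, height: int, buckets=BUCKETS) -> tuple[int, int]:
    """Return the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


def group_by_bucket(sizes: list[tuple[int, int]]) -> dict[tuple[int, int], list[tuple[int, int]]]:
    """Group image sizes by bucket, so every batch can be drawn from a single bucket."""
    groups = defaultdict(list)
    for width, height in sizes:
        groups[assign_bucket(width, height)].append((width, height))
    return dict(groups)


if __name__ == "__main__":
    images = [(2000, 2000), (1920, 1080), (800, 1200), (1500, 1000)]
    for bucket, members in group_by_bucket(images).items():
        print(bucket, members)
```

Here the 16:9 and the 3:2 image both land in the `(1216, 832)` bucket and can share a batch, while the portrait image goes to `(832, 1216)`, so none of them has to be cropped to a square.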

@@ -531,9 +557,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--min_bucket_reso` and `--max_bucket_reso`

+ Specifies the minimum and maximum resolutions used by the buckets. These values are ignored if `--bucket_no_upscale` is set.

+ ```python
  --min_bucket_reso=256 \
  --max_bucket_reso=1024 \
  ```
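To see how the two limits shape the bucket list, here is a sketch that enumerates buckets whose sides stay between `min_reso` and `max_reso`, are multiples of `step`, and whose area never exceeds the training resolution. The `make_buckets` function and its 64-pixel step are assumptions for illustration; sd-scripts' exact rules may differ, and `--bucket_no_upscale` is not modeled.

```python
from __future__ import annotations


def make_buckets(resolution=(1024, 1024), min_reso=256, max_reso=1024, step=64) -> list[tuple[int, int]]:
    """Hypothetical sketch: enumerate (width, height) buckets.

    Each side stays within [min_reso, max_reso] and is a multiple of `step`
    (given that min_reso and max_reso are), and width * height never exceeds
    resolution[0] * resolution[1].
    """
    max_area = resolution[0] * resolution[1]
    buckets = set()
    width = min_reso
    while width <= max_reso:
        # largest multiple of `step` that keeps width * height <= max_area
        height = min(max_reso, (max_area // width) // step * step)
        if height >= min_reso:
            buckets.add((width, height))
            buckets.add((height, width))  # the portrait/landscape mirror
        width += step
    return sorted(buckets)


if __name__ == "__main__":
    print(make_buckets())
    print(make_buckets(max_reso=2048))
```

With `--max_bucket_reso=1024` and a 1024x1024 resolution, every bucket has one side at 1024 and the other anywhere from 256 to 1024; raising `max_reso` to 2048 allows wider buckets such as `(1536, 640)` that stay within the same pixel budget.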
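@@ -542,9 +568,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--network_alpha`

+ Scales down how strongly the trained Network Ranks alter the base model: the network's output is multiplied by `network_alpha / network_dim`, so `--network_alpha=4` with `--network_dim=8` halves the update.

+ ```python
  --network_alpha=4 \
  ```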
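Assuming the usual LoRA convention where the learned update is multiplied by `alpha / rank` (the `self.scale` used in the `forward` code later in this guide), the effect of `--network_alpha` can be sketched in plain Python. `lora_scale` and `lora_delta` are hypothetical helpers, and the matrices are tiny lists of rows rather than real layer weights.

```python
def lora_scale(network_alpha: float, network_dim: int) -> float:
    """Multiplier applied to the LoRA update (the alpha / rank convention)."""
    return network_alpha / network_dim


def lora_delta(x, down, up, network_alpha, multiplier=1.0):
    """Hypothetical pure-Python sketch of the LoRA update for one input vector.

    down: rank x in_features matrix (list of rows), up: out_features x rank.
    Returns (up @ (down @ x)) * multiplier * (network_alpha / rank).
    """
    rank = len(down)
    hidden = [sum(d * xi for d, xi in zip(row, x)) for row in down]  # down-projection
    out = [sum(u * h for u, h in zip(row, hidden)) for row in up]    # up-projection
    scale = lora_scale(network_alpha, rank)
    return [o * multiplier * scale for o in out]


if __name__ == "__main__":
    print(lora_scale(4, 8))  # 0.5, the values used in this guide
    print(lora_delta([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]], network_alpha=1))  # [1.5]
```

Because the scale depends on both values, changing `--network_dim` without adjusting `--network_alpha` also changes how strongly the LoRA is applied.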

@@ -552,9 +578,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--save_model_as`

+ You can use this to specify either `ckpt` or `safetensors` for the file format.

+ ```python
  --save_model_as=safetensors \
  ```

@@ -562,9 +588,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--network_module`

+ Specifies which network module you are going to train.

+ ```python
  --network_module=lycoris.kohya \
  ```

@@ -572,16 +598,17 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--network_args`

+ The arguments passed down to the network.

+ ```python
  --network_args \
  "use_reentrant=False" \
  "preset=full" \
  "conv_dim=256" \
  "conv_alpha=4" \
+ "dropout=None" \
+ "rank_dropout=None" \
+ "module_dropout=None" \
  "use_tucker=False" \
  "use_scalar=False" \
  "rank_dropout_scale=False" \
@@ -591,13 +618,164 @@ Always set this to match the model's resolution, which in Pony's case it is 1024
  "block_alphas=0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625" \
  ```

+ **Let's break it down!**
+
+ ---
+
+ ###### `use_reentrant`
+
+ - If `use_reentrant=False` is specified, gradient checkpointing will use an implementation that does not require re-entrant autograd. You can learn more about checkpointing [here](https://pytorch.org/docs/stable/checkpoint.html). Future versions of PyTorch will default to `use_reentrant=False`, but today the default is still `True`, so we set it to `False` explicitly. Easy!
+
+ ---
+
+ ###### `preset`
+
+ The [Preset](https://github.com/KohakuBlueleaf/LyCORIS/blob/HEAD/docs/Preset.md)/config system added to LyCORIS for more fine-grained control.
+
+ - `full`
+   - The default preset; trains all the layers in the UNet and CLIP.
+ - `full-lin`
+   - `full`, but skips the convolutional layers.
+ - `attn-mlp`
+   - The "kohya preset"; trains all the transformer blocks.
+ - `attn-only`
+   - Only the attention layers are trained; a lot of papers train only the attention layers.
+ - `unet-transformer-only`
+   - The same as kohya_ss/sd_scripts with the text encoder disabled, i.e. the `attn-mlp` preset with `train_unet_only` enabled.
+ - `unet-convblock-only`
+   - Only the ResBlock, UpSample and DownSample layers are trained.
+
+ ---
+
+ ###### `conv_dim` and `conv_alpha`
+
+ `conv_dim` sets the rank used for the convolutional layers, and `conv_alpha` is the matching alpha. Adjusting these values can have a [significant impact](https://ashejunius.com/alpha-and-dimensions-two-wild-settings-of-training-lora-in-stable-diffusion-d7ad3e3a3b0a): lowering the dimension affected the aesthetic differences between different LoRA samples, and an alpha value of `128` was used for training a specific character's face, while Kohaku recommended setting it to `1` for both LoCon and LoHa.
+
+ ```python
+ conv_block_dims = [conv_dim] * num_total_blocks
+ conv_block_alphas = [conv_alpha] * num_total_blocks
+ ```
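The snippet above repeats one rank and one alpha for every block. Here it is wrapped in a small hypothetical helper that also computes the per-block scale, assuming the same `alpha / rank` rule applies to the convolutional layers; `num_total_blocks=3` is only there to keep the printout short.

```python
def expand_conv_blocks(conv_dim: int, conv_alpha: float, num_total_blocks: int):
    """Hypothetical helper: repeat conv_dim/conv_alpha for every block and
    compute each block's scale, assuming the alpha / rank convention."""
    conv_block_dims = [conv_dim] * num_total_blocks
    conv_block_alphas = [conv_alpha] * num_total_blocks
    conv_scales = [alpha / dim for alpha, dim in zip(conv_block_alphas, conv_block_dims)]
    return conv_block_dims, conv_block_alphas, conv_scales


if __name__ == "__main__":
    dims, alphas, scales = expand_conv_blocks(conv_dim=256, conv_alpha=4, num_total_blocks=3)
    print(dims, alphas, scales)
```

With `conv_dim=256` and `conv_alpha=4` from the `--network_args` above, each convolutional update would be scaled by 4 / 256 = 0.015625, far below the 4 / 8 = 0.5 you get from `--network_alpha=4` and `--network_dim=8`.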
+
+ ---
+
+ ###### `module_dropout` and `dropout` and `rank_dropout`
+
+ `rank_dropout` is a form of dropout, a regularization technique used in neural networks to prevent overfitting and improve generalization. However, unlike traditional dropout, which randomly sets individual elements of its input to zero, `rank_dropout` operates along the rank dimension of `lx`, the output of `lora_down`. First, a boolean mask of shape `(batch_size, lora_dim)` is created, with each element set to `True` with probability `1 - rank_dropout` and `False` otherwise. Then the `mask` is applied to `lx`, zeroing out the dropped rank components. After the dropout, a scaling factor compensates for the dropped components so that the expected value of `lx` stays the same before and after dropout. That factor is `1.0 / (1.0 - self.rank_dropout)`, and it is multiplied into `self.scale`.
+
+ It’s called “rank” dropout because it drops entire rank components of the low-rank bottleneck rather than individual elements: for each sample, a random subset of the LoRA's rank is switched off during that training step.
+
+ If `rank_dropout` is set to `0`, no rank components are dropped. All elements of the mask are `True`, so applying it retains every element of `lx`, and the scale after dropout is just `self.scale` because `1.0 / (1.0 - 0)` is `1`. Setting it to `0` therefore disables the mechanism, but the mask is still computed and applied, which is wasted work; with `None`, the code below skips the branch entirely.
+
+ ```python
+ import torch
+
+ # Forward pass of a LoRA module (a method of the module class).
+ def forward(self, x):
+     org_forwarded = self.org_forward(x)
+
+     # module dropout: skip the whole LoRA module for this step
+     if self.module_dropout is not None and self.training:
+         if torch.rand(1) < self.module_dropout:
+             return org_forwarded
+
+     lx = self.lora_down(x)
+
+     # normal dropout: zero out individual elements of lx
+     if self.dropout is not None and self.training:
+         lx = torch.nn.functional.dropout(lx, p=self.dropout)
+
+     # rank dropout: zero out whole rank components of lx, per sample
+     if self.rank_dropout is not None and self.training:
+         mask = torch.rand((lx.size(0), self.lora_dim), device=lx.device) > self.rank_dropout
+         if len(lx.size()) == 3:
+             mask = mask.unsqueeze(1)  # (batch, 1, rank) for token sequences
+         elif len(lx.size()) == 4:
+             mask = mask.unsqueeze(-1).unsqueeze(-1)  # (batch, rank, 1, 1) for conv
+         lx = lx * mask
+
+         # compensate for the dropped components
+         scale = self.scale * (1.0 / (1.0 - self.rank_dropout))
+     else:
+         scale = self.scale
+
+     lx = self.lora_up(lx)
+
+     return org_forwarded + lx * self.multiplier * scale
+ ```
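The claim that rescaling by `1.0 / (1.0 - rank_dropout)` keeps the expected value unchanged is easy to check numerically. This is a hypothetical pure-Python stand-in for the tensor code above, using lists instead of tensors and a seeded `random.Random`; the `rank_dropout` function name is made up for the example.

```python
import random


def rank_dropout(lx, p, rng):
    """Hypothetical pure-Python sketch of rank dropout.

    lx is a batch of samples, each a list of `rank` values (the output of lora_down).
    Each rank component of each sample is kept when a uniform draw exceeds p
    (probability 1 - p), and kept values are scaled by 1 / (1 - p). The module
    code applies this factor after lora_up instead; because lora_up is linear
    (no bias), the result is the same.
    """
    scale = 1.0 / (1.0 - p)
    out = []
    for sample in lx:
        mask = [rng.random() > p for _ in sample]
        out.append([v * scale if keep else 0.0 for v, keep in zip(sample, mask)])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    values = [v for _ in range(20_000) for sample in rank_dropout([[1.0] * 4], p=0.25, rng=rng) for v in sample]
    print(sum(values) / len(values))  # close to 1.0
```

Each element is either dropped to `0.0` or kept as `1.0 / 0.75 ≈ 1.333`, and since it is kept 75% of the time, the average stays at `1.0`.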
+
+ ---
+
+ ###### `use_tucker`
+
+ Can be used with all algorithms except `(IA)^3` and native fine-tuning.
+
+ Tucker decomposition is a mathematical method that decomposes a tensor into a set of matrices and one small core tensor, reducing the computational complexity and memory requirements of the model. It is used in various LyCORIS modules on various blocks. In LoCon, for example, if `use_tucker` is `True` and the kernel size `k_size` is not `(1, 1)`, the convolution operation is decomposed into three separate operations:
+
+ 1. A 1x1 convolution that reduces the number of channels from `in_dim` to `lora_dim`.
+ 2. A convolution with the original kernel size `k_size`, stride `stride`, and padding `padding`, but with a reduced number of channels `lora_dim`.
+ 3. A 1x1 convolution that increases the number of channels back from `lora_dim` to `out_dim`.
+
+ If `use_tucker` is `False` or not set, or if the kernel size `k_size` is `(1, 1)`, a standard convolution is performed with the original kernel size, stride, and padding, and the number of channels is reduced from `in_dim` to `lora_dim`.
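To get a feel for why the Tucker path is cheaper, here is a sketch that counts the adapter weights for both paths. It assumes biases are ignored and that the up projection is a 1x1 convolution from `lora_dim` to `out_dim` in both cases; `locon_conv_params` is a hypothetical helper, not LyCORIS code, and the 320-channel 3x3 layer is just an example.

```python
def locon_conv_params(in_dim: int, out_dim: int, k_size: tuple, lora_dim: int, use_tucker: bool) -> int:
    """Count the weights of a LoCon-style conv adapter (biases ignored)."""
    kh, kw = k_size
    if use_tucker and (kh, kw) != (1, 1):
        down = in_dim * lora_dim             # 1x1 conv: in_dim -> lora_dim
        mid = lora_dim * lora_dim * kh * kw  # k_size conv on the reduced channels
        up = lora_dim * out_dim              # 1x1 conv: lora_dim -> out_dim
        return down + mid + up
    down = in_dim * lora_dim * kh * kw       # k_size conv: in_dim -> lora_dim
    up = lora_dim * out_dim                  # 1x1 conv: lora_dim -> out_dim
    return down + up


if __name__ == "__main__":
    for tucker in (False, True):
        print(tucker, locon_conv_params(320, 320, (3, 3), lora_dim=8, use_tucker=tucker))
```

For a 320-channel 3x3 convolution at `lora_dim=8`, this gives 25,600 weights without Tucker and 5,696 with it, because the 3x3 kernel only acts on the 8 reduced channels.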
715
+
716
+ ---
717
+
718
+ ###### `use_scalar`
719
+
720
+ An additional learned parameter that scales the contribution of the low-rank weights before they are added to the original weights. This scalar can control the extent to which the low-rank adaptation modifies the original weights. By training this scalar, the model can learn the optimal balance between preserving the original pre-trained weights and allowing for low-rank adaptation.
721
+
722
+ ```python
723
+ if use_scalar:
724
+ self.scalar = nn.Parameter(torch.tensor(0.0))
725
+ else:
726
+ self.scalar = torch.tensor(1.0)
727
+ ```
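Since the scalar starts at `0.0` when `use_scalar` is enabled (see the initialization above), the adapter initially leaves the original weights untouched, and training decides how much of the low-rank update to let through. A minimal sketch on flat lists, where `apply_delta` is a hypothetical helper and `scale` stands for `alpha / rank`:

```python
def apply_delta(weight, delta, scale, scalar):
    """Hypothetical sketch: W' = W + scalar * scale * delta, element-wise on flat lists.

    `delta` stands for the flattened low-rank update (e.g. up @ down), `scale`
    for alpha / rank, and `scalar` for the extra factor controlled by use_scalar.
    """
    return [w + scalar * scale * d for w, d in zip(weight, delta)]


if __name__ == "__main__":
    weight = [1.0, -2.0, 0.5]
    delta = [0.2, 0.4, -0.6]
    print(apply_delta(weight, delta, scale=0.5, scalar=0.0))  # use_scalar=True, at initialization
    print(apply_delta(weight, delta, scale=0.5, scalar=1.0))  # use_scalar=False
```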
+
+ ---
+
+ ###### `rank_dropout_scale`
+
+ A boolean flag that determines whether the dropout mask is rescaled to have an average value of `1`. This can be useful for keeping the scale of the tensor the same after dropout is applied.
+
+ ```python
+ import torch
+
+ # Excerpt of an OFT module's forward pass (a method of the module class).
+ def forward(self, orig_weight, org_bias, new_weight, new_bias, *args, **kwargs):
+     device = self.oft_blocks.device
+     if self.rank_dropout and self.training:
+         # one 0/1 value per block along the first dimension of oft_blocks
+         drop = (torch.rand(self.oft_blocks.size(0), device=device) < self.rank_dropout).to(
+             self.oft_blocks.dtype
+         )
+         if self.rank_dropout_scale:
+             # rescale the mask so that its mean is 1
+             drop /= drop.mean()
+     else:
+         drop = 1
+ ```
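Here is what `drop /= drop.mean()` does to a 0/1 mask, sketched in plain Python. `rescale_mask` is a hypothetical helper; unlike the tensor code above, it returns an all-zero mask unchanged instead of dividing by zero.

```python
def rescale_mask(drop):
    """Divide a 0/1 dropout mask by its mean so that it averages to 1.

    Returns the mask unchanged if every entry is 0 (the mean would be 0).
    """
    mean = sum(drop) / len(drop)
    if mean == 0:
        return list(drop)
    return [d / mean for d in drop]


if __name__ == "__main__":
    scaled = rescale_mask([1.0, 0.0, 1.0, 1.0])
    print(scaled, sum(scaled) / len(scaled))
```

With three of four blocks kept, every surviving value becomes `1 / 0.75 ≈ 1.333`, so the mask still averages to `1`.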
+
+ ---
+
+ ###### `algo`
+
+ The LyCORIS algorithm to use. You can find a [list](https://github.com/KohakuBlueleaf/LyCORIS/blob/HEAD/docs/Algo-List.md) of the implemented algorithms and an [explanation](https://github.com/KohakuBlueleaf/LyCORIS/blob/HEAD/docs/Algo-Details.md) of them, along with a [demo](https://github.com/KohakuBlueleaf/LyCORIS/blob/HEAD/docs/Demo.md). You can also dig into the [research paper](https://arxiv.org/pdf/2309.14859.pdf).
+
+ ---
+
+ ###### `train_norm`
+
+ Controls whether the normalization layers are trained. This is supported by all algorithms except `(IA)^3`.
+
+ ---
+
+ ###### `block_dims`
+
+ Specifies the rank of each block. It takes exactly 25 numbers, which is why this line is so long.
+
+ ---
+
+ ###### `block_alphas`
+
+ Specifies the alpha of each block. This also takes exactly 25 numbers; if you don't specify it, `network_alpha` is used instead.
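A sketch of how these per-block strings could be parsed and validated, including the fallback to `network_alpha` described above. `parse_block_values` and `resolve_block_alphas` are hypothetical helpers, not part of LyCORIS, and the count of 25 comes from this guide.

```python
def parse_block_values(text: str, expected: int = 25, cast=float) -> list:
    """Parse a comma-separated per-block argument such as `block_alphas`."""
    values = [cast(v) for v in text.split(",")]
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return values


def resolve_block_alphas(block_alphas, network_alpha: float, expected: int = 25) -> list:
    """Use the per-block alphas if given, otherwise repeat network_alpha for every block."""
    if block_alphas is None:
        return [float(network_alpha)] * expected
    return parse_block_values(block_alphas, expected, float)


if __name__ == "__main__":
    arg = ",".join(["0.0625"] * 25)  # same shape as the block_alphas line above
    alphas = resolve_block_alphas(arg, network_alpha=4)
    print(len(alphas), alphas[0])
    print(resolve_block_alphas(None, network_alpha=4)[:3])
```

Passing `cast=int` covers `block_dims`, and a string with the wrong number of values fails loudly instead of silently shifting the blocks.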
+
  ---

  ##### `--network_dropout`

+ With `weight_decompose=True`, `network_dropout` is ignored and only rank and module dropout are applied.

+ ```python
  --network_dropout=0 \
  ```

@@ -607,7 +785,7 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ⚠️

+ ```python
  --lr_scheduler="cosine" \
  ```

@@ -655,7 +833,7 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--network_dim`

+ The Network Rank (Dimension) sets the rank of the low-rank matrices your LoRA trains, which determines how many features it can learn. It is closely related to Network Alpha, the UNet and TE learning rates, and of course the quality of your dataset. Personal experimentation with these values is strongly recommended.
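One concrete way to think about the rank: for a linear layer, a LoRA pair stores `in_features * rank + rank * out_features` weights, so the trainable capacity grows linearly with `--network_dim`. `lora_linear_params` is a hypothetical helper, and the 640x640 layer is just an example size.

```python
def lora_linear_params(in_features: int, out_features: int, rank: int) -> int:
    """Weights in a LoRA pair for one linear layer (biases ignored):
    lora_down maps in_features -> rank and lora_up maps rank -> out_features."""
    return in_features * rank + rank * out_features


if __name__ == "__main__":
    full = 640 * 640  # a hypothetical 640x640 linear layer
    for rank in (4, 8, 16, 32):
        lora = lora_linear_params(640, 640, rank)
        print(f"rank {rank:>2}: {lora} weights ({lora / full:.1%} of the full layer)")
```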

  ```py
  --network_dim=8 \
@@ -665,7 +843,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  ##### `--output_name`

+ Specifies the output name, excluding the file extension.
+
+ **WARNING**: If for some reason this is ever left empty, your last epoch won't be saved!

  ```py
  --output_name="last" \
@@ -677,7 +857,9 @@ Always set this to match the model's resolution, which in Pony's case it is 1024

  [![An AI generated image.](https://huggingface.co/k4d3/yiff_toolkit/resolve/main/static/tutorial/dropout1.png)](https://huggingface.co/k4d3/yiff_toolkit/resolve/main/static/tutorial/dropout1.png)

+ Applies Max-Norm Regularization: during training, any module whose weight norm grows above this value is scaled back down to it. Keeping the weights from growing too large helps prevent overfitting.
+
+ The network you are training needs to support it though! See [PR#545](https://github.com/kohya-ss/sd-scripts/pull/545) for more details.
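A minimal sketch of max-norm regularization for a single LoRA module. It assumes the norm is measured on the combined update `up @ down` (times the module's scale) and that the correction is split evenly between both matrices. `apply_max_norm` and the list-based matrix helpers are hypothetical; check PR#545 for how sd-scripts actually does it.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def frobenius_norm(m):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in m for v in row))


def apply_max_norm(up, down, max_norm, scale=1.0):
    """Hypothetical sketch of max-norm regularization for one LoRA module.

    If the norm of (up @ down) * scale exceeds max_norm, both matrices are
    multiplied by sqrt(max_norm / norm), so the product's norm becomes max_norm.
    Returns the (possibly rescaled) matrices and the norm before rescaling.
    """
    norm = frobenius_norm(matmul(up, down)) * scale
    if norm <= max_norm:
        return up, down, norm
    factor = math.sqrt(max_norm / norm)
    up = [[v * factor for v in row] for row in up]
    down = [[v * factor for v in row] for row in down]
    return up, down, norm


if __name__ == "__main__":
    up, down = [[3.0], [0.0]], [[1.0, 0.0]]
    new_up, new_down, before = apply_max_norm(up, down, max_norm=1.0)
    print(before, frobenius_norm(matmul(new_up, new_down)))
```

With `--scale_weight_norms=1`, an update whose norm is 3 is scaled down to a norm of 1, while one already below 1 is left alone.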

  ```py
  --scale_weight_norms=1 \