awoo
Signed-off-by: Balazs Horvath <[email protected]>
README.md CHANGED

@@ -79,6 +79,7 @@ The Yiff Toolkit is a comprehensive set of tools designed to enhance your creati
- [`--shuffle_caption`](#--shuffle_caption)
- [`--sdpa` or `--xformers` or `--mem_eff_attn`](#--sdpa-or---xformers-or---mem_eff_attn)
- [`--multires_noise_iterations` and `--multires_noise_discount`](#--multires_noise_iterations-and---multires_noise_discount)
+  - [Implementation Details](#implementation-details)
- [`--sample_prompts` and `--sample_sampler` and `--sample_every_n_steps`](#--sample_prompts-and---sample_sampler-and---sample_every_n_steps)
- [Embeddings for 1.5 and SDXL](#embeddings-for-15-and-sdxl)
- [ComfyUI Walkthrough any%](#comfyui-walkthrough-any)

@@ -745,28 +746,44 @@ If `use_tucker` is `False` or not set, or if the kernel size k_size is `(1, 1)`,
An additional learned parameter that scales the contribution of the low-rank weights before they are added to the original weights. This scalar can control the extent to which the low-rank adaptation modifies the original weights. By training this scalar, the model can learn the optimal balance between preserving the original pre-trained weights and allowing for low-rank adaptation.

```python
+# Check if the 'use_scalar' flag is set to True.
if use_scalar:
+    # If True, initialize a learnable parameter 'scalar' with a starting value of 0.0.
+    # This parameter will be optimized during the training process.
    self.scalar = nn.Parameter(torch.tensor(0.0))
else:
+    # If the 'use_scalar' flag is False, set 'scalar' to a fixed value of 1.0.
+    # This means the low-rank weights will be added to the original weights without scaling.
    self.scalar = torch.tensor(1.0)
```

+The `use_scalar` flag allows the model to determine how much influence the low-rank weights should have on the final weights. If `use_scalar` is `True`, the model can learn the optimal value for `self.scalar` during training, which multiplies the low-rank weights before they are added to the original weights. This provides a way to balance between the original pre-trained weights and the new low-rank adaptations, potentially leading to better performance and more efficient training. The initial value of `0.0` for `self.scalar` suggests that the model starts with no contribution from the low-rank weights and learns the appropriate scale during training.
+
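
To make the effect of the gate concrete, here is a minimal self-contained sketch of how a learned scalar can scale a low-rank update on top of frozen base weights. The class name, layer layout, and initialization values are illustrative assumptions, not the toolkit's own implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical module showing a learned scalar gating a low-rank update.
class ScaledLowRankLinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4, use_scalar=True):
        super().__init__()
        # Frozen "pre-trained" weight.
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Low-rank factors: up @ down has shape (out_features, in_features).
        self.lora_down = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_up = nn.Parameter(torch.zeros(out_features, rank))
        if use_scalar:
            # Learned gate, starting at 0.0: training begins from the original weights only.
            self.scalar = nn.Parameter(torch.tensor(0.0))
        else:
            # Fixed gate of 1.0: the low-rank update is applied at full strength.
            self.register_buffer("scalar", torch.tensor(1.0))

    def forward(self, x):
        merged = self.weight + self.scalar * (self.lora_up @ self.lora_down)
        return F.linear(x, merged)

layer = ScaledLowRankLinear(16, 32)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 32])
```
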
---

###### `rank_dropout_scale`

-A boolean flag that determines whether to scale the dropout mask to have an average value of `1` or not. This
+A boolean flag that determines whether to scale the dropout mask to have an average value of `1` or not. This is particularly useful when you want to maintain the original scale of the tensor values after applying dropout, which can be important for the stability of the training process.

```python
def forward(self, orig_weight, org_bias, new_weight, new_bias, *args, **kwargs):
+    # Retrieve the device that the 'oft_blocks' tensor is on. This ensures that any new tensors created are on the same device.
    device = self.oft_blocks.device
+
+    # Check if rank dropout is enabled and the model is in training mode.
    if self.rank_dropout and self.training:
+        # Create a random tensor the same shape as 'oft_blocks', with values drawn from a uniform distribution.
+        # Then create a dropout mask by checking if each value is less than 'self.rank_dropout' probability.
        drop = (torch.rand(self.oft_blocks, device=device) < self.rank_dropout).to(
            self.oft_blocks.dtype
        )
+
+        # If 'rank_dropout_scale' is True, scale the dropout mask to have an average value of 1.
+        # This helps maintain the scale of the tensor's values after dropout is applied.
        if self.rank_dropout_scale:
            drop /= drop.mean()
    else:
+        # If rank dropout is not enabled or the model is not in training mode, set 'drop' to 1 (no dropout).
        drop = 1
```

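
For intuition, the following small snippet (hypothetical sizes and values, not the library's code) shows what the `drop /= drop.mean()` rescaling does to a random keep-mask:

```python
import torch

# Hypothetical numbers, just to show the effect of rescaling the mask to mean 1.
torch.manual_seed(0)
rank_dropout = 0.5
weights = torch.randn(1024, 8)

# Row-wise keep-mask built the same way as above: 1 where rand < rank_dropout, else 0.
drop = (torch.rand(weights.shape[0]) < rank_dropout).to(weights.dtype)
print(drop.mean())   # roughly 0.5 -- masking alone shrinks the expected magnitude

# Rescale the mask so it averages to 1; the surviving rows are scaled up to compensate.
drop = drop / drop.mean()
print(drop.mean())   # ~1.0
print(weights.abs().mean(), (weights * drop.unsqueeze(-1)).abs().mean())  # comparable scales
```
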
@@ -999,6 +1016,10 @@ Each of these options modifies the attention mechanism used in the model, which
- `--mem_eff_attn`: This flag enables the use of memory-efficient attention mechanisms in the model. The memory-efficient attention is designed to reduce the memory footprint during the training of transformer models, which can be particularly beneficial when working with large models or datasets.
- `--sdpa`: This option enables the use of Scaled Dot-Product Attention (SDPA) within the model. SDPA is a fundamental component of transformer models that calculates the attention scores between queries and keys. It scales the dot products by the dimensionality of the keys to stabilize gradients during training. This mechanism is particularly useful for handling long sequences and can potentially improve the model’s ability to capture long-range dependencies.

+```python
+--sdpa
+```
+
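
As a rough illustration of what this flag toggles (not the trainer's internal code), PyTorch 2.x exposes the same fused operation directly, and it matches the textbook softmax(QKᵀ/√d)·V computation:

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence, head_dim) -- arbitrary example sizes.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Fused SDPA kernel (PyTorch >= 2.0); picks flash / memory-efficient / math backends automatically.
out = F.scaled_dot_product_attention(q, k, v)

# Reference computation: softmax(Q K^T / sqrt(d)) V
scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
reference = scores.softmax(dim=-1) @ v

print((out - reference).abs().max())  # small numerical difference between the two paths
```
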
---

##### `--multires_noise_iterations` and `--multires_noise_discount`

@@ -1013,6 +1034,14 @@ The `--multires_noise_discount` parameter controls the extent to which the noise

Please note that `--multires_noise_discount` has no effect without `--multires_noise_iterations`.

+###### Implementation Details
+
+The `get_noise_noisy_latents_and_timesteps` function samples noise that will be added to the latents. If `args.noise_offset` is true, it applies a noise offset. If `args.multires_noise_iterations` is true, it applies multi-resolution noise to the sampled noise.
+
+The function then samples a random timestep for each image and adds noise to the latents according to the noise magnitude at each timestep. This is the forward diffusion process.
+
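
The noising step itself can be sketched as follows with a diffusers-style `DDPMScheduler`; the shapes and scheduler construction here are illustrative assumptions rather than the trainer's exact code:

```python
import torch
from diffusers import DDPMScheduler

# Illustrative shapes only; a real trainer gets latents from the VAE and the scheduler from the model config.
scheduler = DDPMScheduler(num_train_timesteps=1000)
latents = torch.randn(4, 4, 64, 64)
noise = torch.randn_like(latents)  # optionally offset / turned into multi-resolution noise first

# One random timestep per image, then noise the latents according to that timestep's noise magnitude.
timesteps = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
noisy_latents = scheduler.add_noise(latents, noise, timesteps)
print(noisy_latents.shape)  # torch.Size([4, 4, 64, 64])
```
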
+The `pyramid_noise_like` function generates noise with a pyramid structure. It starts with the original noise and adds upscaled noise at decreasing resolutions. The noise at each level is scaled by a discount factor raised to the power of the level. The noise is then scaled back to roughly unit variance. This function is used to implement the multi-resolution noise.
+
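
A compact sketch of that pyramid construction, written from the description above (so details may differ from the actual `pyramid_noise_like` in the training scripts):

```python
import torch

def pyramid_noise_like_sketch(noise, iterations=10, discount=0.1):
    """Rough sketch of multi-resolution (pyramid) noise; not the exact training-script code."""
    b, c, h, w = noise.shape
    upsample = torch.nn.Upsample(size=(h, w), mode="bilinear")
    for i in range(iterations):
        r = torch.rand(1).item() * 2 + 2  # random scale factor between 2 and 4
        hi, wi = max(1, int(h / (r ** i))), max(1, int(w / (r ** i)))
        # Add upscaled low-resolution noise, discounted more strongly at each level.
        noise = noise + upsample(torch.randn(b, c, hi, wi, device=noise.device)) * discount ** i
        if hi == 1 or wi == 1:
            break  # lowest resolution reached
    return noise / noise.std()  # scale back to roughly unit variance

noise = pyramid_noise_like_sketch(torch.randn(2, 4, 64, 64))
print(noise.std())  # close to 1
```
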
```python
--multires_noise_iterations=10 --multires_noise_discount=0.1
```
|