awoo
Signed-off-by: Balazs Horvath <[email protected]>
- README.md +17 -3
- dataset_tools/e621 JSON to txt.ipynb +0 -0
README.md
CHANGED

@@ -931,7 +931,9 @@ Learning will always end with what you specify in `--max_train_epochs` or `--max…

##### `--caption_extension`

The file extension for caption files. The default is `.caption`. Caption files contain the text descriptions associated with the training images. When you run the training script, it looks for files with this extension in the training data folder and uses their contents as captions to provide context for the images during training.

For example, if your images are named `image1.jpg`, `image2.jpg`, and so on, and you use the default `.caption` extension, the script expects the caption files to be named `image1.caption`, `image2.caption`, etc. If you want to use a different extension, such as `.txt`, set `--caption_extension` to `.txt` and the script will look for `image1.txt`, `image2.txt`, and so on.

```py
--caption_extension=".txt"
```
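
As a rough illustration of that pairing (a hypothetical sketch, not the trainer's actual loading code; the folder name `train_data` is made up):

```python
from pathlib import Path

caption_extension = ".txt"      # the value you pass via --caption_extension
train_dir = Path("train_data")  # hypothetical training data folder

for image_path in sorted(train_dir.glob("*.jpg")):
    # A caption file shares the image's name, with the configured extension.
    caption_path = image_path.with_suffix(caption_extension)
    caption = caption_path.read_text().strip() if caption_path.exists() else ""
    print(f"{image_path.name} -> {caption!r}")
```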

@@ -991,13 +993,25 @@ NOTE: `--cache_text_encoder_outputs` and `--cache_text_encoder_outputs_to_disk`

##### `--sdpa` or `--xformers` or `--mem_eff_attn`

Each of these options selects the attention implementation used in the model, which can have a significant impact on performance and memory usage. The choice between `--xformers`, `--mem_eff_attn`, and `--sdpa` will depend on your GPU. You can benchmark it by repeating a training run with each of them!

- `--xformers`: Enables the xFormers library, developed by Facebook (Meta) Research, which provides a collection of transformer components optimized for different hardware and use cases. These components are designed to be highly efficient, flexible, and customizable, and their attention kernels can be beneficial when you have limited GPU memory or need to handle large-scale data.
- `--mem_eff_attn`: Enables a memory-efficient attention mechanism designed to reduce the memory footprint during the training of transformer models, which can be particularly beneficial when working with large models or datasets.
- `--sdpa`: Enables Scaled Dot-Product Attention (SDPA). SDPA is the fundamental attention operation of transformer models: it computes attention scores between queries and keys, scaling the dot products by the square root of the key dimensionality to stabilize gradients during training. This mechanism is particularly useful for handling long sequences and can potentially improve the model's ability to capture long-range dependencies (a minimal illustration follows this list).
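
To make the scaling concrete, here is a minimal sketch (assuming PyTorch 2.x) comparing a hand-written scaled dot-product attention with `torch.nn.functional.scaled_dot_product_attention`, the PyTorch built-in that the `--sdpa` flag refers to. It is an illustration only, not the trainer's code:

```python
import math
import torch
import torch.nn.functional as F

# Toy tensors shaped (batch, heads, sequence length, head dimension).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Manual scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
manual = torch.softmax(scores, dim=-1) @ v

# PyTorch's fused implementation of the same operation.
fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(manual, fused, atol=1e-5))  # True, up to numerical tolerance
```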
---

##### `--multires_noise_iterations` and `--multires_noise_discount`

Multi-resolution noise is a newer approach that adds noise at multiple resolutions to an image or latent image during the training of diffusion models. A model trained with this technique can generate visually striking images with a distinct aesthetic compared to the usual outputs of diffusion models.

A model trained with multi-resolution noise can generate a more diverse range of images than regular Stable Diffusion, including extremely light or dark images. These have historically been challenging to achieve without resorting to a large number of sampling steps.

This technique is particularly beneficial when working with small datasets, but honestly I don't think you should ever train without it.

The `--multires_noise_discount` parameter controls how much the noise amount is weakened at each resolution; a value of 0.1 is recommended. The `--multires_noise_iterations` parameter determines the number of iterations (resolutions) at which multi-resolution noise is added, with a recommended range of 6 to 10.

Please note that `--multires_noise_discount` has no effect without `--multires_noise_iterations`.

```python
--multires_noise_iterations=10 --multires_noise_discount=0.1
```
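
For intuition only, here is a rough sketch of the general pyramid-noise idea these flags control; the function name and details are illustrative and do not mirror the library's exact implementation:

```python
import torch
import torch.nn.functional as F

def multires_noise(latent: torch.Tensor, iterations: int = 10, discount: float = 0.1) -> torch.Tensor:
    """Sum noise drawn at progressively coarser resolutions, each layer
    weighted by discount**i, then rescale to roughly unit variance."""
    b, c, h, w = latent.shape
    noise = torch.randn_like(latent)
    for i in range(1, iterations):
        scale = 2 ** i
        if h // scale < 1 or w // scale < 1:
            break  # stop once the resolution collapses to nothing
        coarse = torch.randn(b, c, h // scale, w // scale, device=latent.device)
        noise += F.interpolate(coarse, size=(h, w), mode="bilinear") * discount ** i
    return noise / noise.std()

# Example: noise for a 64x64 SD latent with the recommended settings.
noise = multires_noise(torch.zeros(1, 4, 64, 64), iterations=10, discount=0.1)
```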
dataset_tools/e621 JSON to txt.ipynb
CHANGED
The diff for this file is too large to render.