license: wtfpl
datasets:
- k4d3/furry
language:
- en
tags:
- not-for-all-audiences
Hotdogwolf's Yiff Toolkit
The Yiff Toolkit is a comprehensive set of tools designed to enhance your creative process in the realm of furry art. From refining artist styles to generating unique characters, the Yiff Toolkit provides a range of tools to help you cum.
NOTE: You can click on any image in this README to be instantly teleported to its original-resolution version! If you want the metadata for a picture and it isn't there, you need to delete the letter e before the .png in the link! If an original image containing metadata is missing, please let me know!
Table of Contents
- Hotdogwolf's Yiff Toolkit
- Table of Contents
- Dataset Tools
- Dataset Preparation
- Auto Taggers
- LoRA Training Guide
- Installation Tips
- Pony Training
- Download Pony in Diffusers Format
- Sample Prompt File
- Training Commands
- accelerate launch
- --lowram
- --pretrained_model_name_or_path
- --output_dir
- --train_data_dir
- --resolution
- --enable_bucket
- --min_bucket_reso and --max_bucket_reso
- --network_alpha
- --save_model_as
- --network_module
- --network_args
- --network_dropout
- --lr_scheduler
- --lr_scheduler_num_cycles
- --learning_rate and --unet_lr and --text_encoder_lr
- --network_dim
- --output_name
- --scale_weight_norms
- --max_grad_norm
- --no_half_vae
- --save_every_n_epochs and --save_last_n_epochs or --save_every_n_steps and --save_last_n_steps
- --mixed_precision
- --save_precision
- --caption_extension
- --cache_latents and --cache_latents_to_disk
- --optimizer_type
- --dataset_repeats
- --max_train_steps
- --shuffle_caption
- --sdpa or --xformers or --mem_eff_attn
- --multires_noise_iterations and --multires_noise_discount
- --sample_prompts and --sample_sampler and --sample_every_n_steps
- Embeddings for 1.5 and SDXL
- ComfyUI Walkthrough any%
- AnimateDiff for Masochists
- Stable Cascade Furry Bible
- SDXL Furry Bible
- Pony Diffusion V6 LoRAs
- Concept Loras
- Artist/Style LoRAs
- blp-v1e400
- butterchalk-v3e400
- cecily_lin-v1e37
- chunie-v1e5
- cooliehigh-v1e45
- dagasi-v1e134
- darkgem-v1e4
- himari-v1e400
- furry_sticker-v1e250
- goronic-v1e1
- greg_rutkowski-v1e400
- hamgas-v1e400
- honovy-v1e4
- jinxit-v1e10
- kame_3-v1e80
- kenket-v1e4
- louart-v1e10
- realistic-v4e400
- skecchiart-v1e134
- spectrumshift-v1e400
- squishy-v1e10
- whisperingfornothing-v1e58
- wjs07-v1e200
- wolfy-nail-v1e400
- woolrool-v1e4
- Character LoRAs
- Satisfied Customers
Dataset Tools
I have uploaded all of the handy little Python scripts I use to /dataset_tools. They are pretty self-explanatory from the file names alone, and almost all of them contain an AI-generated description. If you want to use them, you will need to edit the path to your training_dir folder. The variable will be called path or directory and look something like this:
def main():
    path = 'C:\\Users\\kade\\Desktop\\training_dir_staging'
Don't be afraid of editing Python scripts, unlike the real snake, these won't bite!
Dataset Preparation
Before you begin collecting your dataset, you will need to decide what you want to teach the model: it can be a character, a style, or a new concept.
For now let's imagine you want to teach your model wickerbeasts so you can generate your VRChat avatar every night.
Create the training_dir Directory
Before starting, we need a directory where we'll organize our datasets. Open a terminal by pressing Win + R and typing pwsh. We will also be using git and Hugging Face to version control our smut; for brevity, I'll refrain from giving you a tutorial on either. Once your newly created dataset on HF is ready, let's clone it. Make sure you change user in the first line to your HF username!
git clone git@hf.co:datasets/user/training_dir C:\training_dir
cd C:\training_dir
git branch wickerbeast
git checkout wickerbeast
Let's continue by downloading some wickerbeast data, but don't close the terminal window just yet. For this we'll make good use of the furry booru e621.net. There are two nice ways to download data from this site with the metadata intact: I'll start with the fastest, and then I'll explain how you can selectively browse around the site and grab the images you like one by one.
Grabber
Grabber makes your life easier when trying to compile datasets quickly from imageboards.
Clicking the Add button on the Download tab lets you add a group of images to download. The Tags field is where you type in the search parameters like you would on e621.net; for example, the string wickerbeast solo -comic -meme -animated order:score will search for solo wickerbeast pictures, excluding comics, memes, and animated posts, in descending order of score. For training SDXL LoRAs you usually won't need more than 50 images, so set the Image Limit of the solo group to 40, then add a second group with -solo instead of solo and set its Image Limit to 10, so the dataset includes some images with other characters in them. This will help the model learn a lot better!
You should also enable Separate log files for e621; this will download the metadata automatically alongside the pictures.
For Pony I've set up the Text file content like so:
rating_%rating%, %all:separator=^, %
For other models you might want to replace rating_%rating% with just %rating%.
You should also set the Folder into which the images will get downloaded. Let's use C:\training_dir\1_wickerbeast for both groups.
Now you are ready to right-click on each group and download the images.
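Grabber will now fill C:\training_dir\1_wickerbeast with images and their .txt captions. The 1_ prefix follows kohya's sd-scripts folder convention of <repeats>_<name>, where the number is how many times that folder's images are repeated per epoch. Before training, a quick sanity check can help. The following is a minimal sketch (the function name and the extension list are hypothetical, not one of the scripts in /dataset_tools) that reports the repeat count, the image count, and any images missing a caption file for each such folder.

```python
import re
from pathlib import Path

# Assumed set of image extensions to look for; adjust as needed.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
# kohya-style subfolder names: "<repeats>_<name>", e.g. "1_wickerbeast"
FOLDER_RE = re.compile(r"^(\d+)_(.+)$")


def inspect_training_dir(train_dir):
    """Hypothetical helper: summarize each <repeats>_<name> subfolder."""
    report = {}
    for sub in sorted(Path(train_dir).iterdir()):
        match = FOLDER_RE.match(sub.name)
        if not sub.is_dir() or not match:
            continue  # not a kohya-style dataset folder, so skip it here
        images = sorted(p for p in sub.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        # Captions are expected next to each image, with the same base name
        missing = [p.name for p in images if not p.with_suffix(".txt").exists()]
        report[sub.name] = {
            "repeats": int(match.group(1)),
            "images": len(images),
            "missing_captions": missing,
        }
    return report
```

For example, inspect_training_dir(r"C:\training_dir") should list 1_wickerbeast with 1 repeat and about 50 images once both groups have finished downloading.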
Manual Method
This method requires a browser extension like ViolentMonkey and the following UserScript:
// ==UserScript==
// @name e621 JSON Button
// @namespace https://cringe.live
// @version 1.0
// @description Adds a JSON button next to the download button on e621.net
// @author _ka_de
// @match https://e621.net/*
// @match https://e6ai.net/*
// @grant none
// ==/UserScript==
(function() {
    'use strict';
    function constructJSONUrl() {
        // Get the current URL
        var currentUrl = window.location.href;
        // Extract the post ID from the URL (only post pages have one)
        var match = currentUrl.match(/^https?:\/\/(?:e621\.net|e6ai\.net)\/posts\/(\d+)/);
        if (!match) {
            return null;
        }
        var postId = match[1];
        // Construct the JSON URL based on the hostname
        var hostname = window.location.hostname;
        return 'https://' + hostname + '/posts/' + postId + '.json';
    }
    function createJSONButton() {
        var jsonUrl = constructJSONUrl();
        // Not on a post page (e.g. search results), so there is nothing to link to
        if (!jsonUrl) {
            return;
        }
        // Create a new button element
        var jsonButton = document.createElement('a');
        // Set the attributes for the button
        jsonButton.setAttribute('class', 'button btn-info');
        // Set the JSON URL as the button's href attribute
        jsonButton.setAttribute('href', jsonUrl);
        // Set the inner HTML for the button
        jsonButton.innerHTML = '<i class="fa-solid fa-angle-double-right"></i><span>JSON</span>';
        // Check if the #image-extra-controls element exists
        if (document.getElementById('image-extra-controls')) {
            // Insert the button after the download button
            var downloadLink = document.getElementById('image-download-link');
            if (downloadLink && downloadLink.children.length) {
                downloadLink.insertBefore(jsonButton, downloadLink.children[0].nextSibling);
            }
        } else {
            // Insert the button after the last li element in #post-options
            var container = document.querySelector('#post-options > li:last-child');
            if (container) {
                container.parentNode.insertBefore(jsonButton, container.nextSibling);
            }
        }
    }
    // Run the function to create the JSON button
    createJSONButton();
})();
This will put a link to the JSON next to the download button on e621.net and e6ai.net, and you can use this Python script to convert the JSON files to caption files. It uses the rating_ prefix before safe/questionable/explicit because... you've guessed it, Pony! It also lets you skip any tags you add to ignored_tags using the r"\btag\b", syntax; just replace tag with the tag you want it to skip.
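The converter script itself is only linked, so here is a minimal sketch of the same idea. It assumes the saved JSON has the shape the e621 API returns for a single post (a top-level "post" object with a one-letter "rating" and a "tags" object grouped by category) and that each JSON file was saved with the same base name as its image. The category order, the by prefix for artists, and the example ignore patterns are assumptions for illustration, not necessarily what the linked script does.

```python
import json
import re
from pathlib import Path

# One-letter e621 ratings mapped to Pony-style rating tags
RATINGS = {"s": "rating_safe", "q": "rating_questionable", "e": "rating_explicit"}

# Assumed tag category order for the caption; adjust to taste.
CATEGORIES = ["general", "artist", "copyright", "character", "species", "meta", "lore"]

# Hypothetical ignore patterns in the r"\btag\b" style (matched after "_" -> " ")
ignored_tags = [r"\bhi res\b", r"\bconditional dnp\b"]


def e621_json_to_caption(data):
    """Build a comma-separated caption from an e621 post JSON object."""
    post = data["post"]
    parts = [RATINGS[post["rating"]]]
    for category in CATEGORIES:
        for tag in post["tags"].get(category, []):
            tag = tag.replace("_", " ")
            if any(re.search(pattern, tag) for pattern in ignored_tags):
                continue
            parts.append(f"by {tag}" if category == "artist" else tag)
    return ", ".join(parts)


def convert_json_file(json_path):
    """Write <name>.txt next to <name>.json and return the caption path."""
    json_path = Path(json_path)
    data = json.loads(json_path.read_text(encoding="utf-8"))
    txt_path = json_path.with_suffix(".txt")
    txt_path.write_text(e621_json_to_caption(data), encoding="utf-8")
    return txt_path
```

Running convert_json_file on every .json file in C:\training_dir\1_wickerbeast would leave one caption per post; the artist prefix and category order are the knobs most worth changing to match your own captions.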
Auto Taggers
eva02-vit-large-448-8046
You will want to install the only dependency (besides torch, I mean):
pip install timm
The following inference script for the tagger takes a folder as input. Be warned that it also converts WebP images to PNG and deletes the originals! You can also specify tags to be ignored and tweak some other stuff, so I recommend reading through it and changing whatever you need.
import os
import torch
from torchvision import transforms
from PIL import Image
import json
import re
# Set the threshold for tag selection
THRESHOLD = 0.3
# Define the directory containing the images and the path to the model
image_dir = r"./images"
model_path = r"./model.pth"
# Define the set of ignored tags (written with spaces or underscores)
ignored_tags = {"grandfathered content"}
# Check if CUDA is available, else use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the model and set it to evaluation mode
# (the checkpoint is a whole pickled model, so newer PyTorch versions need weights_only=False)
model = torch.load(model_path, map_location=device, weights_only=False)
model = model.to(device)
model.eval()
# Define the image transformations
transform = transforms.Compose(
    [
        # Resize the images to 448x448
        transforms.Resize((448, 448)),
        # Convert the images to PyTorch tensors
        transforms.ToTensor(),
        # Normalize the images with the given mean and standard deviation
        transforms.Normalize(
            mean=[0.48145466, 0.4578275, 0.40821073],
            std=[0.26862954, 0.26130258, 0.27577711],
        ),
    ]
)
# Load the tags from the JSON file
with open("tags_8041.json", "r", encoding="utf8") as file:
    tags = json.load(file)
allowed_tags = sorted(tags)
# Add placeholders and explicitness tags to the list of allowed tags
allowed_tags.insert(0, "placeholder0")
allowed_tags.append("placeholder1")
allowed_tags.append("explicit")
allowed_tags.append("questionable")
allowed_tags.append("safe")
# Map 'safe', 'explicit', and 'questionable' to their 'rating_' counterparts (exact matches only)
rating_tags = {
    "safe": "rating_safe",
    "explicit": "rating_explicit",
    "questionable": "rating_questionable",
}
# Define the allowed image extensions
image_exts = [".jpg", ".jpeg", ".png"]
for filename in os.listdir(image_dir):
    # Check if the file is a WebP image
    if filename.lower().endswith(".webp"):
        # Construct the input and output file paths
        input_path = os.path.join(image_dir, filename)
        output_path = os.path.join(image_dir, os.path.splitext(filename)[0] + ".png")
        # Open the WebP image and save it as a PNG (the with-block closes the file before deletion)
        with Image.open(input_path) as image:
            image.save(output_path, "PNG")
        print(f"Converted {filename} to {os.path.basename(output_path)}")
        # Delete the original WebP image
        os.remove(input_path)
        print(f"Deleted {filename}")
# Get the list of image files in the directory
image_files = [
    file
    for file in os.listdir(image_dir)
    if os.path.splitext(file)[1].lower() in image_exts
]
for image_filename in image_files:
    image_path = os.path.join(image_dir, image_filename)
    # Open the image
    img = Image.open(image_path)
    # If the image has an alpha channel, composite it onto a black background
    if img.mode in ("RGBA", "LA") or (img.mode == "P" and "transparency" in img.info):
        img = img.convert("RGBA")  # alpha_composite needs both images in RGBA mode
        background = Image.new("RGBA", img.size, (0, 0, 0, 255))
        img = Image.alpha_composite(background, img)
    # Convert the image to RGB
    img = img.convert("RGB")
    # Apply the transformations and move the tensor to the device
    tensor = transform(img).unsqueeze(0).to(device)
    # Make a forward pass through the model and get the output
    with torch.no_grad():
        out = model(tensor)
    # Apply the sigmoid function to the output to get probabilities
    probabilities = torch.sigmoid(out[0])
    # Get the indices of the tags with probabilities above the threshold
    indices = torch.where(probabilities > THRESHOLD)[0]
    values = probabilities[indices]
    # Sort the indices by the corresponding probabilities in descending order
    sorted_indices = torch.argsort(values, descending=True)
    # Collect the tags in that order, skipping placeholders and ignored tags
    tags_to_write = []
    for i in sorted_indices:
        tag = allowed_tags[indices[i].item()]
        if tag in ("placeholder0", "placeholder1"):
            continue
        # Replace underscores with spaces
        tag_spaced = tag.replace("_", " ")
        # Skip ignored tags, whether they were written with underscores or spaces
        if tag in ignored_tags or tag_spaced in ignored_tags:
            continue
        tags_to_write.append(rating_tags.get(tag_spaced, tag_spaced))
    # Escape unescaped parentheses in the tags
    tags_to_write_escaped = [
        re.sub(r"(?<!\\)([()])", r"\\\1", tag) for tag in tags_to_write
    ]
    # Create a text file for each image with the filtered and escaped tags
    text_filename = os.path.splitext(image_filename)[0] + ".txt"
    text_path = os.path.join(image_dir, text_filename)
    with open(text_path, "w", encoding="utf8") as text_file:
        text_file.write(", ".join(tags_to_write_escaped))
LoRA Training Guide
Installation Tips
First, download kohya_ss' sd-scripts and set up your environment, either as this tells you for Windows or, if you are using Linux or Miniconda on Windows, however you like; you are probably smart enough to figure out the installation yourself. I recommend always installing the latest PyTorch in the virtual environment you are going to use, which at the time of writing is 2.2.2. I hope future me has faster PyTorch!
OK, just in case you aren't smart enough to figure out how to install sd-scripts under Miniconda on Windows, I actually "guided" someone through it recently, just so I can tell you about it:
# Installing sd-scripts
git clone https://github.com/kohya-ss/sd-scripts
cd sd-scripts
# Creating the conda environment and installing requirements
conda create -n sdscripts python=3.10.14
conda activate sdscripts
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
python -m pip install --use-pep517 --upgrade -r requirements.txt
python -m pip install --use-pep517 lycoris_lora
accelerate config
accelerate config will ask you a bunch of questions; you need to actually read each one and reply with the truth. In most cases the truth looks like this: This machine, No distributed training, no, no, no, all, fp16.
You might also want to install xformers or bitsandbytes.
# Installing xformers
# Use the same command just replace 'xformers' with any other package you may need.
python -m pip install --use-pep517 xformers
# Installing bitsandbytes for windows
python -m pip install --use-pep517 bitsandbytes --index-url=https://jllllll.github.io/bitsandbytes-windows-webui
Pony Training
I'm not going to lie, it is a bit complicated to explain everything. But here is my best attempt at going through some "basic" stuff and almost all of the lines in order.
Download Pony in Diffusers Format
For training I'm using a diffusers version of Pony that I converted; you can download it using git.
git clone https://huggingface.co/k4d3/ponydiffusers
Sample Prompt File
A sample prompt file is used during training to sample images. A sample prompt for example might look like this for Pony:
# anthro female kindred
score_9, score_8_up, score_7_up, score_6_up, rating_explicit, source_furry, solo, female anthro kindred, mask, presenting, white pillow, bedroom, looking at viewer, detailed background, amazing_background, scenery porn, realistic, photo --n low quality, worst quality, blurred background, blurry, simple background --w 1024 --h 1024 --d 1 --l 6.0 --s 40
# anthro female wolf
score_9, score_8_up, score_7_up, score_6_up, rating_explicit, source_furry, solo, anthro female wolf, sexy pose, standing, gray fur, brown fur, canine pussy, black nose, blue eyes, pink areola, pink nipples, detailed background, amazing_background, realistic, photo --n low quality, worst quality, blurred background, blurry, simple background --w 1024 --h 1024 --d 1 --l 6.0 --s 40
Please note that sample prompts should not exceed 77 tokens; you can use Count Tokens in Sample Prompts from /dataset_tools to analyze your prompts.
If you are training with multiple GPUs, ensure that the total number of prompts is divisible by the number of GPUs without any remainder or a card will idle.
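To check a prompt file before a multi-GPU run, a small script can parse it the way the examples above are written: every non-empty line that doesn't start with # is a prompt, and per-prompt options follow " --" (--n negative prompt, --w width, --h height, --d seed, --l CFG scale, --s steps). The sketch below is a hypothetical helper, not part of sd-scripts; it splits those options out and checks the prompt count against your GPU count. It does not count CLIP tokens, so still use the token counter for the 77-token limit.

```python
def parse_sample_prompts(text):
    """Hypothetical parser: return a list of (prompt, options) tuples."""
    prompts = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# comment" lines
        if not line or line.startswith("#"):
            continue
        prompt, *option_chunks = line.split(" --")
        options = {}
        for chunk in option_chunks:
            # "n low quality, blurry" -> key "n", value "low quality, blurry"
            key, _, value = chunk.partition(" ")
            options[key] = value.strip()
        prompts.append((prompt.strip(), options))
    return prompts


def prompts_split_evenly(prompts, num_gpus):
    """True if every GPU gets the same number of sample prompts."""
    return len(prompts) % num_gpus == 0
```

For instance, reading sample-prompts.txt and passing its contents to parse_sample_prompts, then calling prompts_split_evenly(prompts, 2), tells you whether both cards will stay busy while sampling.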
Training Commands
accelerate launch
For two GPUs:
accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 --num_cpu_threads_per_process=2 "./sdxl_train_network.py"
Single GPU:
accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py"
And now let's break down a bunch of arguments we can pass to sd-scripts.
--lowram
If you are running out of system memory like I do with 2 GPUs and a really fat model that gets loaded into it once per GPU, this option will help you save a bit of it and might get you out of OOM hell.
--pretrained_model_name_or_path
The directory containing the checkpoint you just downloaded. I recommend ending the path with a / if you are using a local diffusers model. You can also specify a .safetensors or .ckpt file if that is what you have!
--pretrained_model_name_or_path="/ponydiffusers/"
--output_dir
This is where all the saved epochs or steps will be saved, including the last one. If y
--output_dir="/output_dir"
--train_data_dir
The directory containing the dataset. We prepared this earlier together.
--train_data_dir="/training_dir"
--resolution
Always set this to match the model's resolution, which in Pony's case is 1024x1024. If you can't fit into VRAM, you can decrease it to 512,512 as a last resort.
--resolution="1024,1024"
--enable_bucket
Enables aspect ratio bucketing: images are pre-sorted into different buckets according to their aspect ratios. This helps avoid issues like the unnatural crops that are common when models are trained only on square images. Every item in a batch has the same size, but the image size may differ from batch to batch.
--min_bucket_reso and --max_bucket_reso
Specifies the minimum and maximum resolutions used by the buckets. These values are ignored if --bucket_no_upscale is set.
--min_bucket_reso=256 --max_bucket_reso=1024
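The idea behind bucketing can be sketched in a few lines: enumerate resolutions that step in fixed increments, stay between the min and max bucket resolution and within the pixel area of --resolution, then put each image into the bucket whose aspect ratio is closest to its own. This is a simplified illustration, not sd-scripts' actual implementation, and the step of 64 is an assumption here (the full command further down passes --bucket_reso_steps=32).

```python
def make_buckets(resolution=(1024, 1024), min_reso=256, max_reso=1024, step=64):
    """Simplified bucket list of (width, height) pairs within the area budget."""
    max_area = resolution[0] * resolution[1]
    buckets = set()
    width = min_reso
    while width <= max_reso:
        # Largest height (rounded down to a multiple of step) keeping width*height <= max_area
        height = min(max_reso, (max_area // width) // step * step)
        if height >= min_reso:
            buckets.add((width, height))
            buckets.add((height, width))
        width += step
    return sorted(buckets)


def pick_bucket(image_width, image_height, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    aspect = image_width / image_height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))
```

With the defaults, a 2000x1000 image lands in the 1024x512 bucket and a square image in 1024x1024; the image is then resized to fit its bucket, so nothing gets squashed into a square.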
--network_alpha
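Scales how strongly the trained LoRA weights alter the base model: the learned update is multiplied by network_alpha / network_dim, so with --network_alpha=4 and --network_dim=8 it is halved.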
--network_alpha=4
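Written out, this is the self.scale that shows up in the forward pass quoted further down, where B and A are the lora_up and lora_down weights, r is the rank (network_dim), α is network_alpha, and m is the multiplier. The same ratio applies per block to the block_dims and block_alphas values used in the network_args below.

```latex
W' = W + m \cdot \frac{\alpha}{r} \, B A,
\qquad \frac{\alpha}{r} = \frac{4}{8} = 0.5,
\qquad \frac{\alpha_{\text{block}}}{r_{\text{block}}} = \frac{0.0625}{8} = 0.0078125
```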
--save_model_as
You can use this to specify either ckpt or safetensors as the file format.
--save_model_as="safetensors"
--network_module
Specifies which network module you are going to train.
--network_module="lycoris.kohya"
--network_args
The arguments passed down to the network.
--network_args \
"use_reentrant=False" \
"preset=full" \
"conv_dim=256" \
"conv_alpha=4" \
"use_tucker=False" \
"use_scalar=False" \
"rank_dropout_scale=False" \
"algo=locon" \
"train_norm=False" \
"block_dims=8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8" \
"block_alphas=0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625" \
Let's break it down!
preset
The Preset/config system added to LyCORIS for more fine-grained control.
- full: default preset, train all the layers in the UNet and CLIP.
- full-lin: same as full but skip convolutional layers.
- attn-mlp: "kohya preset", train all the transformer blocks.
- attn-only: only the attention layers will be trained; a lot of papers only train the attention layers.
- unet-transformer-only: same as kohya_ss/sd_scripts with the TE disabled, or the attn-mlp preset with train_unet_only enabled.
- unet-convblock-only: only ResBlock, UpSample and DownSample will be trained.
conv_dim and conv_alpha
conv_dim sets the rank used for the convolutional layers, and conv_alpha is its matching alpha. Adjusting conv_dim can have a significant impact: lowering it affected the aesthetic differences between different LoRA samples. An alpha value of 128 was used for training a specific character's face, while Kohaku recommended setting it to 1 for both LoCon and LoHa.
conv_block_dims = [conv_dim] * num_total_blocks
conv_block_alphas = [conv_alpha] * num_total_blocks
module_dropout and dropout and rank_dropout
rank_dropout is a form of dropout, a regularization technique used in neural networks to prevent overfitting and improve generalization. However, unlike traditional dropout, which randomly sets individual elements of its input to zero, rank_dropout drops whole rank channels of lx, the intermediate tensor produced by lora_down. First, a binary mask of shape (batch size, lora_dim) is created, with each element set to True with probability 1 - rank_dropout and False otherwise. The mask is then broadcast over any remaining dimensions and applied to lx, zeroing out the dropped rank channels for each sample. After applying the dropout, a scaling factor is applied to compensate for the dropped elements, so that the expected value of the output stays the same as without dropout. The scaling factor is 1.0 / (1.0 - self.rank_dropout).
It's called "rank" dropout because it operates along the rank dimension of the LoRA (the lora_dim channels between lora_down and lora_up) rather than on arbitrary individual elements.
If rank_dropout is set to 0, no dropout is applied to lx. All elements of the mask are set to True, so when the mask is applied to lx all of its elements are retained, and the scaling factor applied afterwards just equals self.scale, because 1.0 / (1.0 - 0) is 1. Basically, setting this to 0 effectively disables the dropout mechanism, but it will still do some meaningless calculations. You can't set it to None, so if you really want to disable dropout, simply don't specify it!
def forward(self, x):
    org_forwarded = self.org_forward(x)
    # module dropout
    if self.module_dropout is not None and self.training:
        if torch.rand(1) < self.module_dropout:
            return org_forwarded
    lx = self.lora_down(x)
    # normal dropout
    if self.dropout is not None and self.training:
        lx = torch.nn.functional.dropout(lx, p=self.dropout)
    # rank dropout
    if self.rank_dropout is not None and self.training:
        mask = torch.rand((lx.size(0), self.lora_dim), device=lx.device) > self.rank_dropout
        if len(lx.size()) == 3:
            mask = mask.unsqueeze(1)  # (batch, tokens, rank) activations
        elif len(lx.size()) == 4:
            mask = mask.unsqueeze(-1).unsqueeze(-1)  # (batch, rank, H, W) activations
        lx = lx * mask
        scale = self.scale * (1.0 / (1.0 - self.rank_dropout))
    else:
        scale = self.scale
    lx = self.lora_up(lx)
    return org_forwarded + lx * self.multiplier * scale
The network you are training needs to support it though! See PR#545 for more details.
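To see the masking and rescaling from the forward pass above without a GPU, here is a stdlib-only sketch that mimics the rank dropout branch on a plain list-of-lists lx of shape (batch, lora_dim). It illustrates the technique and is not LyCORIS or kohya code; the function name is hypothetical.

```python
import random


def rank_dropout(lx, p, rng=random):
    """Mimic the rank dropout branch for lx given as [batch][lora_dim] floats.

    Returns the masked activations and the compensation factor 1 / (1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("rank dropout probability must be in [0, 1)")
    # One keep/drop decision per (sample, rank channel), like
    # torch.rand((batch, lora_dim)) > p in the forward pass
    masked = [[v if rng.random() > p else 0.0 for v in row] for row in lx]
    return masked, 1.0 / (1.0 - p)
```

Averaged over many draws, each masked value times the factor comes out close to the original value, which is exactly what the 1.0 / (1.0 - p) compensation is for.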
use_tucker
Can be used for all but (IA)^3
and native fine-tuning.
Tucker decomposition is a mathematical method that decomposes a tensor into a set of matrices and one small core tensor, reducing the computational complexity and memory requirements of the model. It is used in various LyCORIS modules on various blocks. In LoCon, for example, if use_tucker is True and the kernel size k_size is not (1, 1), the convolution operation is decomposed into three separate operations:
- A 1x1 convolution that reduces the number of channels from in_dim to lora_dim.
- A convolution with the original kernel size k_size, stride stride, and padding padding, but with the reduced number of channels lora_dim.
- A 1x1 convolution that increases the number of channels back from lora_dim to out_dim.
If use_tucker is False or not set, or if the kernel size k_size is (1, 1), a standard convolution with the original kernel size, stride, and padding reduces the number of channels from in_dim to lora_dim, followed by a 1x1 convolution from lora_dim back to out_dim.
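A quick way to see why the Tucker split saves memory is to count the weights of each variant. The helper below is a hypothetical sketch that counts only the convolution weights described above (biases ignored) for a conv layer with in_dim input channels, out_dim output channels and a k_size kernel.

```python
def locon_conv_params(in_dim, out_dim, lora_dim, k_size, use_tucker):
    """Count LoCon weights (no biases) for one conv layer, with or without Tucker."""
    kh, kw = k_size
    if use_tucker and (kh, kw) != (1, 1):
        down = in_dim * lora_dim             # 1x1 conv: in_dim -> lora_dim
        mid = lora_dim * lora_dim * kh * kw  # k_size conv: lora_dim -> lora_dim
        up = lora_dim * out_dim              # 1x1 conv: lora_dim -> out_dim
        return down + mid + up
    down = in_dim * lora_dim * kh * kw       # k_size conv: in_dim -> lora_dim
    up = lora_dim * out_dim                  # 1x1 conv: lora_dim -> out_dim
    return down + up


# A 3x3 conv with 640 channels in and out, rank 8
print(locon_conv_params(640, 640, 8, (3, 3), use_tucker=False))  # 51200
print(locon_conv_params(640, 640, 8, (3, 3), use_tucker=True))   # 10816
```

For this layer the Tucker variant needs roughly a fifth of the weights, because the k_size kernel only ever sees lora_dim channels.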
use_scalar
An additional learned parameter that scales the contribution of the low-rank weights before they are added to the original weights. This scalar can control the extent to which the low-rank adaptation modifies the original weights. By training this scalar, the model can learn the optimal balance between preserving the original pre-trained weights and allowing for low-rank adaptation.
if use_scalar:
    self.scalar = nn.Parameter(torch.tensor(0.0))
else:
    self.scalar = torch.tensor(1.0)
rank_dropout_scale
A boolean flag that determines whether to scale the dropout mask to have an average value of 1 or not. This can be useful in certain situations to maintain the scale of the tensor after dropout is applied.
def forward(self, orig_weight, org_bias, new_weight, new_bias, *args, **kwargs):
    device = self.oft_blocks.device
    if self.rank_dropout and self.training:
        drop = (torch.rand(self.oft_blocks, device=device) < self.rank_dropout).to(
            self.oft_blocks.dtype
        )
        if self.rank_dropout_scale:
            drop /= drop.mean()
    else:
        drop = 1
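Where plain rank dropout uses the fixed 1.0 / (1.0 - rank_dropout) factor, rank_dropout_scale divides the mask by its actual mean, so the mask averages out to exactly 1 on every step. Here is a stdlib-only sketch of that normalization; the helper is hypothetical, and unlike the excerpt above it guards against an all-zero mask, where drop.mean() would be 0.

```python
import random


def scaled_rank_mask(n, p, rank_dropout_scale=True, rng=random):
    """Hypothetical helper: 0/1 mask over n ranks, optionally rescaled to mean 1."""
    # Each rank is kept (1.0) with probability 1 - p and dropped (0.0) otherwise
    mask = [1.0 if rng.random() >= p else 0.0 for _ in range(n)]
    if rank_dropout_scale:
        mean = sum(mask) / n
        if mean > 0:  # an all-zero mask cannot be rescaled
            mask = [m / mean for m in mask]
    return mask
```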
algo
The LyCORIS algorithm to use. You can find a list of the implemented algorithms and an explanation of them with a demo, and you can also dig into the research paper.
train_norm
Controls whether to train the normalization layers. This is supported by all algorithms except (IA)^3.
block_dims
Specifies the rank of each block. It takes exactly 25 numbers, which is why this line is so long.
block_alphas
Specifies the alpha of each block. This also takes exactly 25 numbers; if you don't specify it, network_alpha will be used instead.
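Typing 25 values by hand is error-prone, so a tiny helper can generate the block_dims and block_alphas strings and refuse anything that isn't exactly 25 entries long. This is a hypothetical convenience sketch, not part of LyCORIS.

```python
NUM_BLOCKS = 25  # block_dims and block_alphas each take exactly 25 values


def block_arg(name, values):
    """Format a per-block network arg such as block_dims=8,8,...,8."""
    values = list(values)
    if len(values) != NUM_BLOCKS:
        raise ValueError(f"{name} needs {NUM_BLOCKS} values, got {len(values)}")
    return f"{name}=" + ",".join(str(v) for v in values)


print(block_arg("block_dims", [8] * NUM_BLOCKS))
print(block_arg("block_alphas", [0.0625] * NUM_BLOCKS))
```

The two printed strings are the same block_dims and block_alphas values used in the network_args above, ready to paste between quotes.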
That concludes the network_args.
--network_dropout
This float controls the fraction of neurons dropped out of training at every step. 0 or None is the default behavior (no dropout), while 1 would drop all neurons. Using weight_decompose=True will ignore network_dropout, and only rank and module dropout will be applied.
--network_dropout=0 \
--lr_scheduler
A learning rate scheduler in PyTorch is a tool that adjusts the learning rate during the training process. It changes the learning rate according to a schedule as training progresses, which can lead to better results and reduced training time.
Possible values: linear, cosine, cosine_with_restarts, polynomial, constant (default), constant_with_warmup, adafactor.
Note that the adafactor scheduler can only be used with the adafactor optimizer!
--lr_scheduler="cosine" \
--lr_scheduler_num_cycles
Number of restarts for cosine scheduler with restarts. It isn't used by any other scheduler.
--lr_scheduler_num_cycles=1 \
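The cosine and cosine_with_restarts shapes are easy to picture with a few lines of math. The sketch below assumes the usual no-warmup formulas used by the Hugging Face scheduler helpers: half a cosine wave from the base LR down to zero, and for restarts, that curve repeated num_cycles times with a hard jump back to the base LR. It illustrates the shape rather than reproducing sd-scripts' code.

```python
import math


def cosine_lr(step, total_steps, base_lr, num_cycles=1, restarts=False):
    """Learning rate at a given step for cosine decay (optionally with hard restarts)."""
    progress = min(step / total_steps, 1.0)
    if restarts:
        if progress >= 1.0:
            return 0.0
        # Position within the current cycle, jumping back to 0 at each cycle boundary
        cycle_progress = (num_cycles * progress) % 1.0
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_progress))
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 100, 200, 300, 400):
    print(step, cosine_lr(step, 400, 1e-4))
```

With num_cycles=1, cosine_with_restarts traces the same curve as plain cosine, which is why --lr_scheduler_num_cycles only makes a difference once you raise it.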
--learning_rate and --unet_lr and --text_encoder_lr
The learning rate determines how much the weights of the network are updated in response to the estimated error each time the weights are updated. If the learning rate is too large, the weights may overshoot the optimal solution. If it's too small, the weights may get stuck in a suboptimal solution.
For AdamW the optimal LR seems to be 0.0001, or 1e-4 if you want to impress your friends.
--learning_rate=0.0001 --unet_lr=0.0001 --text_encoder_lr=0.0001
--network_dim
The Network Rank (Dimension) is responsible for how many features your LoRA will be training. It is in a close relation with Network Alpha and the Unet + TE learning rates and of course the quality of your dataset. Personal experimentation with these values is strongly recommended.
--network_dim=8
--output_name
Specify the output name excluding the file extension.
WARNING: If for some reason this is ever left empty your last epoch won't be saved!
--output_name="last"
--scale_weight_norms
Max-norm regularization is a technique that constrains the norm of the incoming weight vector at each hidden unit to be upper bounded by a fixed constant. It prevents the weights from growing too large and helps improve the performance of stochastic gradient descent training of deep neural nets.
Dropout affects the network architecture without changing the weights, while Max-Norm Regularization directly modifies the weights of the network. Both techniques are used to prevent overfitting and improve the generalization of the model. You can learn more about both in this research paper.
--scale_weight_norms=1.0
--max_grad_norm
Also known as gradient clipping. If you notice that gradients are exploding during training (the loss becomes NaN or very large), consider adjusting --max_grad_norm. It operates on the gradients during backpropagation, while --scale_weight_norms operates on the weights of the network, so the two complement each other and together provide a more robust way to stabilize training and improve model performance.
--max_grad_norm=1.0
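The difference between the two is easiest to see side by side. Below is a stdlib-only sketch on plain lists: gradient clipping rescales all gradients together when their combined (global) L2 norm exceeds the limit, following the same rule as PyTorch's torch.nn.utils.clip_grad_norm_, while max-norm rescales a weight vector whose own norm exceeds the limit. sd-scripts applies --scale_weight_norms to each LoRA module's combined weight rather than to a flat list like this, so treat it as an illustration of the technique only.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Gradient clipping: scale ALL gradient vectors by one shared factor."""
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        grads = [[g * clip_coef for g in grad] for grad in grads]
    return grads, total_norm


def max_norm_weights(weight, max_norm):
    """Max-norm regularization: shrink one weight vector if its norm is too large."""
    norm = math.sqrt(sum(w * w for w in weight))
    if norm > max_norm:
        return [w * max_norm / norm for w in weight], norm
    return list(weight), norm
```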
--no_half_vae
Disables mixed precision for the SDXL VAE and sets it to float32. Very useful if you don't like NaNs.
--save_every_n_epochs and --save_last_n_epochs or --save_every_n_steps and --save_last_n_steps
- --save_every_n_steps and --save_every_n_epochs: A LoRA file will be created at every n-th step or epoch specified here.
- --save_last_n_steps and --save_last_n_epochs: Discards every saved file except for the last n you specify here.
Training always ends at the step or epoch count you set with --max_train_epochs or --max_train_steps.
--save_every_n_epochs=50
--mixed_precision
⚠️
--mixed_precision="fp16"
--save_precision
⚠️
--save_precision="fp16"
--caption_extension
⚠️
--caption_extension=".txt"
--cache_latents and --cache_latents_to_disk
⚠️
--cache_latents --cache_latents_to_disk
--optimizer_type
The default optimizer is AdamW. New optimizers get added every month or so, so I'm not listing them all (you can find the list if you really want), but AdamW is the best as of this writing, so that's what we use!
--optimizer_type="AdamW"
--dataset_repeats
Repeats the dataset when training with captions. By default it is set to 1, so we'll set this to 0 with:
--dataset_repeats=0
--max_train_steps
Specify the number of steps or epochs to train. If both --max_train_steps and --max_train_epochs are specified, the number of epochs takes precedence.
--max_train_steps=400
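Steps and epochs are tied together by the dataset size, batch size and gradient accumulation. Assuming, as in sd-scripts' training loop, that one step is one optimizer update (so with --gradient_accumulation_steps each step consumes several batches), the number of epochs a given --max_train_steps works out to can be estimated as below. Bucketing can add a few extra batches, since every batch has to come from a single bucket, so treat the result as approximate; the function is a hypothetical helper.

```python
import math


def estimate_epochs(num_images, repeats, batch_size, grad_accum, max_train_steps):
    """Approximate number of epochs covered by max_train_steps optimizer updates."""
    batches_per_epoch = math.ceil(num_images * repeats / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return math.ceil(max_train_steps / updates_per_epoch)


# 50 images in 1_wickerbeast, with the batch size and accumulation from the full command
print(estimate_epochs(50, 1, 8, 12, 400))
```

With 50 images, --train_batch_size=8 and --gradient_accumulation_steps=12, the 7 batches of an epoch fit into a single update, so 400 steps come out to roughly 400 epochs, and --save_every_n_epochs=50 would then save about 8 files.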
--shuffle_caption
Shuffles the caption entries separated by --caption_separator, which is a comma (,) by default. That works perfectly for our case, since our captions look like this:
rating_questionable, 5 fingers, anthro, bent over, big breasts, blue eyes, blue hair, breasts, butt, claws, curved horn, female, finger claws, fingers, fur, hair, huge breasts, looking at viewer, looking back, looking back at viewer, nipples, nude, pink body, pink hair, pink nipples, rear view, solo, tail, tail tuft, tuft, by lunarii, by x-leon-x, mythology, krystal (darkmaster781), dragon, scalie, wickerbeast, The image showcases a pink-scaled wickerbeast a furred dragon creature with blue eyes., She has large breasts and a thick tail., Her blue and pink horns are curved and pointy and she has a slight smiling expression on her face., Her scales are shiny and she has a blue and pink pattern on her body., Her hair is a mix of pink and blue., She is looking back at the viewer with a curious expression., She has a slight blush.,
As you can tell, I have separated not just the tags but also the sentences of the natural-language caption with a comma (,) to make sure everything gets shuffled. At this point I'm pretty certain this is beneficial, especially when your caption file contains more than 77 tokens.
NOTE: --cache_text_encoder_outputs and --cache_text_encoder_outputs_to_disk can't be used together with --shuffle_caption. Both caching options aim to reduce VRAM usage, so you will need to decide between caching and shuffling yourself!
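Conceptually, shuffling splits the caption on the separator, keeps the first --keep_tokens entries in place (the full command below uses --keep_tokens=1, which pins the rating tag to the front) and shuffles the rest each time the caption is used. A minimal sketch of that idea follows; sd-scripts' own implementation has more options, and the function name is hypothetical.

```python
import random


def shuffle_caption(caption, keep_tokens=1, separator=",", rng=random):
    """Shuffle caption entries, keeping the first keep_tokens entries in place."""
    # Split on the separator and drop empty entries (e.g. from a trailing ",")
    tokens = [t.strip() for t in caption.split(separator) if t.strip()]
    fixed, rest = tokens[:keep_tokens], tokens[keep_tokens:]
    rng.shuffle(rest)
    return ", ".join(fixed + rest)
```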
--sdpa or --xformers or --mem_eff_attn
The choice between --xformers, --mem_eff_attn and --sdpa will depend on your GPU. You can benchmark them by repeating a training run with each!
--multires_noise_iterations and --multires_noise_discount
⚠️
--multires_noise_iterations=10 --multires_noise_discount=0.1
--sample_prompts and --sample_sampler and --sample_every_n_steps
You have the option of generating images during training so you can check the progress. --sample_sampler lets you pick between different samplers; by default it is set to ddim, so you'd better change it!
You can also use --sample_every_n_epochs instead, which will take precedence over steps. The k_ prefix means Karras and the _a suffix means ancestral.
--sample_prompts=/training_dir/sample-prompts.txt --sample_sampler="euler_a" --sample_every_n_steps=100
My recommendation for Pony is to use euler_a for toony styles and k_dpm_2 for realistic ones.
Your sampler options include the following:
ddim, pndm, lms, euler, euler_a, heun, dpm_2, dpm_2_a, dpmsolver, dpmsolver++, dpmsingle, k_lms, k_euler, k_euler_a, k_dpm_2, k_dpm_2_a
So, the whole thing would look something like this:
accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" \
--lowram \
--pretrained_model_name_or_path="/ponydiffusers/" \
--train_data_dir="/training_dir" \
--resolution="1024,1024" \
--output_dir="/output_dir" \
--enable_bucket \
--min_bucket_reso=256 \
--max_bucket_reso=1024 \
--network_alpha=4 \
--save_model_as="safetensors" \
--network_module="lycoris.kohya" \
--network_args \
"preset=full" \
"conv_dim=256" \
"conv_alpha=4" \
"use_tucker=False" \
"use_scalar=False" \
"rank_dropout_scale=False" \
"algo=locon" \
"train_norm=False" \
"block_dims=8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8" \
"block_alphas=0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625" \
--network_dropout=0 \
--lr_scheduler="cosine" \
--learning_rate=0.0001 \
--unet_lr=0.0001 \
--text_encoder_lr=0.0001 \
--network_dim=8 \
--output_name="yifftoolkit" \
--scale_weight_norms=1 \
--no_half_vae \
--save_every_n_epochs=50 \
--mixed_precision="fp16" \
--save_precision="fp16" \
--caption_extension=".txt" \
--cache_latents \
--cache_latents_to_disk \
--optimizer_type="AdamW" \
--max_grad_norm=1 \
--keep_tokens=1 \
--max_data_loader_n_workers=8 \
--bucket_reso_steps=32 \
--multires_noise_iterations=10 \
--multires_noise_discount=0.1 \
--log_prefix=xl-locon \
--gradient_accumulation_steps=12 \
--gradient_checkpointing \
--train_batch_size=8 \
--dataset_repeats=0 \
--max_train_steps=400 \
--shuffle_caption \
--sdpa \
--sample_prompts=/training_dir/sample-prompts.txt \
--sample_sampler="euler_a" \
--sample_every_n_steps=100
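Typing out the long block_dims and block_alphas lists by hand is error-prone. Here is a small Python sketch that generates those --network_args entries and does one piece of related arithmetic. It assumes one optimizer step consumes train_batch_size × gradient_accumulation_steps images and that --max_train_steps counts optimizer steps. repeated_arg, images_per_optimizer_step and BLOCKS are hypothetical names. BLOCKS is set to the 25 entries I count in the command above, so adjust it if yours differs.

```python
def repeated_arg(key: str, value: float, count: int) -> str:
    """Build a 'key=v,v,...' network_args entry containing `count` copies of value."""
    return f"{key}=" + ",".join([str(value)] * count)


def images_per_optimizer_step(batch_size: int, grad_accum_steps: int) -> int:
    """Images consumed per optimizer step when using gradient accumulation."""
    return batch_size * grad_accum_steps


BLOCKS = 25  # assumed to match the number of entries in the command above

print(repeated_arg("block_dims", 8, BLOCKS))
print(repeated_arg("block_alphas", 0.0625, BLOCKS))

# --train_batch_size=8 with --gradient_accumulation_steps=12
per_step = images_per_optimizer_step(batch_size=8, grad_accum_steps=12)
# images per optimizer step, and images seen over --max_train_steps=400
print(per_step, per_step * 400)
```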
Embeddings for 1.5 and SDXL
Embeddings (textual inversions) in Stable Diffusion are learned vectors in the text encoder's token-embedding space that capture the essential features of a concept. When you use an embedding's name in a prompt, these vectors guide the diffusion process, steering the model toward outputs that match the concept they were trained on.
In the /embeddings folder you can find a whole bunch of them that I collected for SD 1.5 and later converted for SDXL with this tool.
ComfyUI Walkthrough any%
⚠️ Coming next year! ⚠️
AnimateDiff for Masochists
⚠️ Coming in 2026! ⚠️
Stable Cascade Furry Bible
Resonance Cascade
SDXL Furry Bible
Some Common Knowledge Stuff
The Resolution LoRA is a nice thing to have because it helps with consistency. For SDXL it is just a LoRA you load in and it does its magic; no custom node or extension is needed in this case.
SeaArt Furry
SeaArt's furry model sadly has cons as well as pros. Yes, it might come with artist knowledge bundled, but it seems to have trouble doing more than one character (or everyone is just bad at prompting it). Oh, and it uses raw e621 tags, which means you have to use underscores (_) instead of spaces inside the tags.
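To convert an existing space-separated tag list, a tiny Python sketch like this one should do. It assumes your prompt is a plain comma-separated tag list; natural-language sentences would get underscored too, which you don't want. to_e621_tags is a hypothetical helper name.

```python
def to_e621_tags(prompt_tags: str) -> str:
    """Turn 'big breasts, looking at viewer' into 'big_breasts, looking_at_viewer'."""
    converted = []
    for tag in prompt_tags.split(","):
        words = tag.split()  # also drops leading/trailing whitespace
        if words:
            converted.append("_".join(words))
    return ", ".join(converted)


print(to_e621_tags("big breasts, looking at viewer, solo"))
# big_breasts, looking_at_viewer, solo
```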
β οΈ TODO: Prompting tips.
Pony Diffusion V6
Requirements
Download the model and load it into whatever you use to generate images.
Positive Prompt Stuff
score_9, score_8_up, score_7_up, score_6_up, rating_explicit, source_furry,
I just assumed you wanted explicit and furry. You can also set the rating to rating_safe or rating_questionable, and the source to source_anime, source_cartoon, source_pony or source_rule34, and optionally mix them however you'd like. It's your life! score_9 is an interesting tag: the model seems to have put all its "artsy" knowledge into it, so you might want to check whether it suits your taste. The other interesting tag is score_5_up, which seems to have learned a little bit of everything regarding quality, while score_4_up seems to be at the bottom of the autism spectrum regarding art. I do not recommend using it, but you can do whatever you want!
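If you generate from a script, a small Python helper can build that prefix and catch typos in the rating and source tags. This is a sketch: pony_prefix is a hypothetical name, and it only knows the rating and source tags listed above. With default arguments it reproduces the positive prompt shown above. It leaves out score_5_up and score_4_up, in line with the advice above.

```python
from typing import Iterable

SCORE_TAGS = ["score_9", "score_8_up", "score_7_up", "score_6_up"]
RATINGS = {"rating_safe", "rating_questionable", "rating_explicit"}
SOURCES = {"source_furry", "source_anime", "source_cartoon", "source_pony", "source_rule34"}


def pony_prefix(rating: str = "rating_explicit", sources: Iterable[str] = ("source_furry",)) -> str:
    """Build the score/rating/source prefix for a Pony Diffusion V6 prompt."""
    sources = list(sources)
    if rating not in RATINGS:
        raise ValueError(f"unknown rating tag: {rating!r}")
    unknown = [s for s in sources if s not in SOURCES]
    if unknown:
        raise ValueError(f"unknown source tags: {unknown}")
    return ", ".join(SCORE_TAGS + [rating] + sources) + ","


print(pony_prefix())
print(pony_prefix("rating_safe", ["source_furry", "source_cartoon"]))
```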
You can talk to Pony in three ways. You can use tags only (tags are neat), or you can just type in something like
The background is of full white marble towers in greek architecture style and a castle.
and use natural language to the fullest extent. The best way, though, is to mix both. It's actually recommended, since the score tags are by definition tags and you need to use them! Some artist styles also seeped into random tokens during training, and there is a community effort by some weebs to sort them out here.
Other nice words to have in the box depending on your mood:
detailed background, amazing_background, scenery porn
Other types of backgrounds include:
simple background, abstract background, spiral background, geometric background, heart background, gradient background, monotone background, pattern background, dotted background, striped background, textured background, blurred background
After simple background you can also define a color for the background, like white background, to get a simple white background.
For the character portrayal you can set many different types:
three-quarter view, full-length portrait, headshot portrait, bust portrait, half-length portrait, torso shot
It's a good idea to describe your subject or subjects: start with solo, duo, trio or group, and then describe your character in an interesting situation.
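Putting the pieces from this section together, here is a Python sketch of a prompt assembler. The order is the prefix tags, then the subject count, portrayal, character tags and background, with natural-language sentences (each ending in ".", like the caption example earlier) at the end. build_prompt and its parameters are hypothetical. The ordering is my reading of the advice above, not a rule from the model.

```python
from typing import List, Optional

SUBJECT_COUNTS = {"solo", "duo", "trio", "group"}


def build_prompt(prefix: str, count: str, portrayal: str, tags: List[str],
                 background: str = "simple background", bg_color: Optional[str] = None,
                 sentences: Optional[List[str]] = None) -> str:
    """Assemble prefix, subject count, portrayal, tags, background and sentences."""
    if count not in SUBJECT_COUNTS:
        raise ValueError(f"subject count should be one of {sorted(SUBJECT_COUNTS)}")
    if bg_color and background != "simple background":
        raise ValueError("a background color only makes sense after simple background")
    parts = [prefix.rstrip(", "), count, portrayal, *tags, background]
    if bg_color:
        parts.append(f"{bg_color} background")
    # Natural-language sentences go last, each ending in "." before the separator.
    parts += [s.rstrip(".") + "." for s in (sentences or [])]
    return ", ".join(parts) + ","


prompt = build_prompt(
    "score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry,",
    count="solo",
    portrayal="full-length portrait",
    tags=["anthro", "female", "lion"],
    bg_color="white",
    sentences=["She is standing in front of white marble towers"],
)
print(prompt)
```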
Negative Prompt Stuff
⚠️
How to Prompt Female Anthro Lions
anthro ⚠️?
Pony Diffusion V6 LoRAs
All LoRAs listed here are actually LyCORIS, with the exception of blue_frost, which is a regular LoRA. This might matter if the software you use makes you put them in separate folders, or if you are generating from a cute Python script.
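If your software does want them in separate folders, here is a minimal Python sketch that plans where each file should go. It relies only on the note above that blue_frost is the one plain LoRA. It assumes the files are named after the entries listed here (for example blue_frost.safetensors) and uses hypothetical lora/ and lycoris/ folder names. Adjust both to your setup.

```python
from pathlib import Path
from typing import Dict, Iterable

# Per the note above, blue_frost is the only plain LoRA; everything else is LyCORIS.
PLAIN_LORAS = {"blue_frost"}


def target_folder(filename: str) -> str:
    """Return the (hypothetical) folder name a model file belongs in."""
    return "lora" if Path(filename).stem in PLAIN_LORAS else "lycoris"


def plan_moves(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each filename to its destination path, without touching the disk."""
    return {name: f"{target_folder(name)}/{name}" for name in filenames}


print(plan_moves(["blue_frost.safetensors", "space-v1e500.safetensors"]))
```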
Concept Loras
bdsm-v1e400
blue_frost
A bit of an experiment trying to make generating kitsch winter scenes easier. It was originally trained for base SDXL, but it seems to work with PonyXL just fine. If you can call kitsch fine, anyway...
cervine_penis-v1e400
non-euclidean_sex-v1e400
space-v1e500
// Keywords:
by hubble
by jwst
// Example Positive Prompts:
by jwst, a galaxy, photo
by jwst, a red and blue galaxy
by hubble, a galaxy, photo
// Negative Prompt:
cropped,
blurry, wtf, old art, where is your god now, abstract background, simple background, cropped
spacengine-v1e500
// Keyword
by spaceengine
// Example Prompt:
by spaceengine, a planet, black background
Artist/Style LoRAs
blp-v1e400
Replicate blp's unique style of AI art without employing 40 different custom nodes to alter sigmas and noise injection. For a more realistic look, I recommend setting your CFG to 6 and using DPM++ 2M Karras as the sampler and scheduler. For a more cartoony/dreamy generation, use Euler a with a low CFG of 6.
There have been reports that if you use this LoRA with a negative weight of -0.5, your generations will have a slight sepia tone.
blp,
// Recommended:
detailed background, amazing_background, scenery porn, feral,
butterchalk-v3e400
I'm not into young anthro
I only trained this one for you, you hentai baka! ^_^
cecily_lin-v1e37
I'm honestly not familiar with this artist, I just scraped their art and let sd-scripts go wild.
chunie-v1e5
Everyone loves Chunie.
cooliehigh-v1e45
Again, I'm really uncultured when it comes to furry artists.
dagasi-v1e134
Even I heard about this one!
darkgem-v1e4
Quality digital painting style. Some people don't like it.
For the highest darkgem, I recommend a first pass with Euler a at 40 steps and CFG 11 at 1024x1024. Follow it with a hi-res pass at 1536x1536 using DPM++ 2M Karras at 60 steps with denoise set to 0.69. Only use darkgem in your prompt if you want gems to appear in the scene; otherwise your character may end up holding a dark red gem.
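Here is a sketch of that two-pass recipe as data, with the arithmetic behind it. The hi-res pass is a 1.5x upscale. How many steps the second pass actually runs depends on how your front end interprets denoise. In diffusers-style img2img, the denoise/strength value cuts the schedule to int(steps × denoise) steps. Other front ends, such as ComfyUI as far as I know, run the full step count over a shortened noise schedule instead. SamplingPass and the helper names are hypothetical. The recipe doesn't specify a CFG for the hi-res pass, so that field is left empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplingPass:
    sampler: str
    steps: int
    width: int
    height: int
    cfg: Optional[float] = None  # None: not specified in the recipe
    denoise: float = 1.0


first_pass = SamplingPass("Euler a", steps=40, width=1024, height=1024, cfg=11)
hires_pass = SamplingPass("DPM++ 2M Karras", steps=60, width=1536, height=1536, denoise=0.69)


def upscale_factor(before: SamplingPass, after: SamplingPass) -> float:
    """Linear resolution ratio between two passes."""
    return after.width / before.width


def diffusers_style_steps(p: SamplingPass) -> int:
    """Steps actually run when denoise behaves like diffusers' img2img strength."""
    return min(int(p.steps * p.denoise), p.steps)


print(upscale_factor(first_pass, hires_pass))  # 1.5
print(diffusers_style_steps(hires_pass))  # 41
```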
himari-v1e400
A tiny dumb LoRA trained on 4 images by @147Penguinmw. The keyword is by himari, but you probably don't need to use it!
// Positive Prompt Example
score_9, score_8_up, score_7_up, score_6_up, source_furry, rating_explicit, on back, sexy pose, full-length portrait, pussy, solo, reptile, scalie, anthro female lizard, scales, blush, blue eyes, white body, blue body, plant, blue scales, white scales, detailed background, looking at viewer, furniture, digital media \(artwork\), This digital artwork image presents a solo anthropomorphic female reptile specifically a lizard with a white body adorned with detailed blue scales.,
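In A1111-style and ComfyUI prompt syntax, parentheses are emphasis markers, e.g. (tag:1.2). That is why tags such as digital media \(artwork\) above, and maliketh \(elden ring\) further down, are written with backslashes. A small Python sketch to escape them automatically when building prompts from raw e621 tag names could look like this. escape_parens is a hypothetical helper, and it skips parentheses that are already escaped.

```python
import re


def escape_parens(tag: str) -> str:
    """Backslash-escape parentheses in a tag unless they are already escaped."""
    return re.sub(r"(?<!\\)([()])", r"\\\1", tag)


print(escape_parens("digital media (artwork)"))  # digital media \(artwork\)
print(escape_parens(r"maliketh \(elden ring\)"))  # unchanged
```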
furry_sticker-v1e250
Generate an infinite amount of furry stickers for your infinite amount of telegram accounts!
// Positive prompt:
furry sticker, simple background, black background, white outline,
// Negative prompt:
abstract background, detailed background, amazing_background, scenery porn,
goronic-v1e1
greg_rutkowski-v1e400
hamgas-v1e400
honovy-v1e4
jinxit-v1e10
kame_3-v1e80
kenket-v1e4
louart-v1e10
realistic-v4e400
// Positive prompt:
realistic, photo, detailed background, amazing_background, scenery porn,
// Negative prompt:
abstract background, simple background
My take on photorealistic furries. Highly experimental and extremely fun!
I recommend you don't try anything but a CFG of 6 with DPM++ 2M Karras.
You can combo this with the RetouchPhoto LoRA for even more research.
skecchiart-v1e134
spectrumshift-v1e400
squishy-v1e10
whisperingfornothing-v1e58
wjs07-v1e200
wolfy-nail-v1e400
woolrool-v1e4
Character LoRAs
arielsatyr-v1e400
amalia-v2e400
Some loli cat girl. Enjoy yourself!
amicus-v1e200
Gay space wolf from a visual novel everyone wants me to play.
auroth-v1e250
A dragon or wyvern thing from DOTA2
blaidd-v1e400
Half-wolf Blaidd! Bestest boy of Elden Ring! He's a very good boy! Can be a naughty boy though as well, if you like..
martlet-v1e200
ramona-v1e400
tibetan-v2e500
veemon-v1e400
hoodwink-v1e400
jayjay-v1e400
foxparks-v2e134
lovander-v3e10
skiltaire-v1e400
chillet-v3e10
maliketh-v1e1
Second-best boy of Elden Ring. It took me 7 tries the first time, so this is my form of payback!
// Positive prompt:
male, anthro, maliketh \(elden ring\), white fur, white hair, head armor, red canine genitalia, knot,
// NLP version:
anthro male maliketh \(elden ring\) with white fur and white hair wearing head armor, He has a red canine genitalia with a knotty base and fluffy tail, He has claws and monotone fur with a monotone body,