---
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
---

# SoteDiffusion Cascade

Anime finetune of Stable Cascade.
Training is currently at a very early stage.
No commercial use is allowed because of the StabilityAI license.

<style>
.image {
float: left;
margin-left: 10px;
}
</style>

<table>
<img class="image" src="https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/DVcAEhQr_FarvoLawYpBM.png" width="320">
<img class="image" src="https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/kNts3NhZogHHqC5JfKRkr.png" width="320">
</table>

## Code Example

```shell
pip install diffusers transformers accelerate
```

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

# Prompts follow the training tag order: aesthetic, quality, date, then the rest.
prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

# The prior turns the prompt into image embeddings; the decoder turns them into an image.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

# CPU offload keeps idle submodules off the GPU to lower VRAM usage.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=40
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=2.0,
    output_type="pil",
    num_inference_steps=10
).images[0]  # first (and only) generated PIL image
decoder_output.save("cascade.png")
```

## Training Status:

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

| dataset name | training done | remaining |
|---|---|---|
| **newest** | 002 | 218 |
| **late** | 002 | 204 |
| **mid** | 002 | 199 |
| **early** | 002 | 053 |
| **oldest** | 002 | 014 |
| **pixiv** | 002 | 072 |
| **visual novel cg** | 002 | 068 |
| **anime wallpaper** | 002 | 011 |
| **Total** | 24 | 839 |

**Note**: Chunk numbering starts at 0, so a per-dataset value of 002 means chunks 000 through 002 (three chunks) are done; the Total row counts chunks. Each chunk contains 8000 images.

## Dataset:

**GPU used for captioning**: 1x Intel ARC A770 16GB
**Model used for captioning**: SmilingWolf/wd-v1-4-convnextv2-tagger-v2

| dataset name | total images | total chunks |
|---|---|---|
| **newest** | 1,766,335 | 221 |
| **late** | 1,652,420 | 207 |
| **mid** | 1,609,608 | 202 |
| **early** | 442,368 | 056 |
| **oldest** | 128,311 | 017 |
| **pixiv** | 594,046 | 075 |
| **visual novel cg** | 560,903 | 071 |
| **anime wallpaper** | 106,882 | 014 |
| **Total** | 6,860,873 | 863 |

**Note**: The smallest image size is 1280x600 (768,000 pixels).

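
The chunk counts in the table are consistent with splitting each dataset into chunks of 8000 images and rounding up. The sketch below checks that arithmetic; the rounding rule is inferred from the numbers, and `chunk_count` is a hypothetical helper, not part of the training scripts.

```python
IMAGES_PER_CHUNK = 8000  # from the note under Training Status

# Image counts copied from the Dataset table.
total_images = {
    "newest": 1_766_335,
    "late": 1_652_420,
    "mid": 1_609_608,
    "early": 442_368,
    "oldest": 128_311,
    "pixiv": 594_046,
    "visual novel cg": 560_903,
    "anime wallpaper": 106_882,
}


def chunk_count(n_images, chunk_size=IMAGES_PER_CHUNK):
    """Chunks needed for n_images, assuming the last chunk may be partly filled."""
    return (n_images + chunk_size - 1) // chunk_size  # integer ceiling division


chunks = {name: chunk_count(n) for name, n in total_images.items()}
print(chunks["newest"], sum(chunks.values()))  # 221 863
```
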
## Tags:

Tag order:

```
aesthetic tags, quality tags, date tags, custom tags, rest of the tags
```

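
A minimal sketch of assembling a prompt in the tag order above. `build_prompt` is a hypothetical helper, not part of diffusers or the training scripts, and it does not reproduce the parentheses that the Code Example puts around the leading tags.

```python
def build_prompt(aesthetic, quality, date, tags, custom=None):
    """Join tags as: aesthetic, quality, date, custom (optional), rest of the tags."""
    head = [aesthetic, quality, date]
    if custom:
        head.append(custom)
    return ", ".join(head + list(tags))


prompt = build_prompt("very aesthetic", "high quality", "newest", ["1girl", "solo"])
print(prompt)  # very aesthetic, high quality, newest, 1girl, solo
```
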
### Date:

| tag | date |
|---|---|
| **newest** | 2022 to 2024 |
| **late** | 2019 to 2021 |
| **mid** | 2015 to 2018 |
| **early** | 2011 to 2014 |
| **oldest** | 2005 to 2010 |

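
The date table can be read as a lookup from year to tag. The sketch below assumes the ranges are inclusive on both ends; `date_tag` is a hypothetical helper, and years outside the table return `None` rather than a guessed tag.

```python
# Inclusive year ranges copied from the Date table.
DATE_RANGES = {
    "newest": (2022, 2024),
    "late": (2019, 2021),
    "mid": (2015, 2018),
    "early": (2011, 2014),
    "oldest": (2005, 2010),
}


def date_tag(year):
    """Return the date tag for a year, or None if the table does not cover it."""
    for tag, (first, last) in DATE_RANGES.items():
        if first <= year <= last:
            return tag
    return None


print(date_tag(2020))  # late
```
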
### Aesthetic Tags:

**Model used**: shadowlilac/aesthetic-shadow

| score greater than | tag |
|---|---|
| **0.980** | extremely aesthetic |
| **0.900** | very aesthetic |
| **0.750** | aesthetic |
| **0.500** | slightly aesthetic |
| **0.350** | not displeasing |
| **0.250** | not aesthetic |
| **0.125** | slightly displeasing |
| **0.025** | displeasing |
| **rest of them** | very displeasing |

### Quality Tags:

**Model used**: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth

| score greater than | tag |
|---|---|
| **0.980** | best quality |
| **0.900** | high quality |
| **0.750** | great quality |
| **0.500** | medium quality |
| **0.250** | normal quality |
| **0.125** | bad quality |
| **0.025** | low quality |
| **rest of them** | worst quality |

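
Both score tables describe the same kind of threshold lookup, which could be sketched as follows. This assumes the comparison is strictly greater than, as the column header suggests, and that the scoring models return a single float; `score_to_tag` is a hypothetical helper, not the code used to tag the dataset.

```python
# (threshold, tag) pairs copied from the tables, highest threshold first.
AESTHETIC_TAGS = [
    (0.980, "extremely aesthetic"),
    (0.900, "very aesthetic"),
    (0.750, "aesthetic"),
    (0.500, "slightly aesthetic"),
    (0.350, "not displeasing"),
    (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"),
    (0.025, "displeasing"),
]
QUALITY_TAGS = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]


def score_to_tag(score, thresholds, fallback):
    """Return the tag of the first threshold the score exceeds, else the fallback."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback


print(score_to_tag(0.93, AESTHETIC_TAGS, "very displeasing"))  # very aesthetic
print(score_to_tag(0.01, QUALITY_TAGS, "worst quality"))       # worst quality
```
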
### Custom Tags:

| dataset name | custom tag |
|---|---|
| **image boards** | date, |
| **pixiv** | art by Display_Name, |
| **visual novel cg** | Full_VN_Name (short_3_letter_name), visual novel cg, |
| **anime wallpaper** | date, anime wallpaper, |

## Training Params:

**Software used**: Kohya SD-Scripts (Stable Cascade branch)
**Base model**: KBlueLeaf/Stable-Cascade-FP16-fixed

### Command:

```shell
accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
    --mixed_precision fp16 \
    --save_precision fp16 \
    --full_fp16 \
    --sdpa \
    --gradient_checkpointing \
    --resolution "1024,1024" \
    --train_batch_size 2 \
    --gradient_accumulation_steps 32 \
    --adaptive_loss_weight \
    --learning_rate 4e-6 \
    --lr_scheduler constant_with_warmup \
    --lr_warmup_steps 100 \
    --optimizer_type adafactor \
    --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
    --max_grad_norm 0 \
    --token_warmup_min 1 \
    --token_warmup_step 0 \
    --shuffle_caption \
    --caption_dropout_rate 0 \
    --caption_tag_dropout_rate 0 \
    --caption_dropout_every_n_epochs 0 \
    --dataset_repeats 1 \
    --save_state \
    --save_every_n_steps 128 \
    --sample_every_n_steps 32 \
    --max_token_length 225 \
    --max_train_epochs 1 \
    --caption_extension ".txt" \
    --max_data_loader_n_workers 2 \
    --persistent_data_loader_workers \
    --enable_bucket \
    --min_bucket_reso 256 \
    --max_bucket_reso 4096 \
    --bucket_reso_steps 64 \
    --bucket_no_upscale \
    --log_with tensorboard \
    --output_name sotediffusion-sc_3b \
    --train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
    --in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
    --output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
    --logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
    --resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
    --stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
    --effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
    --previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
    --sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
```

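
As a rough guide to what the batch flags in the command imply, the effective batch size is the per-step batch times the gradient accumulation steps times the number of GPUs. The sketch below assumes a single GPU, as listed under Training Status. The image count per run is not stated, so 8000 (one chunk) is used purely as an illustration, and bucketing can change the real step count.

```python
import math

train_batch_size = 2              # --train_batch_size
gradient_accumulation_steps = 32  # --gradient_accumulation_steps
num_gpus = 1                      # assumption: the single RX 7900 XTX listed above

# Images that contribute to each optimizer update.
effective_batch = train_batch_size * gradient_accumulation_steps * num_gpus


def optimizer_steps(n_images, batch=effective_batch):
    """Approximate optimizer steps for one pass over n_images (rounded up)."""
    return math.ceil(n_images / batch)


print(effective_batch)        # 64
print(optimizer_steps(8000))  # 125
```
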
## Limitations and Bias

### Bias

- This model is intended for anime illustrations.
  Realistic capabilities have not been tested at all.
- The current version is biased toward older anime styles.

### Limitations

- Can fall back to a realistic style.
  Use the "anime illustration" tag to point it in the right direction.
- Eyes in far shots come out poorly because of the heavy latent compression.