	---
	pipeline_tag: text-to-image
	license: other
	license_name: stable-cascade-nc-community
	license_link: LICENSE
	---

	# SoteDiffusion Cascade

	Anime finetune of Stable Cascade.
	The model is currently at a very early stage of training.
	Commercial use is not allowed because of the StabilityAI license.

	<style>
	.image {
	float: left;
	margin-left: 10px;
	}
	</style>

	<table>
	<img class="image" src="https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/DVcAEhQr_FarvoLawYpBM.png" width="320">
	<img class="image" src="https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/kNts3NhZogHHqC5JfKRkr.png" width="320">
	</table>

	## Code Example

	```shell
	pip install diffusers
	```

	```python
	import torch
	from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

	prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
	negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

	prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
	decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

	# Prior (Stage C): turns the text prompt into image embeddings.
	prior.enable_model_cpu_offload()
	prior_output = prior(
	    prompt=prompt,
	    height=1024,
	    width=1024,
	    negative_prompt=negative_prompt,
	    guidance_scale=6.0,
	    num_images_per_prompt=1,
	    num_inference_steps=40,
	)

	# Decoder: turns the image embeddings into the final image.
	decoder.enable_model_cpu_offload()
	decoder_output = decoder(
	    image_embeddings=prior_output.image_embeddings,
	    prompt=prompt,
	    negative_prompt=negative_prompt,
	    guidance_scale=2.0,
	    output_type="pil",
	    num_inference_steps=10,
	).images[0]
	decoder_output.save("cascade.png")
	```


	## Training Status:

	GPU used for training: 1x AMD RX 7900 XTX 24GB

	| dataset name | training done | remaining |
	|---|---|---|
	| newest | 002 | 218 |
	| late | 002 | 204 |
	| mid | 002 | 199 |
	| early | 002 | 053 |
	| oldest | 002 | 014 |
	| pixiv | 002 | 072 |
	| visual novel cg | 002 | 068 |
	| anime wallpaper | 002 | 011 |
	| Total | 24 | 839 |

	Note: chunk numbering starts from 0, and each chunk contains 8000 images.


	## Dataset:

	GPU used for captioning: 1x Intel ARC A770 16GB
	Model used for captioning: SmilingWolf/wd-v1-4-convnextv2-tagger-v2


	| dataset name | total images | total chunks |
	|---|---|---|
	| newest | 1,766,335 | 221 |
	| late | 1,652,420 | 207 |
	| mid | 1,609,608 | 202 |
	| early | 442,368 | 056 |
	| oldest | 128,311 | 017 |
	| pixiv | 594,046 | 075 |
	| visual novel cg | 560,903 | 071 |
	| anime wallpaper | 106,882 | 014 |
	| Total | 6,860,873 | 863 |

	Note: the smallest image size is 1280x600 (768,000 pixels).
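
The chunk totals in the table are consistent with splitting each dataset into chunks of 8000 images (the figure given in the Training Status note) and rounding up. The sketch below only checks that arithmetic; the `chunk_count` helper and the dictionary layout are illustrative names, not part of the training tooling.

```python
IMAGES_PER_CHUNK = 8000  # from the Training Status note

# Total images per dataset, copied from the table above.
total_images = {
    "newest": 1_766_335,
    "late": 1_652_420,
    "mid": 1_609_608,
    "early": 442_368,
    "oldest": 128_311,
    "pixiv": 594_046,
    "visual novel cg": 560_903,
    "anime wallpaper": 106_882,
}


def chunk_count(num_images: int, per_chunk: int = IMAGES_PER_CHUNK) -> int:
    """Chunks needed for num_images, counting a final partially filled chunk."""
    return -(-num_images // per_chunk)  # integer ceiling division


chunks = {name: chunk_count(n) for name, n in total_images.items()}
print(chunks["newest"], sum(chunks.values()))  # 221 863
```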


	## Tags:

	```
	aesthetic tags, quality tags, date tags, custom tags, rest of the tags
	```

	### Date:
	| tag | date |
	|---|---|
	| newest | 2022 to 2024 |
	| late | 2019 to 2021 |
	| mid | 2015 to 2018 |
	| early | 2011 to 2014 |
	| oldest | 2005 to 2010 |
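
As a rough illustration of the date table, the lookup below maps a year to its date tag. It assumes the ranges are inclusive on both ends; the `date_tag` helper and its `None` result for years outside the table are hypothetical choices, not something the dataset tooling is known to do.

```python
from typing import Optional

# (first_year, last_year, tag), copied from the table above.
DATE_TAGS = [
    (2022, 2024, "newest"),
    (2019, 2021, "late"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None if the table does not cover it."""
    for first, last, tag in DATE_TAGS:
        if first <= year <= last:
            return tag
    return None


print(date_tag(2023), date_tag(2016))  # newest mid
```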

	### Aesthetic Tags:

	Model used: shadowlilac/aesthetic-shadow

	| score greater than | tag |
	|---|---|
	| 0.980 | extremely aesthetic |
	| 0.900 | very aesthetic |
	| 0.750 | aesthetic |
	| 0.500 | slightly aesthetic |
	| 0.350 | not displeasing |
	| 0.250 | not aesthetic |
	| 0.125 | slightly displeasing |
	| 0.025 | displeasing |
	| rest of them | very displeasing |

	### Quality Tags:

	Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth


	| score greater than | tag |
	|---|---|
	| 0.980 | best quality |
	| 0.900 | high quality |
	| 0.750 | great quality |
	| 0.500 | medium quality |
	| 0.250 | normal quality |
	| 0.125 | bad quality |
	| 0.025 | low quality |
	| rest of them | worst quality |
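
Both score tables (aesthetic and quality) can be read as the same lookup: walk the thresholds from highest to lowest and take the first one the score exceeds. The sketch below assumes a strict "greater than" comparison, as the column header suggests, and that scores lie between 0 and 1. The `score_to_tag` helper is a hypothetical name, and how scores are obtained from the two scoring models is not shown.

```python
# (threshold, tag) pairs in descending order, copied from the tables above.
AESTHETIC_TAGS = [
    (0.980, "extremely aesthetic"),
    (0.900, "very aesthetic"),
    (0.750, "aesthetic"),
    (0.500, "slightly aesthetic"),
    (0.350, "not displeasing"),
    (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"),
    (0.025, "displeasing"),
]
QUALITY_TAGS = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]


def score_to_tag(score: float, thresholds: list, fallback: str) -> str:
    """Return the tag of the first threshold the score strictly exceeds."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback  # the "rest of them" row


print(score_to_tag(0.93, AESTHETIC_TAGS, "very displeasing"))  # very aesthetic
print(score_to_tag(0.30, QUALITY_TAGS, "worst quality"))       # normal quality
```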

	### Custom Tags:

	| dataset name | custom tag |
	|---|---|
	| image boards | date, |
	| pixiv | art by Display_Name, |
	| visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
	| anime wallpaper | date, anime wallpaper, |
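
To show how the pieces fit together, here is a minimal sketch that joins tags in the order listed under Tags (aesthetic, quality, date, custom, rest). The `build_caption` helper is hypothetical. It assumes tags are joined with a comma and a space, and that datasets whose custom tag row lists no date (pixiv, for example) simply omit the date tag. The example tag values are made up for illustration.

```python
from typing import Iterable, Optional


def build_caption(
    aesthetic: str,
    quality: str,
    date: Optional[str],
    custom: Iterable[str],
    rest: Iterable[str],
) -> str:
    """Join tags in the order: aesthetic, quality, date, custom, rest."""
    ordered = [aesthetic, quality]
    if date is not None:
        ordered.append(date)
    ordered.extend(custom)
    ordered.extend(rest)
    return ", ".join(ordered)


wallpaper = build_caption(
    "very aesthetic", "high quality", "late", ["anime wallpaper"], ["scenery", "sky"]
)
pixiv = build_caption(
    "aesthetic", "great quality", None, ["art by Display_Name"], ["1girl", "solo"]
)
print(wallpaper)  # very aesthetic, high quality, late, anime wallpaper, scenery, sky
print(pixiv)      # aesthetic, great quality, art by Display_Name, 1girl, solo
```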

	## Training Params:

	Software used: Kohya SD-Scripts with Stable Cascade branch
	Base model: KBlueLeaf/Stable-Cascade-FP16-fixed

	### Command:
	```
	accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
	--mixed_precision fp16 \
	--save_precision fp16 \
	--full_fp16 \
	--sdpa \
	--gradient_checkpointing \
	--resolution "1024,1024" \
	--train_batch_size 2 \
	--gradient_accumulation_steps 32 \
	--adaptive_loss_weight \
	--learning_rate 4e-6 \
	--lr_scheduler constant_with_warmup \
	--lr_warmup_steps 100 \
	--optimizer_type adafactor \
	--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
	--max_grad_norm 0 \
	--token_warmup_min 1 \
	--token_warmup_step 0 \
	--shuffle_caption \
	--caption_dropout_rate 0 \
	--caption_tag_dropout_rate 0 \
	--caption_dropout_every_n_epochs 0 \
	--dataset_repeats 1 \
	--save_state \
	--save_every_n_steps 128 \
	--sample_every_n_steps 32 \
	--max_token_length 225 \
	--max_train_epochs 1 \
	--caption_extension ".txt" \
	--max_data_loader_n_workers 2 \
	--persistent_data_loader_workers \
	--enable_bucket \
	--min_bucket_reso 256 \
	--max_bucket_reso 4096 \
	--bucket_reso_steps 64 \
	--bucket_no_upscale \
	--log_with tensorboard \
	--output_name sotediffusion-sc_3b \
	--train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
	--in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
	--output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
	--logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
	--resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
	--stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
	--effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
	--previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
	--sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
	```
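
For orientation, the batch settings in the command imply the following effective batch size. This is only arithmetic on the flags above and the single training GPU mentioned earlier. The steps-per-chunk figure additionally assumes every chunk holds exactly 8000 images and that bucketing produces no partial batches, so the real count may differ.

```python
TRAIN_BATCH_SIZE = 2              # --train_batch_size
GRADIENT_ACCUMULATION_STEPS = 32  # --gradient_accumulation_steps
NUM_GPUS = 1                      # "1x AMD RX 7900 XTX 24GB"
IMAGES_PER_CHUNK = 8000           # from the Training Status note

# Images contributing to each optimizer update.
effective_batch = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_GPUS

# Approximate optimizer updates for one full chunk (one epoch per chunk).
updates_per_chunk = IMAGES_PER_CHUNK // effective_batch

print(effective_batch, updates_per_chunk)  # 64 125
```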


	## Limitations and Bias

	### Bias

	- This model is intended for anime illustrations.
	Realistic capabilities have not been tested at all.
	- The current version is biased toward older anime styles.

	### Limitations
	- Can fall back to a realistic style.
	Use the "anime illustration" tag to steer it in the right direction.
	- Eyes in far shots are rendered poorly because of the heavy latent compression.
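
One way to apply the "anime illustration" workaround is a small prompt helper like the sketch below. The `ensure_tag` function is hypothetical. It assumes prompts are comma-separated tag lists, and appending the tag at the end is an arbitrary choice; the best position in the prompt has not been established here.

```python
def ensure_tag(prompt: str, tag: str = "anime illustration") -> str:
    """Append tag to a comma-separated prompt unless it is already present."""
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    if tag not in tags:
        tags.append(tag)
    return ", ".join(tags)


print(ensure_tag("1girl, solo, cat ears,"))
# 1girl, solo, cat ears, anime illustration
```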