	---
	license: openrail++
	tags:
	- text-to-image
	- stable-diffusion
	- diffusers
	---

	# AnimeBoysXL v3.0

	It takes substantial time and effort to bake models. If you appreciate my models, I would be grateful if you could support me on [Ko-fi](https://ko-fi.com/koolchh) ☕.

	## Features

	- ✔️ Good for inference: AnimeBoysXL v3.0 is a flexible model that generates images of anime boys and male-only content in a wide range of styles.
	- ✔️ Good for training: AnimeBoysXL v3.0 is suitable for further training, thanks to its neutral style and its ability to recognize a large number of concepts. Feel free to train your own anime boy model/LoRA from AnimeBoysXL.

	## Inference Guide

	- Prompt: Use tag-based prompts to describe your subject.
	- Tag ordering matters. It is highly recommended to structure your prompt with one of the following templates:
	```
	1boy, male focus, character name, series name, anything else you'd like to describe, best quality, amazing quality, best aesthetic, absurdres
	```
	```
	2boys, male focus, multiple boys, character name(s), series name, anything else you'd like to describe, best quality, amazing quality, best aesthetic, absurdres
	```
	- Negative prompt: Choose one of the following two presets.
	1. Heavy (recommended):
	```
	lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts
	```
	2. Light:
	```
	lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic
	```
	- VAE: Make sure you're using [SDXL VAE](https://huggingface.co/stabilityai/sdxl-vae/tree/main).
	- Sampling method, sampling steps and CFG scale: I find Euler a with 28 steps and a CFG scale of 8.5 works well. You are encouraged to experiment with other settings.
	- Width and height: 832 × 1216 for portrait, 1024 × 1024 for square, and 1216 × 832 for landscape.
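
As a rough illustration of the guide above, the following sketch assembles a prompt in the recommended tag order and keeps the negative-prompt presets and image sizes in one place. The helper and constant names (`build_prompt`, `NEGATIVE_PRESETS`, `RESOLUTIONS`) are hypothetical and not part of the model or of any library; only the tag strings and sizes are taken from this guide.

```python
# Hypothetical helper names; the tag strings and sizes are copied from the guide.
QUALITY_SUFFIX = ["best quality", "amazing quality", "best aesthetic", "absurdres"]

NEGATIVE_PRESETS = {
    "heavy": (
        "lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, "
        "worst quality, bad quality, watermark, bad aesthetic, unfinished, "
        "chromatic aberration, scan, scan artifacts"
    ),
    "light": "lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic",
}

# (width, height) pairs from the "Width and height" recommendation.
RESOLUTIONS = {
    "portrait": (832, 1216),
    "square": (1024, 1024),
    "landscape": (1216, 832),
}


def build_prompt(num_boys=1, characters=(), series=None, extra=()):
    """Join tags in the order used by the two templates in the guide."""
    if num_boys == 1:
        tags = ["1boy", "male focus"]
    elif num_boys == 2:
        tags = ["2boys", "male focus", "multiple boys"]
    else:
        # The guide only gives templates for one and two boys.
        raise ValueError("only the 1boy and 2boys templates are covered here")
    tags.extend(characters)       # character name(s)
    if series:
        tags.append(series)       # series name
    tags.extend(extra)            # anything else you'd like to describe
    tags.extend(QUALITY_SUFFIX)   # quality and aesthetic tags go last
    return ", ".join(tags)


prompt = build_prompt(extra=["shirt", "solo", "smile"])
width, height = RESOLUTIONS["portrait"]
```

Keeping the quality and aesthetic tags in a fixed suffix makes it harder to reorder them by accident, which matters because tag ordering affects the result.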

	## 🧨Diffusers Example Usage

	```python
	import torch
	from diffusers import DiffusionPipeline

	pipe = DiffusionPipeline.from_pretrained("Koolchh/AnimeBoysXL-v3.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
	pipe.to("cuda")

	prompt = "1boy, male focus, shirt, solo, looking at viewer, smile, black hair, brown eyes, short hair, best quality, amazing quality, best aesthetic, absurdres"
	negative_prompt = "lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts"

	image = pipe(
	    prompt=prompt,
	    negative_prompt=negative_prompt,
	    width=1024,
	    height=1024,
	    guidance_scale=8.5,
	    num_inference_steps=28,
	).images[0]
	```
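
The example above relies on whatever scheduler ships with the pipeline. If you want to match the Euler a recommendation from the Inference Guide, one possible way is to swap the scheduler after loading. This is a minimal sketch rather than the author's exact setup: the function name `load_pipeline_with_euler_a` and the `GENERATION_KWARGS` dictionary are illustrative, while `EulerAncestralDiscreteScheduler.from_config` is the usual Diffusers idiom for replacing a scheduler.

```python
# Settings taken from the Inference Guide: 28 steps, CFG 8.5, portrait size.
GENERATION_KWARGS = {
    "num_inference_steps": 28,
    "guidance_scale": 8.5,
    "width": 832,
    "height": 1216,
}


def load_pipeline_with_euler_a(model_id="Koolchh/AnimeBoysXL-v3.0"):
    """Load the pipeline in fp16 and switch its sampler to Euler a."""
    # Imported lazily so the settings above can be inspected without a GPU stack.
    import torch
    from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = DiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    # Reuse the existing scheduler config so the noise schedule stays compatible.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    return pipe.to("cuda")
```

With this in place, a call such as `load_pipeline_with_euler_a()(prompt=prompt, negative_prompt=negative_prompt, **GENERATION_KWARGS).images[0]` returns a PIL image that can be written to disk with `.save("output.png")`.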

	## Training Details

	AnimeBoysXL v3.0 was trained from [Pony Diffusion V6 XL](https://civitai.com/models/257749/pony-diffusion-v6-xl) on ~516k images.

	The following tags are attached to the training data to make it easier to steer toward either more aesthetic or more flexible results.

	### Quality tags

	| tag               | score     |
	|-------------------|-----------|
	| `best quality`    | >= 150    |
	| `amazing quality` | [75, 150) |
	| `great quality`   | [25, 75)  |
	| `normal quality`  | [0, 25)   |
	| `bad quality`     | (-5, 0)   |
	| `worst quality`   | <= -5     |
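
The score thresholds in the table can be read as a simple lookup. The sketch below is only an illustration of those intervals; the function name `quality_tag` is hypothetical, and this README does not say where the score comes from or how the actual tagging script was written.

```python
def quality_tag(score: float) -> str:
    """Map a numeric score to the quality tag whose interval contains it."""
    if score >= 150:
        return "best quality"
    if score >= 75:       # [75, 150)
        return "amazing quality"
    if score >= 25:       # [25, 75)
        return "great quality"
    if score >= 0:        # [0, 25)
        return "normal quality"
    if score > -5:        # (-5, 0)
        return "bad quality"
    return "worst quality"  # <= -5
```

Checking the thresholds from highest to lowest means each branch only needs a lower bound, and the half-open intervals in the table fall out of the ordering.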

	### Aesthetic tags

	The aesthetic tags of AnimeBoysXL v3.0 reflect my aesthetic preference.

	| tag                 |
	|---------------------|
	| `best aesthetic`    |
	| `amazing aesthetic` |
	| `great aesthetic`   |
	| `normal aesthetic`  |
	| `bad aesthetic`     |

	### Rating tags

	| tag             | rating       |
	|-----------------|--------------|
	| `sfw`           | general      |
	| `slightly nsfw` | sensitive    |
	| `fairly nsfw`   | questionable |
	| `very nsfw`     | explicit     |

	### Year tags

	`year YYYY`, where `YYYY` is in the range [2005, 2023].
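
To make the rating and year tags concrete, here is a small sketch that turns a rating and a year into the corresponding prompt tags. The names `RATING_TO_TAG`, `YEAR_RANGE` and `metadata_tags` are made up for this example; only the tag strings and the year range come from the tables above.

```python
# Rating -> tag mapping from the "Rating tags" table.
RATING_TO_TAG = {
    "general": "sfw",
    "sensitive": "slightly nsfw",
    "questionable": "fairly nsfw",
    "explicit": "very nsfw",
}

# Years covered by the training data, inclusive on both ends: [2005, 2023].
YEAR_RANGE = range(2005, 2024)


def metadata_tags(rating: str, year: int) -> list:
    """Return the rating tag and year tag for use in a prompt."""
    if rating not in RATING_TO_TAG:
        raise ValueError(f"unknown rating: {rating!r}")
    if year not in YEAR_RANGE:
        raise ValueError(f"year {year} is outside [2005, 2023]")
    return [RATING_TO_TAG[rating], f"year {year}"]
```

For instance, `metadata_tags("general", 2023)` yields the two tags `sfw` and `year 2023`, which can be appended to a prompt like any other tag.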

	### Training configurations

	- Hardware: 4 × NVIDIA A100 80GB GPUs
	- Optimizer: AdaFactor
	- Gradient accumulation steps: 8
	- Batch size: 4 * 8 * 4 = 128
	- Learning rates:
	  - 8e-6 for U-Net
	  - 5.2e-6 for text encoder 1 (CLIP ViT-L)
	  - 4.8e-6 for text encoder 2 (OpenCLIP ViT-bigG)
	- Learning rate schedule: constant with 250 warmup steps
	- Mixed precision training type: FP16
	- Epochs: 40
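
The batch-size arithmetic and the learning rate schedule above can be sketched as follows. Several things here are assumptions rather than facts from this README: that the third factor in `4 * 8 * 4` is the per-GPU batch size, that the dataset is exactly 516,000 images (the text only says ~516k), and that the warmup is linear, which is the common meaning of "constant with warmup". The step counts are therefore rough estimates.

```python
import math

NUM_GPUS = 4
GRAD_ACCUM_STEPS = 8
PER_DEVICE_BATCH = 4      # assumption: the remaining factor in 4 * 8 * 4
APPROX_IMAGES = 516_000   # assumption: "~516k" rounded to a concrete number
EPOCHS = 40
WARMUP_STEPS = 250

# Samples consumed per optimizer step.
effective_batch = NUM_GPUS * GRAD_ACCUM_STEPS * PER_DEVICE_BATCH

# Rough number of optimizer steps under the assumptions above.
steps_per_epoch = math.ceil(APPROX_IMAGES / effective_batch)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int, base_lr: float, warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then constant (assumed warmup shape)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


unet_lr_midway = lr_at(125, 8e-6)    # halfway through warmup
unet_lr_steady = lr_at(1_000, 8e-6)  # after warmup, stays at the base rate
```

Under these assumptions the 250 warmup steps are a very small fraction of the run, so the learning rates listed above apply for almost all of training.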

	### Changes from v2.0
	- Change the base model from Stable Diffusion XL Base 1.0 to Pony Diffusion V6 XL.
	- Revamp the dataset's aesthetic tags based on the developer's preference.
	- Update the criterion of quality tags.
	- Use FP16 mixed-precision training.
	- Train for more epochs.

	## Special thanks

	Thanks to chefFromSpace for his assistance with the showcase images.

	## License

	Since AnimeBoysXL v3.0 is a derivative model of [Pony Diffusion V6 XL](https://civitai.com/models/257749/pony-diffusion-v6-xl) by PurpleSmartAI, it has a different license from the previous versions. Please read their license before using the model.