metadata
license: openrail++
tags:
- text-to-image
- stable-diffusion
- diffusers
AnimeBoysXL v3.0
It takes substantial time and effort to bake models. If you appreciate my models, I would be grateful if you could support me on Ko-fi ☕.
Features
- ✔️ Good for inference: AnimeBoysXL v3.0 is a flexible model that is good at generating images of anime boys and male-only content in a wide range of styles.
- ✔️ Good for training: AnimeBoysXL v3.0 is suitable for further training, thanks to its neutral style and its ability to recognize a large number of concepts. Feel free to train your own anime boy model/LoRA from AnimeBoysXL.
- ❌ AnimeBoysXL v3.0 is not optimized for creating anime girls. Please consider using other models for that purpose.
Inference Guide
- Prompt: Use tag-based prompts to describe your subject.
  - Tag ordering matters. It is highly recommended to structure your prompt with the following templates:
    - `1boy, male focus, character name, series name, anything else you'd like to describe`
    - `2boys, male focus, multiple boys, character name(s), series name, anything else you'd like to describe`
  - Append `, best quality, amazing quality, best aesthetic, amazing aesthetic, absurdres` to the prompt to improve image quality.
  - (Optional) Append `, year YYYY` to the prompt to shift the output toward the prevalent style of that year. `YYYY` is a 4-digit year, e.g. `, year 2023`.
- Negative prompt: Choose one of the following two presets.
  - Heavy (recommended): `lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts`
  - Light: `lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic`
- VAE: Make sure you're using the SDXL VAE.
- Sampling method, sampling steps and CFG scale: I find Euler a with 28 steps and a CFG scale of 5 works well. You are encouraged to experiment with other settings.
- Width and height: 832*1216 for portrait, 1024*1024 for square, and 1216*832 for landscape.
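The prompt rules above can be sketched as a small helper. This is only an illustration: `build_prompt`, its parameters, and the choice to reject years outside the [2005, 2023] range listed under Year tags are assumptions for this example, not part of the model or of any library.

```python
# Recommended quality/aesthetic tags from the Inference Guide.
QUALITY_TAGS = ["best quality", "amazing quality", "best aesthetic",
                "amazing aesthetic", "absurdres"]

# The two negative prompt presets from the Inference Guide.
NEGATIVE_PRESETS = {
    "heavy": ("lowres, bad, text, error, missing, extra, fewer, cropped, "
              "jpeg artifacts, worst quality, bad quality, watermark, "
              "bad aesthetic, unfinished, chromatic aberration, scan, "
              "scan artifacts"),
    "light": "lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic",
}


def build_prompt(num_boys=1, characters=(), series=None, extra_tags=(),
                 year=None, quality=True):
    """Assemble a tag prompt in the recommended order (hypothetical helper)."""
    if num_boys == 1:
        tags = ["1boy", "male focus"]
    elif num_boys == 2:
        tags = ["2boys", "male focus", "multiple boys"]
    else:
        # Only the two templates given in the guide are covered here.
        raise ValueError("only the 1boy and 2boys templates are covered")
    tags.extend(characters)          # character name(s)
    if series:
        tags.append(series)          # series name
    tags.extend(extra_tags)          # anything else you'd like to describe
    if quality:
        tags.extend(QUALITY_TAGS)
    if year is not None:
        # Assumption: restrict to the years that appear in the training tags.
        if not 2005 <= year <= 2023:
            raise ValueError("year tags were trained in the range [2005, 2023]")
        tags.append(f"year {year}")
    return ", ".join(tags)


print(build_prompt(1, extra_tags=["solo", "smile", "black hair"], year=2023))
print(NEGATIVE_PRESETS["heavy"])
```

Keeping the tags in a list until the final join makes it harder to break the recommended ordering by accident.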
🧨Diffusers Example Usage
import torch
from diffusers import DiffusionPipeline

# Load the FP16 weights and move the pipeline to the GPU.
pipe = DiffusionPipeline.from_pretrained("Koolchh/AnimeBoysXL-v3.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")

# Subject tags first, followed by the recommended quality and aesthetic tags.
prompt = "1boy, male focus, shirt, solo, looking at viewer, smile, black hair, brown eyes, short hair, best quality, amazing quality, best aesthetic, amazing aesthetic, absurdres"
# "Heavy" negative prompt preset from the Inference Guide.
negative_prompt = "lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts"

# Square 1024x1024 output, 28 sampling steps, CFG scale 5.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=5,
    num_inference_steps=28,
).images[0]
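The example above does not select a sampler explicitly. If you want to match the Euler a recommendation from the Inference Guide, one possible approach is to swap in `EulerAncestralDiscreteScheduler` from diffusers, as sketched below. The helper names `generation_kwargs` and `load_pipeline_with_euler_a` are hypothetical, and this assumes the repository's default scheduler is not already Euler a.

```python
# Recommended resolutions from the Inference Guide, as (width, height).
RESOLUTIONS = {
    "portrait": (832, 1216),
    "square": (1024, 1024),
    "landscape": (1216, 832),
}


def generation_kwargs(orientation="portrait", steps=28, cfg_scale=5.0):
    """Build pipeline call arguments for one of the recommended sizes."""
    width, height = RESOLUTIONS[orientation]
    return {
        "width": width,
        "height": height,
        "num_inference_steps": steps,
        "guidance_scale": cfg_scale,
    }


def load_pipeline_with_euler_a(model_id="Koolchh/AnimeBoysXL-v3.0"):
    """Load the pipeline and replace its scheduler with Euler ancestral."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = DiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
    )
    # Reuse the existing scheduler config so the noise schedule stays the same.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    return pipe.to("cuda")


# Usage (needs a CUDA GPU and downloads the model weights):
# pipe = load_pipeline_with_euler_a()
# image = pipe(prompt="1boy, male focus, solo", **generation_kwargs("portrait")).images[0]
# image.save("output.png")
```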
Training Details
AnimeBoysXL v3.0 is trained from Pony Diffusion V6 XL on ~516k images.
The following tags are attached to the training data to make it easier to steer toward either more aesthetic or more flexible results.
Quality tags
tag | score |
---|---|
best quality | >= 150 |
amazing quality | [75, 150) |
great quality | [25, 75) |
normal quality | [0, 25) |
bad quality | (-5, 0) |
worst quality | <= -5 |
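If you want to tag your own dataset the same way for further training, the score intervals in the table can be expressed as a simple lookup. This is a minimal sketch: `quality_tag` is a hypothetical helper, and the source of the score itself is not specified here, so it is taken as a plain number.

```python
def quality_tag(score):
    """Map a numeric score to the quality tag for its interval."""
    if score >= 150:
        return "best quality"      # >= 150
    if score >= 75:
        return "amazing quality"   # [75, 150)
    if score >= 25:
        return "great quality"     # [25, 75)
    if score >= 0:
        return "normal quality"    # [0, 25)
    if score > -5:
        return "bad quality"       # (-5, 0)
    return "worst quality"         # <= -5


for s in (200, 80, 30, 10, -2, -5):
    print(s, "->", quality_tag(s))
```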
Aesthetic tags
tag |
---|
best aesthetic |
amazing aesthetic |
great aesthetic |
normal aesthetic |
bad aesthetic |
Rating tags
tag | rating |
---|---|
sfw | general |
slightly nsfw | sensitive |
fairly nsfw | questionable |
very nsfw | explicit |
Year tags
`year YYYY`, where `YYYY` is in the range of [2005, 2023].
Training configurations
- Hardware: 4 * Nvidia A100 80GB GPUs
- Optimizer: AdaFactor
- Gradient accumulation steps: 8
- Batch size: 4 * 8 * 4 = 128
- Learning rates:
  - 8e-6 for U-Net
  - 5.2e-6 for text encoder 1 (CLIP ViT-L)
  - 4.8e-6 for text encoder 2 (OpenCLIP ViT-bigG)
- Learning rate schedule: constant with 250 warmup steps
- Mixed precision training type: FP16
- Epochs: 40
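To get a feel for what these numbers imply, the arithmetic can be sketched in a few lines. Several things here are assumptions rather than facts from the training run: that the three factors in the batch size are GPU count, gradient accumulation steps, and per-GPU batch size; that "~516k" is close to 516,000 images; and that the warmup ramps linearly from zero, which is a common convention but not something stated above.

```python
import math

NUM_GPUS = 4
GRAD_ACCUM_STEPS = 8
PER_GPU_BATCH = 4        # assumption: the remaining factor in 4 * 8 * 4
DATASET_SIZE = 516_000   # assumption: "~516k" rounded
EPOCHS = 40
WARMUP_STEPS = 250

BASE_LRS = {
    "unet": 8e-6,
    "text_encoder_1": 5.2e-6,  # CLIP ViT-L
    "text_encoder_2": 4.8e-6,  # OpenCLIP ViT-bigG
}


def effective_batch_size(num_gpus, grad_accum_steps, per_gpu_batch):
    """Number of samples that contribute to one optimizer step."""
    return num_gpus * grad_accum_steps * per_gpu_batch


def steps_per_epoch(dataset_size, batch_size):
    """Optimizer steps needed to see every sample once (last batch partial)."""
    return math.ceil(dataset_size / batch_size)


def lr_at_step(base_lr, step, warmup_steps=WARMUP_STEPS):
    """Constant schedule with a linear warmup (warmup shape is an assumption)."""
    if step < warmup_steps:
        return base_lr * (step / warmup_steps)
    return base_lr


batch = effective_batch_size(NUM_GPUS, GRAD_ACCUM_STEPS, PER_GPU_BATCH)
per_epoch = steps_per_epoch(DATASET_SIZE, batch)
print("effective batch size:", batch)
print("approx. optimizer steps per epoch:", per_epoch)
print("approx. total optimizer steps:", per_epoch * EPOCHS)
for name, lr in BASE_LRS.items():
    print(name, "lr at step 125:", lr_at_step(lr, 125))
```

Under these assumptions the 250 warmup steps are a very small fraction of the run, so the learning rates are effectively constant throughout training.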
Changes from v2.0
- Change the base model from Stable Diffusion XL Base 1.0 to Pony Diffusion V6 XL.
- Revamp the dataset's aesthetic tags based on the developer's preference.
- Update quality score and aesthetic score criteria.
- Use FP16 mixed-precision training.
- Train for more epochs.