---
license: openrail++
tags:
- text-to-image
- stable-diffusion
- diffusers
---

# AnimeBoysXL v3.0

**It takes substantial time and effort to bake models. If you appreciate my models, I would be grateful if you could support me on [Ko-fi](https://ko-fi.com/koolchh) ☕.**

## Features

- ✔️ **Good for inference**: AnimeBoysXL v3.0 is a flexible model that is good at generating images of anime boys and male-only content in a wide range of styles.
- ✔️ **Good for training**: AnimeBoysXL v3.0 is suitable for further training, thanks to its neutral style and its ability to recognize a wide range of concepts. Feel free to train your own anime boy model/LoRA from AnimeBoysXL.

## Inference Guide

- **Prompt**: Use tag-based prompts to describe your subject.
  - Tag ordering matters. It is highly recommended to structure your prompt with one of the following templates:
    ```
    1boy, male focus, character name, series name, anything else you'd like to describe, best quality, amazing quality, best aesthetic, absurdres
    ```
    ```
    2boys, male focus, multiple boys, character name(s), series name, anything else you'd like to describe, best quality, amazing quality, best aesthetic, absurdres
    ```
- **Negative prompt**: Choose one of the following two presets.
  1. Heavy (*recommended*):
     ```
     lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts
     ```
  2. Light:
     ```
     lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic
     ```
- **VAE**: Make sure you're using [SDXL VAE](https://huggingface.co/stabilityai/sdxl-vae/tree/main).
- **Sampling method, sampling steps and CFG scale**: I find **(Euler a, 28 steps, CFG 8.5)** works well. You are encouraged to experiment with other settings.
- **Width and height**: **832×1216** for portrait, **1024×1024** for square, and **1216×832** for landscape.
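
Since tag ordering matters, it can help to assemble prompts programmatically instead of by hand. The following is a minimal sketch of the two templates and the two negative-prompt presets above. The names `build_prompt`, `QUALITY_SUFFIX`, and `NEGATIVE_PRESETS` are hypothetical helpers made up for illustration; they are not part of the model or of any library.

```python
# Hypothetical helpers: these names are illustrative, not shipped with the model.

# Quality/aesthetic tags that close both prompt templates.
QUALITY_SUFFIX = ["best quality", "amazing quality", "best aesthetic", "absurdres"]

# The two negative-prompt presets from the guide, copied verbatim.
NEGATIVE_PRESETS = {
    "heavy": (
        "lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, "
        "worst quality, bad quality, watermark, bad aesthetic, unfinished, "
        "chromatic aberration, scan, scan artifacts"
    ),
    "light": "lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic",
}


def build_prompt(num_boys, characters=(), series=None, extra_tags=()):
    """Assemble a prompt in the recommended order:
    subject tags, character name(s), series name, free-form tags, quality tags."""
    if num_boys == 1:
        tags = ["1boy", "male focus"]
    elif num_boys == 2:
        tags = ["2boys", "male focus", "multiple boys"]
    else:
        # The guide only gives templates for one and two boys.
        raise ValueError("only the 1boy and 2boys templates are covered here")
    tags.extend(characters)
    if series:
        tags.append(series)
    tags.extend(extra_tags)
    tags.extend(QUALITY_SUFFIX)
    return ", ".join(tags)


prompt = build_prompt(1, extra_tags=["solo", "smile", "black hair"])
negative_prompt = NEGATIVE_PRESETS["heavy"]
```

Keeping the quality tags in a single list means the suffix stays consistent across prompts, and switching between the heavy and light presets is a one-word change.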

## 🧨Diffusers Example Usage

```python
import torch
from diffusers import DiffusionPipeline

# Load the fp16 weights and move the pipeline to the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "Koolchh/AnimeBoysXL-v3.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
pipe.to("cuda")

prompt = "1boy, male focus, shirt, solo, looking at viewer, smile, black hair, brown eyes, short hair, best quality, amazing quality, best aesthetic, absurdres"
negative_prompt = "lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts"

# Generate a 1024x1024 image with the recommended step count and CFG scale.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=8.5,
    num_inference_steps=28,
).images[0]
```
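
The example above runs with whichever scheduler the pipeline loads by default. The sketch below shows one way to apply the recommended sampler and the three resolution presets. It assumes that "Euler a" corresponds to `EulerAncestralDiscreteScheduler` in Diffusers. The names `RESOLUTIONS`, `generation_kwargs`, and `use_euler_a` are hypothetical helpers for illustration.

```python
# Hypothetical helpers: a sketch, not the author's own inference code.

# Recommended (width, height) pairs from the Inference Guide.
RESOLUTIONS = {
    "portrait": (832, 1216),
    "square": (1024, 1024),
    "landscape": (1216, 832),
}


def generation_kwargs(orientation="square"):
    """Return keyword arguments for the pipeline call using the recommended
    28 steps and CFG scale 8.5 at the chosen orientation."""
    width, height = RESOLUTIONS[orientation]
    return {
        "width": width,
        "height": height,
        "guidance_scale": 8.5,
        "num_inference_steps": 28,
    }


def use_euler_a(pipe):
    """Swap the pipeline's scheduler for Euler Ancestral, reusing its config.
    Assumption: this is the Diffusers counterpart of the "Euler a" sampler."""
    # Imported lazily so generation_kwargs() works without diffusers installed.
    from diffusers import EulerAncestralDiscreteScheduler

    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    return pipe
```

With `pipe`, `prompt`, and `negative_prompt` defined as in the example above, a portrait image could then be generated with `use_euler_a(pipe)(prompt=prompt, negative_prompt=negative_prompt, **generation_kwargs("portrait")).images[0]`.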

## Training Details

AnimeBoysXL v3.0 is trained from [Pony Diffusion V6 XL](https://civitai.com/models/257749/pony-diffusion-v6-xl) on ~516k images.

The following tags are attached to the training data to make it easier to steer toward either more aesthetic or more flexible results.

### Quality tags

| tag               | score     |
|-------------------|-----------|
| `best quality`    | >= 150    |
| `amazing quality` | [75, 150) |
| `great quality`   | [25, 75)  |
| `normal quality`  | [0, 25)   |
| `bad quality`     | (-5, 0)   |
| `worst quality`   | <= -5     |
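
The intervals in the table can be read as a simple threshold lookup. The sketch below is illustrative only: the model card does not say how the score is computed, so `score` is treated as an arbitrary number, and the function name `quality_tag` is hypothetical.

```python
def quality_tag(score):
    """Map a numeric score to its quality tag using the table's intervals.
    Assumption: the score is a plain number; its source is not documented."""
    if score >= 150:
        return "best quality"
    if score >= 75:       # [75, 150)
        return "amazing quality"
    if score >= 25:       # [25, 75)
        return "great quality"
    if score >= 0:        # [0, 25)
        return "normal quality"
    if score > -5:        # (-5, 0)
        return "bad quality"
    return "worst quality"  # <= -5
```

Checking the thresholds from highest to lowest keeps each branch to a single comparison and makes the half-open boundaries explicit: a score of exactly 75 is `amazing quality`, while exactly -5 is `worst quality`.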

### Aesthetic tags

The aesthetic tags of AnimeBoysXL v3.0 reflect my aesthetic preference.

| tag                 |
|---------------------|
| `best aesthetic`    |
| `amazing aesthetic` |
| `great aesthetic`   |
| `normal aesthetic`  |
| `bad aesthetic`     |

### Rating tags

| tag             | rating       |
|-----------------|--------------|
| `sfw`           | general      |
| `slightly nsfw` | sensitive    |
| `fairly nsfw`   | questionable |
| `very nsfw`     | explicit     |

### Year tags

`year YYYY`, where `YYYY` is a year in the range [2005, 2023].
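
As an illustration, the rating and year tags can be generated and validated with a couple of small helpers. This is a sketch under assumptions: the names `RATING_TAGS`, `rating_tag`, and `year_tag` are hypothetical, and using these tags in inference prompts is inferred from the fact that they were attached to the training data, not something the card states explicitly.

```python
# Hypothetical helpers for illustration only.

# Rating category -> tag, copied from the rating table.
RATING_TAGS = {
    "general": "sfw",
    "sensitive": "slightly nsfw",
    "questionable": "fairly nsfw",
    "explicit": "very nsfw",
}


def rating_tag(rating):
    """Return the tag for a rating category (raises KeyError if unknown)."""
    return RATING_TAGS[rating]


def year_tag(year):
    """Return a `year YYYY` tag, rejecting years outside [2005, 2023]."""
    if not 2005 <= year <= 2023:
        raise ValueError("year tags cover 2005 through 2023 only")
    return f"year {year}"


extra_tags = [rating_tag("general"), year_tag(2022)]
```

Validating the year up front avoids silently passing a tag the model never saw during training.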

### Training configurations

- Hardware: 4 * Nvidia A100 80GB GPUs
- Optimizer: AdaFactor
- Gradient accumulation steps: 8
- Batch size: 4 * 8 * 4 = 128
- Learning rates:
  - 8e-6 for U-Net
  - 5.2e-6 for text encoder 1 (CLIP ViT-L)
  - 4.8e-6 for text encoder 2 (OpenCLIP ViT-bigG)
- Learning rate schedule: constant with 250 warmup steps
- Mixed precision training type: FP16
- Epochs: 40
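
The batch-size arithmetic and the learning rate schedule above can be sketched in a few lines. Several details here are assumptions rather than facts from the training run: the per-device batch size of 4 is inferred as the remaining factor in `4 * 8 * 4`, the warmup is assumed to be linear, the dataset size is the approximate ~516k figure, and all names are hypothetical.

```python
import math

NUM_GPUS = 4
GRAD_ACCUM_STEPS = 8
PER_DEVICE_BATCH = 4     # assumption: the remaining factor in 4 * 8 * 4
DATASET_SIZE = 516_000   # approximate (~516k images)
WARMUP_STEPS = 250

BASE_LRS = {
    "unet": 8e-6,
    "text_encoder_1": 5.2e-6,  # CLIP ViT-L
    "text_encoder_2": 4.8e-6,  # OpenCLIP ViT-bigG
}


def effective_batch_size():
    """Images consumed per optimizer step across all GPUs."""
    return NUM_GPUS * GRAD_ACCUM_STEPS * PER_DEVICE_BATCH


def steps_per_epoch():
    """Rough optimizer steps per epoch, given the approximate dataset size."""
    return math.ceil(DATASET_SIZE / effective_batch_size())


def lr_at_step(step, base_lr, warmup_steps=WARMUP_STEPS):
    """Constant schedule with warmup. Assumption: the warmup ramps linearly
    from 0 to base_lr, then the rate stays at base_lr."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


print(effective_batch_size())  # 128
```

Because the dataset size is only approximate, the steps-per-epoch figure should be treated as an estimate and not as the exact count used in training.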

### Changes from v2.0

- Change the base model from Stable Diffusion XL Base 1.0 to Pony Diffusion V6 XL.
- Revamp the dataset's aesthetic tags based on the developer's preference.
- Update the criteria for quality tags.
- Use FP16 mixed-precision training.
- Train for more epochs.

## Special thanks

**chefFromSpace** for his assistance with the showcase images.

## License

Since AnimeBoysXL v3.0 is a derivative model of [Pony Diffusion V6 XL](https://civitai.com/models/257749/pony-diffusion-v6-xl) by PurpleSmartAI, it has a different license from previous versions. Please read their license before using the model.