Mitsua/vroid-diffusion-test

license: other
datasets:
  - Mitsua/vroid-image-dataset-lite
pipeline_tag: text-to-image

Model Card for VRoid Diffusion

This is a latent text-to-image diffusion model intended to demonstrate how U-Net training affects the generated images.

The text encoder is from OpenCLIP ViT-H/14 (MIT License; training data: LAION-2B).
The VAE is from Mitsua Diffusion One (Mitsua Open RAIL-M License; training data: public domain/CC0 + licensed).
The U-Net is trained from scratch using the full version of VRoid Image Dataset Lite with some modifications.
VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.

Model Details

vroid_diffusion_test.safetensors
- Base variant.
vroid_diffusion_test_invert_red_blue.safetensors
- "red" and "blue" in the captions are swapped.
- "pink" and "skyblue" in the captions are swapped.
vroid_diffusion_test_monochrome.safetensors
- All training images are converted to grayscale.
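
The two data modifications described above can be illustrated with a minimal sketch. This is not the preprocessing code used to train these checkpoints. The function names `swap_caption_colors` and `to_gray`, the lowercase whole-word matching, and the BT.601 luma weights are all assumptions made for illustration.

```python
import re

# Hypothetical swap table, taken from the variant description above.
SWAPS = {"red": "blue", "blue": "red", "pink": "skyblue", "skyblue": "pink"}

# Match whole lowercase words only, so "skyblue" is never read as "blue"
# and words such as "reddish" are left untouched (an assumption).
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b")


def swap_caption_colors(caption: str) -> str:
    """Swap red<->blue and pink<->skyblue in a single pass.

    A single regex pass avoids the double-swap bug of chained
    str.replace calls (red -> blue -> red).
    """
    return _PATTERN.sub(lambda m: SWAPS[m.group(1)], caption)


def to_gray(rgb):
    """Convert one 8-bit RGB pixel to a gray level.

    Uses BT.601 luma weights; the weighting actually used for the
    monochrome variant is not stated in this card.
    """
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


if __name__ == "__main__":
    print(swap_caption_colors("a girl with red hair and skyblue eyes"))
    print(to_gray((255, 0, 0)))
```

Because only the captions change in the invert_red_blue variant while the images stay the same, a model trained this way would be expected to associate the word "red" with blue pixels, which is the effect the variant is meant to demonstrate.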

Model Variant

VRoid Diffusion Unconditional
- This is an unconditional image generator without CLIP.

Model Description

Developed by: Abstract Engine.
License: Mitsua Open RAIL-M License.

Uses

Direct Use

Text-to-Image generation for research and educational purposes.

Out-of-Scope Use

Any deployed use case of the model.

Training Details

Training resolution: 256x256
Batch size: 48
Steps: 45k
LR: 1e-5 with a 1000-step warmup
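
As a rough illustration of the hyperparameters above, the sketch below computes a learning rate per step and the total number of training samples processed. It rests on several assumptions not stated in this card: the warmup is linear from zero, the rate is constant afterwards, "45k" means exactly 45,000 optimizer steps, and 48 is the global batch size. The name `lr_at` is hypothetical.

```python
BASE_LR = 1e-5        # from the card
WARMUP_STEPS = 1000   # from the card
TOTAL_STEPS = 45_000  # "45k", assumed to be exact
BATCH_SIZE = 48       # assumed to be the global batch size


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Assumption: linear warmup from 0, then a constant rate.
    The card does not say which schedule follows the warmup.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


# Samples processed over the whole run (with repetition across epochs).
images_seen = BATCH_SIZE * TOTAL_STEPS

if __name__ == "__main__":
    print(lr_at(0), lr_at(499), lr_at(44_999))
    print(images_seen)
```

Under these assumptions the run processes 48 × 45,000 = 2,160,000 samples in total.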

Training Data

We use the full version of VRoid Image Dataset Lite with some modifications.