---
license: other
datasets:
- Mitsua/vroid-image-dataset-lite
pipeline_tag: text-to-image
---

# Model Card for VRoid Diffusion

This is a latent text-to-image diffusion model intended to demonstrate how U-Net training affects the generated images.

- The text encoder is from [OpenCLIP ViT-H/14](https://github.com/mlfoundations/open_clip), MIT License, Training Data: LAION-2B
- The VAE is from [Mitsua Diffusion One](https://huggingface.co/Mitsua/mitsua-diffusion-one), Mitsua Open RAIL-M License, Training Data: Public Domain/CC0 + Licensed
- The U-Net is trained from scratch using the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.
- VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.

## Model Details

- `vroid_diffusion_test.safetensors`
  - Base variant.
- `vroid_diffusion_test_invert_red_blue.safetensors`
  - `red` and `blue` in the captions are swapped.
  - `pink` and `skyblue` in the captions are swapped.
- `vroid_diffusion_test_monochrome.safetensors`
  - All training images are converted to grayscale.

### Model Description

- **Developed by:** Abstract Engine.
- **License:** Mitsua Open RAIL-M License.

## Uses

### Direct Use

Text-to-image generation for research and educational purposes.

### Out-of-Scope Use

Any deployed use case of the model.

## Training Details

### Training Data

We use the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.
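
As a rough illustration of the caption modification described for the `invert_red_blue` variant, the following is a minimal sketch and not the actual preprocessing code used for training. It assumes captions are lowercase plain-text strings in which color names appear as whole words; the function name `swap_color_words` and the `SWAPS` table are hypothetical.

```python
import re

# Hypothetical swap table based on the variant description:
# red <-> blue and pink <-> skyblue.
SWAPS = {"red": "blue", "blue": "red", "pink": "skyblue", "skyblue": "pink"}

# Match whole words only, so "redhead" is untouched and the "blue"
# inside "skyblue" is not matched on its own.
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b")


def swap_color_words(caption: str) -> str:
    """Swap color words in one pass so that a swap is never undone by a later one."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(1)], caption)


if __name__ == "__main__":
    print(swap_color_words("red eyes, blue hair, pink skirt, skyblue ribbon"))
```

Doing the replacement in a single regex pass matters here: two sequential `str.replace` calls (`red` to `blue`, then `blue` to `red`) would turn every color word back into `red`.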