---
title: Style ControlNet
emoji:
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 3.30.0
app_file: app.py
pinned: true
license: openrail
---

# ControlStyle

Proof of concept for controlling Stable Diffusion image style using a ControlNet.

*prompt: "beautiful woman with blue eyes", controlnet_prompt: "1girl, blue eyes"*

*prompt and controlnet_prompt: "best quality, masterpiece, Dark hair, dark eyes, upper body, sun flare, outdoors, mountain, valley, sky, clouds, smiling"*

*controlnet_conditioning_scale increments by 0.1 from 0 to 1, left to right.*

## Try Style ControlNet with A1111 WebUI

Quick start: download the anime controlnets here.

The root folder has controlnets in Diffusers format; `A1111_weights` has controlnets for use with the A1111 WebUI ControlNet extension. More details are on the HF repo page.
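
As a minimal loading sketch for the Diffusers-format weights: the repo id `lint/anime_controlnet` and the base checkpoint `runwayml/stable-diffusion-v1-5` below are assumptions, not confirmed by this README, so check the download link above.

```python
# Sketch only: both repo ids are assumptions, verify against the download link.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lint/anime_controlnet", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
```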

## Quick Start Training

For a basic training example with HF Accelerate, run the following:

```sh
pip install -r requirements.txt
python quickstart_train.py
```

By default, the script downloads pipeline weights and an image dataset from the HF Hub. The base Stable Diffusion checkpoint and ControlNet weights can be in either HF Diffusers format or the original Stable Diffusion PyTorch Lightning format (inferred from whether the destination is a file or a directory).
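
For illustration, a minimal sketch of that inference rule (the function name is hypothetical, not taken from the script):

```python
import os

def is_original_sd_format(path: str) -> bool:
    # A single checkpoint file (e.g. model.ckpt or model.safetensors) is treated
    # as the original pytorch-lightning format; a directory means Diffusers format.
    return os.path.isfile(path)
```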

Use `convert_state_dict.sh` to convert the trained ControlNet state dict from Diffusers format to one compatible with the A1111 ControlNet extension.

## Style ControlNet Web UI

Launch the Web UI locally with:

```sh
python app.py
```

(My HF Spaces below are currently out of date; I will fix them once I have time.)

Try the WebUI hosted on HF Spaces at https://huggingface.co/spaces/lint/anime_controlnet

The WebUI also supports basic training.

## ControlNet for Style

Lvmin Zhang introduced ControlNet, which uses a cloned Stable Diffusion UNet to add external conditioning, such as body poses or sketch lines, to guide Stable Diffusion generation, with fantastic results.

I thought his approach might also work for introducing different styles (e.g. adding anime style) to guide the image generation process. Unlike the original controlnets, I initialized the controlnet weights from a distinct UNet (andite/anything-v4.5) and predominantly trained without any controlnet conditioning image, on a synthetic anime dataset (lint/anybooru) distinct from the base model's training data. Then the main controlnet weights were frozen, the input hint block weights were added back in, and those were trained on the same dataset, using canny edge processing to generate the controlnet conditioning image.
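
A minimal sketch of that initialization, using the `ControlNetModel.from_unet` helper from diffusers:

```python
# Sketch: initialize the controlnet from a distinct anime UNet instead of the
# base model's UNet, copying over the matching weights.
from diffusers import ControlNetModel, UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("andite/anything-v4.5", subfolder="unet")
controlnet = ControlNetModel.from_unet(unet)
```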

I originally trained the anime style controlnets without any controlnet conditioning image, so that the controlnet would focus on adding anime style rather than structure to the image. I have these weights saved at https://huggingface.co/lint/anime_styler/tree/main/A1111_webui_weights; however, they need to be used with my fork of the ControlNet extension, which has very minor changes that allow the user to load the controlnet without the input hint block weights and to pass None as a valid controlnet "conditioning".

Recently I added the input hint processing module back in and trained only the controlnet input hint blocks on canny image generation. So the models in this repository are now just like regular controlnets, apart from their different initialization and training process. They can be used just like a regular controlnet, but the vast majority of the weights were trained on adding anime style, with only the input hint blocks trained on using the controlnet conditioning image. It seems to work alright in my limited testing so far, but expect the canny image guidance to be weak, so combine with the original canny image controlnet as needed.
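
A minimal sketch of that second stage, assuming diffusers' ControlNetModel names the input hint blocks `controlnet_cond_embedding` (the full training loop lives in quickstart_train.py):

```python
# Sketch: train only the input hint blocks; freeze everything else.
for name, param in controlnet.named_parameters():
    param.requires_grad = name.startswith("controlnet_cond_embedding")
```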

Since the main controlnet weights were trained without any canny image conditioning, they can be (and were intended to be) used without any controlnet conditioning image. However, the existing A1111 ControlNet extension expects the user to always pass a controlnet conditioning image, and otherwise raises an error. You can instead pass a black square as the "conditioning image"; this adds some unexpected random noise to the image due to the input hint block bias weights, but the noise is small enough that the controlnet still appears to "work".
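
With the Diffusers pipeline, a minimal sketch of that workaround (reusing the assumed `pipe` from the loading example above):

```python
# Sketch: an all-black square as a stand-in conditioning image; the hint block
# bias weights still inject a little noise, but it is small enough to be usable.
from PIL import Image

black = Image.new("RGB", (512, 512), color=(0, 0, 0))
image = pipe("1girl, blue eyes", image=black).images[0]
```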