State of the model?
thank you so much for training the first SD3 controlnet models and integrating support in diffusers!
in https://github.com/huggingface/diffusers/pull/8566#issuecomment-2169316913 you mention the model is "beta". please also mention and update the current state in the readme. is it supposed to be release quality yet?
as of now my canny results are rather uncanny:
canny 1024x1024 (low=0.1, high=0.4)
conditioning_scale=0.7
conditioning_scale=1.0
Stable Diffusion 3 in ComfyUI (no controlnet)
Your canny image is kind of sparse. Try another canny image with scale=0.8
prompt: a full-length portrait of a young woman with a pearl earring and a blue head scarf is captured in a close-up shot against a dark backdrop. the woman is facing the viewer, her head turned slightly to the right. her hair is neatly pulled back into a blue head scarf, which is draped over her left shoulder. the scarf is tied at the back with a white collar. her eyes are wide open, and she has a red lip. her mouth is slightly ajar, revealing a hint of teeth. her ears are pierced with a gold earring, and a pearl earring is dangling from her left ear. her right ear is covered by a yellow scarf, which is draped over her left shoulder. the backdrop is a dark brown canvas, providing a stark contrast to the woman's vibrant colors.
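for reference, this is roughly how I generate these with diffusers (a minimal sketch: the InstantX repo id is the one from the PR thread, the input filename is illustrative, and argument names may differ across diffusers versions):

```python
# minimal sketch, assuming the InstantX checkpoint from PR #8566;
# verify argument names against your diffusers version
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline

controlnet = SD3ControlNetModel.from_pretrained(
    "InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16
)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# canny 1024x1024 with low=0.1, high=0.4 (normalized; cv2.Canny wants 0-255)
img = np.array(
    Image.open("girl_with_pearl_earring.png")  # illustrative filename
    .convert("L")
    .resize((1024, 1024))
)
edges = cv2.Canny(img, int(0.1 * 255), int(0.4 * 255))
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

result = pipe(
    prompt="a full-length portrait of a young woman with a pearl earring ...",
    control_image=control_image,
    controlnet_conditioning_scale=0.7,  # also tried 1.0, see images above
).images[0]
result.save("canny_0.7.png")
```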
sparse maybe, but it is one of the test images of the original controlnet: https://github.com/lllyasviel/ControlNet#controlnet-with-canny-edge
and personally I found dog2 to be one of the best example images for evaluating edge-detection controlnets in my own controlnet trainings, as outlined here: https://github.com/lllyasviel/ControlNet/discussions/318#discussioncomment-7176692
this is the result I get without using a controlnet, so it's hard to tell what the controlnet actually contributed:
to me it seems rather that the controlnet is not yet fully trained(?), which is why I'm asking about the release state. what were the batch sizes, and how many samples/epochs?
backlink to the ComfyUI issue on github: user kijai seems to get good results too: https://github.com/comfyanonymous/ComfyUI/issues/3734#issuecomment-2186084970
@wanghaofan
I tried with the ComfyUI implementation now but still don't get good results with Canny :/ I'm very grateful for your efforts and I want this to work.
Can you please tell me what I'm doing wrong, or provide more examples (the ComfyUI workflow is included in the image):
I trained canny controlnets on my own, and this result looks to me as if a) the CN didn't fully converge yet, or b) the model collapsed at some point. Canny is usually very resilient to bad input.
@GeroldMeisinger hey! unrelated to SD3 CN:
I found your article on training controlnets super insightful and would love to chat / collaborate on training an SDXL CN!
I am building https://glyf.space/ (3D rendering powered by SD).
email me at [email protected] if you are interested in chatting!
@GeroldMeisinger
Hey, I'm also training SD3 controlnets and experiencing the "mode collapse" problem. Have you figured out the cause of this phenomenon? My dataset is about 5M images with softedge condition images. The training batch size is 120. Training starts to converge within 1k steps, but after about 12k iterations the results start to "collapse", and after 17k iterations they are totally collapsed.
I've been searching for similar issues for days, and this discussion seems to be the only one related to what I'm experiencing.
yes, I sometimes got the same result, where images started to look grainy, see here https://civitai.com/articles/2078#heading-35435 -> failed training. I can't tell you why this happens; it seems to happen at random in some of my training runs. if I just started again with the same settings, it worked. my assumption is that under certain circumstances you get a value overflow, and at that point it cannot heal anymore. just restart it.
if you get convergence at 1k steps already, increase the total batch size and reduce the learning rate a bit, see here https://github.com/lllyasviel/ControlNet/discussions/318#discussioncomment-7176692 . or if you don't care about quality so much and it works already, just take an earlier checkpoint (e.g. 10k).
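if you want to test the overflow theory on a collapsed run, here is a minimal sketch that scans a saved checkpoint for non-finite weights (the path is just an example, adjust it to your checkpoint layout):

```python
# minimal sketch: check a saved controlnet checkpoint for NaN/Inf weights;
# the path is an example, adjust to your checkpoint layout
import torch
from safetensors.torch import load_file

state = load_file("checkpoint-17000/controlnet/diffusion_pytorch_model.safetensors")
bad = [k for k, v in state.items() if not torch.isfinite(v).all()]
if bad:
    print(f"{len(bad)} tensors contain NaN/Inf, e.g.:")
    for k in bad[:10]:
        print("  ", k)
else:
    print("all weights finite - if it overflowed, it happened transiently during training")
```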
Hi, I have met exactly the same issue as you: after ~15k iters the image is full of blocks. Could you contact me at [email protected]? We can talk about it together. I've been stuck on this for two months.
Hey,
Here is my result (the image is too blocky), similar to your results?
Is your suggestion to increase the batch_size? And what do you mean by "just restart it"?
I think the image looks "fine" in the sense that this is not the "mode collapse" I noticed when training controlnets on SD1.5. what I saw there was images becoming more and more grainy and overbright, and the dog and cat would deform. unfortunately I never saved any images of this effect. what you are seeing are deconvolution artifacts: https://www.neuralception.com/convs-deconvs-artifacts/ . however, I cannot tell you why this happens or whether it is related to SD3, SD3 controlnets, or your training. I only ever noticed them in vanilla flux-dev generations: https://www.reddit.com/r/comfyui/comments/1eqepmv/3000_images_from_img2txt2img_generated_with/
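for intuition, here is a toy sketch of the effect from that link (not SD3 internals): a stride-2 transposed conv overlaps its kernel unevenly, which is what produces the blocky/checkerboard pattern, while resize-then-conv avoids it:

```python
# toy illustration of checkerboard/deconvolution artifacts, not SD3 internals
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)

# kernel size 3 is not divisible by stride 2 -> uneven kernel overlap,
# which tends to produce checkerboard patterns
deconv = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2,
                            padding=1, output_padding=1)
y_deconv = deconv(x)

# the usual fix: upsample first, then convolve -> uniform overlap
upsample = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
)
y_resize = upsample(x)

print(y_deconv.shape, y_resize.shape)  # both torch.Size([1, 32, 64, 64])
```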
I don't know, sorry