Please help me. Error: mat1 and mat2 shapes cannot be multiplied (2x2048 and 1536x1280)

#6
by langzhou - opened

Hello! I'm a beginner. I got this error when running your code. Could you please help me take a look at it? Thank you.
Windows 11, Python 3.10.6
Full log:
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 16.84it/s]
0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:\git\python\aiimg\Scripts\test-net.py", line 59, in <module>
image = pipe(
File "D:\git\python\aiimg.venv\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\Users\sd.cache\huggingface\modules\diffusers_modules\local\pipeline.py", line 412, in __call__
down_block_res_samples, mid_block_res_sample = self.controlnet(
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "D:\git\python\aiimg\Scripts\controlnet_union.py", line 966, in forward
control_emb = self.control_add_embedding(control_embeds)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\diffusers\models\embeddings.py", line 1304, in forward
sample = self.linear_1(sample)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\linear.py", line 125, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2048 and 1536x1280)
control_type shape: torch.Size([2, 8])

Hi, are you running the code with the same images? I did this quite a while ago and don't remember the details, but I usually don't check image sizes in a PoC, so what's probably happening here is a mismatch between the image sizes.

Yes, I'm using an image. The mask image was generated by the sd-webui-birefnet plugin for stable-diffusion-webui. I've tested with different pictures, including the ones in the demo, and all of them failed.

You mean the code in my guide, right? The one here. I tested it on different computers and environments and it works on all of them right out of the box. As I stated there, you'll need the controlnet locally, or you can use my repo for the controlnet like this:

controlnet_model = ControlNetModel_Union.from_pretrained(
    "OzzyGT/controlnet-union-promax-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
)

Besides that, the code works as intended and without errors with those images. So if you're using the exact same code and it fails, then something is probably wrong with your environment.

If you mean that you used the same images but passed them through birefnet within sd-webui, then the problem is probably that the output images aren't ready to use with my code, which likely means they have different channels or sizes. The source image shouldn't have an alpha channel, the mask needs to be a grayscale image (one channel only), and both need to have exactly the same dimensions.
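The three requirements above can be checked before running the pipeline. The following is only a rough sketch: `check_fill_inputs` is a hypothetical helper, not part of the guide, and it assumes objects that expose `.mode` and `.size` the way Pillow images do.

```python
def check_fill_inputs(source, mask):
    """Return a list of problems; an empty list means the pair looks usable.

    Hypothetical helper: `source` and `mask` are assumed to expose `.mode`
    and `.size` like PIL.Image.Image objects do.
    """
    problems = []

    # Modes such as "RGBA" or "LA" carry an alpha channel.
    if source.mode.upper().endswith("A"):
        problems.append(f"source has an alpha channel (mode {source.mode!r})")

    # "L" is Pillow's single-channel 8-bit grayscale mode.
    if mask.mode != "L":
        problems.append(f"mask is not single-channel grayscale (mode {mask.mode!r})")

    # Both images must have exactly the same (width, height).
    if source.size != mask.size:
        problems.append(f"size mismatch: source {source.size} vs mask {mask.size}")

    return problems
```

With Pillow this could be called as `check_fill_inputs(Image.open("source.png"), Image.open("mask.png"))`, where both file names are placeholders. A mask in another mode can usually be fixed with `.convert("L")`, as the guide code already does.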

Yes, I've downloaded all the required models to my local machine. Below is all of my code. Nothing has changed except that the models are loaded locally:

import torch
from PIL import Image, ImageChops

from controlnet_union import ControlNetModel_Union
from diffusers import AutoencoderKL, StableDiffusionXLControlNetPipeline, TCDScheduler
from diffusers.utils import load_image

source_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/diffusers_fill/jefferson-sees-OCQjiB4tG5c-unsplash.jpg"
)

width, height = source_image.size
min_dimension = min(width, height)

left = (width - min_dimension) / 2
top = (height - min_dimension) / 2
right = (width + min_dimension) / 2
bottom = (height + min_dimension) / 2

final_source = source_image.crop((left, top, right, bottom))
final_source = final_source.resize((1024, 1024), Image.LANCZOS).convert("RGBA")

mask = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/diffusers_fill/car_mask_good.png"
).convert("L")

binary_mask = mask.point(lambda p: 255 if p > 0 else 0)
inverted_mask = ImageChops.invert(binary_mask)

alpha_image = Image.new("RGBA", final_source.size, (0, 0, 0, 0))
cnet_image = Image.composite(final_source, alpha_image, inverted_mask)

vae = AutoencoderKL.from_pretrained("d://ai/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")

controlnet_model = ControlNetModel_Union.from_pretrained(
    "d://ai//controlnet-union-sdxl-1.0",
    torch_dtype=torch.float16,
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "d://ai//RealVisXL_V5.0_Lightning",
    torch_dtype=torch.float16,
    vae=vae,
    custom_pipeline="d://ai//pipeline_sdxl_fill",
    controlnet=controlnet_model,
    variant="fp16",
).to("cuda")
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

prompt = "high quality"
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(prompt, "cuda", True)

image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    image=cnet_image,
)

image = image.convert("RGBA")
cnet_image.paste(image, (0, 0), binary_mask)

cnet_image.save("final_generation.png")

There was an error when running:
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 9.15it/s]
0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:\git\python\aiimg\Scripts\net_test.py", line 59, in <module>
image = pipe(
File "D:\git\python\aiimg.venv\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\Users\py.cache\huggingface\modules\diffusers_modules\local\pipeline.py", line 412, in __call__
down_block_res_samples, mid_block_res_sample = self.controlnet(
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "D:\git\python\aiimg\Scripts\controlnet_union.py", line 961, in forward
control_emb = self.control_add_embedding(control_embeds)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\diffusers\models\embeddings.py", line 1304, in forward
sample = self.linear_1(sample)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "D:\git\python\aiimg.venv\lib\site-packages\torch\nn\modules\linear.py", line 125, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2048 and 1536x1280)
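One possible way to read the two shapes in this error is sketched below. It assumes, and this is not confirmed anywhere in the thread, that each control type is embedded into 256 features before reaching `control_add_embedding`. Under that assumption the input width of 2048 corresponds to 8 control types, which agrees with the `control_type shape: torch.Size([2, 8])` line logged in the first post, while the layer's weight width of 1536 would correspond to only 6. The helper name `implied_control_types` is made up for illustration.

```python
def implied_control_types(feature_width, per_type_width=256):
    """Number of control types a feature width would correspond to.

    per_type_width=256 is an assumption, not something stated in the thread.
    """
    if feature_width % per_type_width != 0:
        raise ValueError(f"{feature_width} is not a multiple of {per_type_width}")
    return feature_width // per_type_width


# mat1 is the input batch (2 x 2048), mat2 is the layer weight (1536 x 1280).
input_types = implied_control_types(2048)   # 8
weight_types = implied_control_types(1536)  # 6

print(f"input built for {input_types} control types, "
      f"layer expects {weight_types}")
```

If that reading holds, the mismatch would be between the number of control types the pipeline sends and the number the loaded ControlNet checkpoint was built for, rather than between image sizes. That would be worth checking against the local `controlnet-union-sdxl-1.0` folder, since the guide loads `controlnet-union-promax-sdxl-1.0` instead, but it remains a guess.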

Maybe the best place to start is to not use them locally and just use the code as it is, I mean with all the online models and images. I know that works for sure because I've tested it countless times, including right now as I write this.

If it still fails, then something is different in the environment and we can start looking there.
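For comparing environments, a small version report can help. This is a minimal sketch that uses only the standard library. The package list is an assumption drawn from the imports and traceback in this thread, and `installed_versions` is a hypothetical helper name.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of relevant packages; adjust as needed.
PACKAGES = ["torch", "diffusers", "Pillow"]

print("python", platform.python_version())
for name, version in installed_versions(PACKAGES).items():
    print(name, version if version is not None else "not installed")
```

Posting this output alongside the traceback makes it easier to spot a version difference between a working setup and a failing one.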

Just in case, this is the code, and it should work as is:

import torch
from PIL import Image, ImageChops

from controlnet_union import ControlNetModel_Union
from diffusers import AutoencoderKL, StableDiffusionXLControlNetPipeline, TCDScheduler
from diffusers.utils import load_image


source_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/diffusers_fill/jefferson-sees-OCQjiB4tG5c-unsplash.jpg"
)

width, height = source_image.size
min_dimension = min(width, height)

left = (width - min_dimension) / 2
top = (height - min_dimension) / 2
right = (width + min_dimension) / 2
bottom = (height + min_dimension) / 2

final_source = source_image.crop((left, top, right, bottom))
final_source = final_source.resize((1024, 1024), Image.LANCZOS).convert("RGBA")

mask = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/diffusers_fill/car_mask_good.png"
).convert("L")

binary_mask = mask.point(lambda p: 255 if p > 0 else 0)
inverted_mask = ImageChops.invert(binary_mask)

alpha_image = Image.new("RGBA", final_source.size, (0, 0, 0, 0))
cnet_image = Image.composite(final_source, alpha_image, inverted_mask)

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")

controlnet_model = ControlNetModel_Union.from_pretrained(
    "OzzyGT/controlnet-union-promax-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V5.0_Lightning",
    torch_dtype=torch.float16,
    vae=vae,
    custom_pipeline="OzzyGT/pipeline_sdxl_fill",
    controlnet=controlnet_model,
    variant="fp16",
).to("cuda")
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

prompt = "high quality"
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(prompt, "cuda", True)

image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    image=cnet_image,
)

image = image.convert("RGBA")
cnet_image.paste(image, (0, 0), binary_mask)

cnet_image.save("final_generation.png")
