[SD2.1] - Input shapes for unet model
#40 opened by lalith-mcw
I'm trying to run SD2.1 via OpenVINO IR, but inference currently produces a pixelated image.
Input nodes for SD2.1:
sample [2,4,64,64], timestep [-1] and encoder_hidden_states [2,77,1024]
Even so, the output image is 512x512, because vae_decoder upsamples the 64x64 latents by a factor of 8, and that image is pixelated. What shapes should be used for these three nodes to get proper inference?
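For reference, the relationship between image size and the UNet `sample` shape can be sketched as below. This is a minimal sketch, assuming the standard Stable Diffusion VAE (4 latent channels, 8x spatial downsampling) and classifier-free guidance, which is why the batch dimension is 2 (unconditional + conditional). The function names `sample_shape` and `decoded_image_size` are hypothetical helpers, not part of OpenVINO or diffusers.

```python
# Assumption: standard Stable Diffusion VAE with 8x spatial downsampling
# and 4 latent channels.
VAE_SCALE_FACTOR = 8
LATENT_CHANNELS = 4


def sample_shape(height: int, width: int, batch: int = 1, guidance: bool = True) -> list[int]:
    """Hypothetical helper: UNet `sample` shape for a target image size."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    # Classifier-free guidance stacks unconditional + conditional inputs.
    b = batch * 2 if guidance else batch
    return [b, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR]


def decoded_image_size(latent_h: int, latent_w: int) -> tuple[int, int]:
    """Hypothetical helper: image size the VAE decoder produces from latents."""
    return latent_h * VAE_SCALE_FACTOR, latent_w * VAE_SCALE_FACTOR


print(sample_shape(512, 512))      # [2, 4, 64, 64]
print(sample_shape(768, 768))      # [2, 4, 96, 96]
print(decoded_image_size(64, 64))  # (512, 512)
```

So a [2,4,64,64] `sample` always decodes to 512x512; getting a 768x768 image would require exporting the UNet (and VAE decoder) with 96x96 latents.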
For comparison, input nodes for SD1.4:
sample [2,4,64,64], timestep [-1] and encoder_hidden_states [2,77,768]
With these inputs, the output for SD1.4 models was correct. For SD2.1 I also tried DPMSolverMultistepScheduler, but the output is still the same.
I saw somewhere that the encoder_hidden_states blob shape was updated. What are the right dimensions to use?
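The last dimension of encoder_hidden_states is the hidden size of the text encoder: 768 for the CLIP ViT-L/14 encoder used by SD1.x, and 1024 for the OpenCLIP ViT-H/14 encoder used by SD2.x. The sketch below checks an exported shape against that, assuming a 77-token context and classifier-free guidance (batch 2). The family keys `"sd1.x"`/`"sd2.x"` and both functions are hypothetical labels for illustration only.

```python
# Hidden size of the text encoder output fed to the UNet cross-attention.
# Keys are hypothetical labels, not identifiers from any library.
TEXT_HIDDEN_SIZE = {
    "sd1.x": 768,   # CLIP ViT-L/14 text encoder
    "sd2.x": 1024,  # OpenCLIP ViT-H/14 text encoder
}
MAX_TOKENS = 77  # CLIP tokenizer context length


def encoder_hidden_states_shape(family: str, batch: int = 1, guidance: bool = True) -> list[int]:
    """Hypothetical helper: expected encoder_hidden_states shape."""
    # Classifier-free guidance doubles the batch (unconditional + conditional).
    b = batch * 2 if guidance else batch
    return [b, MAX_TOKENS, TEXT_HIDDEN_SIZE[family]]


def check_shape(family: str, actual: list[int]) -> tuple[bool, list[int]]:
    """Return (matches, expected) for an exported input shape."""
    expected = encoder_hidden_states_shape(family)
    return actual == expected, expected


print(check_shape("sd2.x", [2, 77, 1024]))  # (True, [2, 77, 1024])
print(check_shape("sd2.x", [2, 77, 768]))   # (False, [2, 77, 1024])
```

Under these assumptions, [2,77,1024] is the expected shape for SD2.1, and [2,77,768] only fits SD1.x models.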