aspect ratios, img2img
Hello,
How does one specify non-square aspect ratios when invoking pipe?
Also, is img2img possible with this model, and if so, how?
Thanks!
For img2img, the input image's size is used for the output.
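For text-to-image, you can request non-square outputs through the height and width arguments of the pipeline call. A minimal sketch, assuming pipe is the model's text-to-image pipeline (the exact dimensions are illustrative; keep them multiples of 8, and SDXL works best near a 1024x1024 pixel budget):

# request a wide (non-square) text-to-image output
image = pipe(prompt=prompt, height=768, width=1344).images[0]
image.save("wide.png")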
Thanks for your reply. Is there somewhere that describes how to do img2img with this model? I don't see information about it on the model card page or on GitHub.
I tried passing in an image as I have with earlier models:
images = pipe(prompt=prompt, image=init_img).images[0]
but this doesn't work.
I'm also looking for instructions on how to get an img2img pipeline up and running. Can someone point me in the right direction, please? Thanks!
This was the missing piece for me: StableDiffusionXLImg2ImgPipeline
Working now
I got it working with Stable Diffusion 1.5 from the example here: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img, but it still doesn't work with "stabilityai/stable-diffusion-xl-base-0.9". Is there support for the newer models, or are we left with 1.5 until an update?
Where it says
from diffusers import DiffusionPipeline
change that to
from diffusers import DiffusionPipeline, StableDiffusionXLImg2ImgPipeline
and then use
StableDiffusionXLImg2ImgPipeline.from_pretrained()
to initialise your pipeline for the base model as img2img.
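Put together, a minimal sketch of that change (model id and settings mirror the linked docs example; adjust to taste):

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

# initialise the SDXL base model as an img2img pipeline
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")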
Awesome! Totally works! Thanks so much! Fun toy to play with :)
For anyone else, here's the script I ran that works:
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

device = "cuda"
model_id_or_path = "stabilityai/stable-diffusion-xl-base-0.9"

# load the SDXL base model as an img2img pipeline in half precision
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe = pipe.to(device)

# the input image guides the generation
init_image = Image.open("./testImage.png").convert("RGB")
prompt = "In the style of the Mona Lisa"

# strength controls how far the result may drift from the input image
images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("testOutput.png")
Although... if anyone wants to school me on how best to use this, that'd be awesome, haha.
When I tried running it, it takes in an image and sort of does a transformation of it. For example, I gave it a photo of myself with a plate of food in front of me and prompted it to "replace the food with car parts". It transformed the image, but nothing really changed as asked; everything was just kind of messed up: a different person, food on my shirt for some reason, all kind of blurry, and no car parts.
I was hoping to be able to do things like... replace backgrounds, take my face and put it on other characters, replace items in photos. Is that not what this pipeline is for? I've seen examples of taking a sketch and transforming it into a "good" drawing. Is the img2img model just sort of... pulling the given input photo toward whatever it knows as a good solution, without regard to the original aesthetics of the image? Could someone who knows better explain what this particular pipeline is best at doing and how it may be doing it?
Thanks for all the help!
You need to use a strength somewhere from 0.5 to 0.8 in the img2img pipeline with this, and a prompt that describes the scene:
a stunning portrait of a man sitting in front of a plate full of delicious car parts
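If it helps, here's a sketch of sweeping a few strength values with the script above (pipe and init_image assumed from that script):

# lower values preserve the input image; higher values give the prompt more control
for strength in (0.5, 0.65, 0.8):
    result = pipe(prompt=prompt, image=init_image, strength=strength, guidance_scale=7.5).images[0]
    result.save(f"testOutput_strength_{strength}.png")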
Oooh, interesting. I tried outputs at different strengths, and that definitely seems to matter. The additional things that I ask for in the image seem to work much better.
Whenever I use an input image of a person (me), it changes it into a totally different person, though. I don't suppose there's a way to have it maintain certain aspects of the image, like my face/body/etc.?
Thanks again for all your help!
Inpainting currently has some issues; otherwise it would be the right tool for this problem.
Until then, use GIMP or another image editor: paste your original image in, then paste the generated img2img result on top of it. You can then use the eraser tool to remove the incorrect face.
You can go further with that and replace most of the image, except for the plate, with the original.
And there you have it: poor man's inpainting.
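The same compositing can also be scripted with PIL instead of an image editor. A sketch, assuming a hand-painted grayscale mask (mask.png is hypothetical: white where the generated result should show, black where the original should be kept):

from PIL import Image

original = Image.open("testImage.png").convert("RGBA")
generated = Image.open("testOutput.png").convert("RGBA").resize(original.size)
mask = Image.open("mask.png").convert("L")  # white = generated, black = original

# paste the generated result over the original wherever the mask is white
Image.composite(generated, original, mask).save("composited.png")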