How to extract the masked part of an image?

#3
by MonsterMMORPG - opened

I want to extract the masked part of the image. How can I do that?

Also, is this the best text-to-mask method available at the moment?

Thank you so much, @nielsr.

Perhaps you can show a modified version of this script that does this.

Input image:

a.jpg

Expected output:

image.png

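import torch
from PIL import Image
from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

image = Image.open("0.png").convert("RGB")

# Load the processor and model from a local checkpoint folder
processor = CLIPSegProcessor.from_pretrained("./clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("./clipseg-rd64-refined")

prompts = ["a blue block", "an orange block", "a yellow block", "a red block"]

# One copy of the image per prompt, so each prompt gets its own segmentation map
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
print(logits.shape)  # one map per prompt: (len(prompts), height, width)

# Add a channel dimension, e.g. for plotting or resizing with torch
preds = outputs.logits.unsqueeze(1)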
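To get from these logits to the masked part of the image, each prompt's logits have to be turned into probabilities (sigmoid), resized back to the input image's size, and thresholded. A minimal sketch, assuming one prompt's logits have been moved to NumPy (for example `outputs.logits[0].cpu().numpy()` in the batch above) and that 0.5 is a reasonable threshold for your images; `logits_to_mask` is a hypothetical helper, not part of transformers.

```python
import numpy as np
from PIL import Image


def logits_to_mask(logits: np.ndarray, size: tuple, threshold: float = 0.5) -> Image.Image:
    """Hypothetical helper: turn one prompt's CLIPSeg logits (2-D array) into a
    binary grayscale mask of `size` = (width, height), 255 = inside the mask.
    """
    # Sigmoid turns logits into probabilities; clipping avoids overflow warnings
    probs = 1.0 / (1.0 + np.exp(-np.clip(logits, -50, 50)))
    # Scale to 0..255 so PIL can handle it as an 8-bit grayscale image
    prob_img = Image.fromarray((probs * 255).astype(np.uint8))
    # The logits are at the processor's resolution, not the original image's
    prob_img = prob_img.resize(size, resample=Image.BILINEAR)
    binary = np.asarray(prob_img) >= int(threshold * 255)
    return Image.fromarray(binary.astype(np.uint8) * 255)
```

For example, `mask = logits_to_mask(outputs.logits[0].cpu().numpy(), image.size)` gives a black-and-white mask for "a blue block" at the original image resolution.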

Hi @nielsr,

thank you so much for the answer.

I tested it on Colab, and it is not working.

I provided images whose dimensions are divisible by 16 pixels.

Even an image of exactly 352x352 pixels still does not work.

Here is the error for the 352x352 image:

Used image:

image.png

image.png

image.png

I also tested this image, whose dimensions are divisible by 16; it didn't work either.

image.png

@MonsterMMORPG I have a fork of this space that produces bounding boxes and masks. You can try it here. You might find it helpful.

@taesiri It works great!

How can I extract the image based on the alpha? I want to extract only the alpha (masked) part and not the rest.

For example, turn this:

image.png

into this:

image.png

Also, I tested different strength values, and the head is still included when the "clothes" prompt is used.

Any ideas how to exclude the head?

@MonsterMMORPG About the prompt, you can try different ones; for example, "jacket and jeans" works well on your image, but it is not perfect.

As for alpha, you can use the mask output and some PIL magic to get the image you want. You can find a tutorial here: https://note.nkmk.me/en/python-pillow-composite/
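One way to do that "PIL magic": use the binary mask as the alpha channel so everything outside the mask becomes transparent, then crop to the mask's bounding box. A minimal sketch using Pillow's `Image.putalpha` and `Image.getbbox`; `cut_out_masked_region` is a hypothetical name, and the mask is assumed to be a grayscale ("L") image the same size as the input, with 255 marking the region to keep.

```python
from PIL import Image


def cut_out_masked_region(image: Image.Image, mask: Image.Image, crop: bool = True) -> Image.Image:
    """Hypothetical helper: return an RGBA copy of `image` in which pixels
    outside `mask` are fully transparent (mask: mode "L", 255 = keep, 0 = drop).
    """
    if mask.size != image.size:
        raise ValueError("mask and image must have the same size")
    cutout = image.convert("RGBA")  # convert() returns a new image
    cutout.putalpha(mask)  # the mask becomes the alpha channel
    if crop:
        bbox = mask.getbbox()  # box around non-zero mask pixels, None if empty
        if bbox is not None:
            cutout = cutout.crop(bbox)
    return cutout
```

Save the result as PNG (e.g. `cutout.save("clothes.png")`), since JPEG cannot store transparency.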

@taesiri Thank you so much.

Would it be too hard for you to add that to your demo page, so that it also outputs the masked part?

I am a total noob at Python :/

@MonsterMMORPG Okay :-), I will try to add it later today (after I am done with my daily tasks :D)

Awesome thank you so much. Looking forward to it.

@MonsterMMORPG Pushed an update.

Awesome, thank you so much! Will test ASAP.

@taesiri Thank you so much again.
Since I am a complete newbie in Python, I am stuck on this error.
I made a Colab notebook and I want to run the code without the Gradio interface.

I am running the code below, but I can't show and save the 3 returned images.

Here is the Colab link: https://colab.research.google.com/drive/1Eain9Tri7HUa90qUBn3kuEQy9D6A-w-e?usp=sharing

Could you help me show the 3 returned images and save them?

from PIL import Image

input_image = Image.open("/content/a.jpg")
input_prompt = "clothes"
thresholdVal = 1.0
alpha_val = 0.5
draw_rectangles = False
outputs = process_image(input_image, input_prompt, thresholdVal, alpha_val, draw_rectangles)

outputs[0].show()  # only this works
outputs[1].show()
outputs[2].show()

@MonsterMMORPG

The first returned value is a matplotlib figure, the second is a NumPy array, and the last one is a PIL image containing the region of the image you are interested in. We cannot call .show() on a NumPy array.

If you want to save the figure: outputs[0].savefig('fig.png')
If you want to save the mask: Image.fromarray(np.uint8(outputs[1] * 255), "L").save('mask.png')
If you want to save the region of interest: outputs[2].save('clothes.png')
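Since the three return values have different types, a small helper can turn any of them into a PIL image before calling `.show()` or `.save()`. A sketch, assuming the mask is a float array in [0, 1] as described above; `to_pil` is a hypothetical helper name, and the figure branch just repeats the in-memory `savefig` trick used for the side-by-side plot.

```python
import io

import numpy as np
from PIL import Image


def to_pil(obj) -> Image.Image:
    """Hypothetical helper: convert a matplotlib Figure, a NumPy mask,
    or a PIL image into a PIL image."""
    if isinstance(obj, Image.Image):
        return obj
    if isinstance(obj, np.ndarray):
        # Assumption: float mask in [0, 1]; scale to 0..255 grayscale
        return Image.fromarray(np.uint8(np.clip(obj, 0.0, 1.0) * 255))
    if hasattr(obj, "savefig"):
        # Treat it as a matplotlib Figure: render to PNG in memory, reopen with PIL
        buf = io.BytesIO()
        obj.savefig(buf, format="png", bbox_inches="tight")
        buf.seek(0)
        img = Image.open(buf)
        img.load()  # read the pixel data now, while `buf` is still alive
        return img
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```

Usage: `for out, name in zip(outputs, ["fig.png", "mask.png", "clothes.png"]): to_pil(out).save(name)`.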

If you want to show them side by side:

import io

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# Converting the matplotlib figure to a PIL Image, which is completely unnecessary!
buf = io.BytesIO()
outputs[0].savefig(buf, format='png', bbox_inches='tight', pad_inches=0)
image_data = buf.getvalue()
pil_image = Image.open(io.BytesIO(image_data))

fig, axes = plt.subplots(1, 3, figsize=(10, 4))

axes[0].imshow(pil_image)
axes[1].imshow(Image.fromarray(np.uint8(outputs[1] * 255), "L"), cmap='jet')
axes[2].imshow(outputs[2])

plt.show()

Hope this helps.

@taesiri Working awesome, thank you so much!

Can't we give a negative mask word, e.g. use "face" as a negative prompt so the face is not included?

Also, can we combine prompts with OR?

I mean like clothes or pants or shirts or boots, etc.

Here is an example:

clothes

image.png

pants

image.png

None of the following work:
clothes or pants
clothes , pants
clothes pants

image.png

@taesiri Would it be possible to merge multiple masks,

like clothes, pants, and shirts, and perhaps zero out the negative ones such as head and face?

Something like: mask = mask_clothes + mask_pants + mask_shirts - mask_face - mask_head
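The idea can be expressed with boolean masks: take the union (logical OR) of the positive masks, then clear every pixel covered by any negative mask. Plain addition and subtraction of 0/1 masks can push values outside [0, 1] where masks overlap, so this sketch uses logical operations instead. `combine_masks` is a hypothetical helper; each mask is assumed to be a same-shaped boolean or 0/1 NumPy array (for example, one thresholded mask per prompt).

```python
import numpy as np


def combine_masks(positive, negative=()):
    """Hypothetical helper: OR together the positive masks, then remove
    pixels that are set in any negative mask. Returns a boolean array."""
    if len(positive) == 0:
        raise ValueError("need at least one positive mask")
    # Union of all positive masks (reduces over the list axis)
    keep = np.logical_or.reduce([np.asarray(m, dtype=bool) for m in positive])
    if len(negative) > 0:
        drop = np.logical_or.reduce([np.asarray(m, dtype=bool) for m in negative])
        keep &= ~drop  # subtract the negatives
    return keep
```

With the names from the question: `mask = combine_masks([mask_clothes, mask_pants, mask_shirts], [mask_face, mask_head])`.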

@MonsterMMORPG This is actually a very cool idea, but I am swamped with my ongoing tasks for the next couple of days. Maybe I will have some time for this next week.

Awesome, looking forward to that.

Hello again. Any chance you have had time to look at this?

@MonsterMMORPG You can try it here: https://huggingface.co/spaces/taesiri/CLIPSeg2

Not very efficient at the moment, but it does what you need. When entering prompts, use a comma to separate multiple objects.
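Splitting such input into individual prompts takes one line; each resulting prompt can then become one entry of the CLIPSeg batch. A sketch, assuming surrounding spaces and empty entries should be ignored; `parse_prompts` is a hypothetical name and not necessarily how the space does it.

```python
def parse_prompts(text: str) -> list:
    """Hypothetical helper: split 'clothes, shirts, pants' into
    ['clothes', 'shirts', 'pants'], dropping blanks and extra spaces."""
    return [p.strip() for p in text.split(",") if p.strip()]
```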

Awesome, thank you so, so much. You are amazing.

It works great. But I presume we can no longer see the confidence as before?

I edited the previous code as follows:

import os

input_image = Image.open(file_name)
positive_prompts = "clothes, shirts, pants"
negative_prompts = "face, head"
thresholdVal = 0.5
alpha_val = 0.5
draw_rectangles = False
outputs = extract_image(positive_prompts, negative_prompts, input_image, thresholdVal)
outputs[0].show()
outputs[1].show()
outputs[0].save(os.path.join(folder_path, names_no_ext[irCounter] + '_clothes.png'))
# Save the second output (assumed to be the mask) here, not outputs[0] again
outputs[1].save(os.path.join(folder_path, names_no_ext[irCounter] + '_clothes_mask.png'))
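The per-file saving can also be wrapped in a loop over a whole folder. A sketch, assuming the extraction function returns the cut-out image first and the mask second (as in the snippet above); `process_folder` is a hypothetical helper, and `extract_fn` stands for whatever function does the extraction, e.g. a lambda around the space's `extract_image`.

```python
from pathlib import Path

from PIL import Image


def process_folder(folder, extract_fn, suffixes=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper: run `extract_fn` on every image in `folder` and save
    the first two returned images as <name>_clothes.png and <name>_clothes_mask.png.
    Assumption: extract_fn(image) returns (cutout, mask, ...) as PIL images.
    """
    folder = Path(folder)
    saved = []
    for path in sorted(folder.iterdir()):
        # Skip non-images and outputs written by a previous run
        if path.suffix.lower() not in suffixes or path.stem.endswith(("_clothes", "_clothes_mask")):
            continue
        with Image.open(path) as im:
            image = im.convert("RGB")
        outputs = extract_fn(image)
        cutout_path = folder / f"{path.stem}_clothes.png"
        mask_path = folder / f"{path.stem}_clothes_mask.png"
        outputs[0].save(cutout_path)
        outputs[1].save(mask_path)
        saved.append((cutout_path, mask_path))
    return saved
```

Usage: `process_folder(folder_path, lambda img: extract_image(positive_prompts, negative_prompts, img, thresholdVal))`.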

Here are the results.

Input:

a2 (1).jpg

Outputs:

image.png

image.png

@MonsterMMORPG I removed the unnecessary visualization, as each word has its own confidence information, and I assumed that is not required for your task :-)

Yep, not necessary. Your help was tremendous. Do you think a better model, with higher accuracy, will be released for this task soon?

@MonsterMMORPG This really depends on your use case; you might need to fine-tune some of these models. Have you tried other models as well, like MaskCut?

Thanks a lot for the reply. I didn't know about MaskCut, but CLIPSeg looks better for our task.

What would you suggest for calculating the similarity of clothing items? We want to calculate clothing similarity: say you show cloth A, and we return clothes similar to it. I presume this is an image similarity calculation.

Sorry about the hidden comments; they were duplicate posts caused by a Hugging Face error.

@MonsterMMORPG What have you tried so far? I reckon simple cosine similarity of CLIP image embeddings would give decent results out of the box.
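For the "show cloth A, return similar clothes" use case, a whole catalogue can be compared with the query at once: L2-normalize every embedding, and a dot product then equals the cosine similarity. A NumPy sketch, assuming the image embeddings were already computed (e.g. with CLIP's image encoder) and stacked into arrays; `top_k_similar` is a hypothetical helper.

```python
import numpy as np


def top_k_similar(query: np.ndarray, gallery: np.ndarray, k: int = 5):
    """Hypothetical helper: return (indices, scores) of the k gallery rows
    most cosine-similar to `query`. query: shape (D,), gallery: shape (N, D)."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q  # cosine similarity of each gallery item with the query
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]
```

Normalizing once means the gallery matrix can be precomputed and reused for every query.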

This was also on my mind.

How can I calculate this? Unfortunately, I am very bad at Python. Let's say I want to add this to the script below, which is your script, so it should be easy for you :)

https://gist.github.com/FurkanGozukara/09dd8a80d72546bd51ef73b2171e8338

Also, do you have any other ideas?

@MonsterMMORPG English is the most popular programming language these days, you can try ChatGPT + Copilot and do anything you like 😁😁

You are absolutely right.

I made a simple code block, but I am getting an error and can't figure out where it comes from.

Also, I need it to be able to process images of any dimensions.

image.png

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_ID).to(device)
model.eval()

# CLIPProcessor resizes and center-crops every image to the model's input size,
# so images of any dimensions can be passed in
preprocess = CLIPProcessor.from_pretrained(model_ID)

def load_and_preprocess_image(image_path):
    # Force 3 channels: RGBA or grayscale files would otherwise break preprocessing
    image = Image.open(image_path).convert("RGB")

    # The processor's first positional argument is `text`, so pass the image by keyword
    inputs = preprocess(images=image, return_tensors="pt")

    return inputs["pixel_values"].to(device)

image_a = load_and_preprocess_image('/content/a.png')
image_b = load_and_preprocess_image('/content/b.png')

with torch.no_grad():
    # Hugging Face's CLIPModel uses get_image_features; encode_image belongs to OpenAI's `clip` package
    embedding_a = model.get_image_features(pixel_values=image_a)
    embedding_b = model.get_image_features(pixel_values=image_b)

similarity_score = torch.nn.functional.cosine_similarity(embedding_a, embedding_b)

print('Similarity score:', similarity_score.item())
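If the images being compared are transparent PNG cutouts, converting them straight to RGB drops the alpha channel, and the original pixels hidden under the transparent areas (background, face, etc.) come back and influence the embedding. Compositing onto a plain background first avoids that; a sketch, with a white background as an arbitrary assumption and `flatten_on_background` a hypothetical helper.

```python
from PIL import Image


def flatten_on_background(image: Image.Image, color=(255, 255, 255)) -> Image.Image:
    """Hypothetical helper: composite an (RGBA) image onto a solid background
    and return an RGB image, so transparent pixels become `color`."""
    rgba = image.convert("RGBA")
    background = Image.new("RGBA", rgba.size, color + (255,))
    return Image.alpha_composite(background, rgba).convert("RGB")
```

In `load_and_preprocess_image`, this would replace the plain RGB conversion: `image = flatten_on_background(Image.open(image_path))`.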
