nielsr/CLIPSeg · How to extract masked image part?

MonsterMMORPG

Feb 13, 2023

•

edited Feb 13, 2023

I want to extract the masked part of an image. How can I do that?

Also, is this the best text-to-mask method available at the moment?

Thank you so much, @nielsr.

Perhaps you can show a modified version of this script that does this.

input image

expected output

import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

image = Image.open("0.png")

# Load the processor and model from a local copy of the checkpoint
processor = CLIPSegProcessor.from_pretrained("./clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("./clipseg-rd64-refined")

prompts = ["a blue block", "a orange block", "a yellow block", "a red block"]

# The image is repeated so that each prompt gets its own copy
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
print(logits.shape)

preds = outputs.logits.unsqueeze(1)

nielsr

Owner Feb 13, 2023

Hi,

See my demo notebook: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/CLIPSeg

MonsterMMORPG

Feb 13, 2023

•

edited Feb 13, 2023

@nielsr hi

Thank you so much for the answer.

I tested it on the Colab and it is not working.

I provided images whose dimensions are divisible by 16.

Even an image of exactly 352x352 pixels still does not work.

Here is the error for 352x352:

used image

I also tested this image, with dimensions divisible by 16, and it didn't work either.

MonsterMMORPG

Feb 13, 2023

@nielsr

After I loaded a PNG, it got past that part.

I loaded this URL: https://cdn-uploads.huggingface.co/production/uploads/1676308992250-6345bd89fe134dfd7a0dba40.png

Now I get this error:

taesiri

Feb 13, 2023

@MonsterMMORPG I have a fork of this space that produces bounding boxes and masks. You can try it here. You might find it helpful.

MonsterMMORPG

Feb 13, 2023

•

edited Feb 13, 2023

@taesiri it works great.

How can I extract the image based on the alpha? I want to extract only the alpha part,

only the alpha and not the rest,

e.g.

this

into this

Also, I tested different strengths, and it includes the head as well when the clothes prompt is used.

Any ideas on how to discard the head?

taesiri

Feb 13, 2023

@MonsterMMORPG About the prompt, you can try different ones. For example, "jacket and jeans" works well on your image, but it is not perfect.

As for alpha, you can use the mask output and some PIL magic to get the image you want. You can find a tutorial here: https://note.nkmk.me/en/python-pillow-composite/
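For reference, here is a minimal sketch of that idea. It is not the code from the Space; the function name `extract_masked_region` is made up for illustration, and it assumes the mask is a 2-D array of probabilities in [0, 1] (for example the sigmoid of the CLIPSeg logits for one prompt). It uses `putalpha` rather than `Image.composite` from the linked tutorial, which gives a transparent background instead of a solid one.

```python
import numpy as np
from PIL import Image


def extract_masked_region(image, mask, threshold=0.5):
    """Return an RGBA copy of `image` that is transparent where `mask` < `threshold`.

    `mask` is assumed to be a 2-D array of probabilities in [0, 1].
    """
    # Binarise the probability map and scale it to 0/255 so it can act as an 8-bit alpha channel.
    alpha = Image.fromarray((np.asarray(mask) >= threshold).astype(np.uint8) * 255)
    # The predicted mask is usually smaller than the photo, so stretch it to the image size.
    alpha = alpha.resize(image.size, resample=Image.NEAREST)
    result = image.convert("RGBA")
    result.putalpha(alpha)
    return result
```

Saving the result as PNG (for example `extract_masked_region(image, mask).save("clothes.png")`) keeps the transparency; JPEG would not.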

MonsterMMORPG

Feb 13, 2023

@taesiri thank you so much.

Would it be too hard for you to add that to your demo page, so that it also outputs the masked part?

I am a total noob at Python :/

taesiri

Feb 13, 2023

@MonsterMMORPG Okay :-), I will try to add it later today (after I am done with my daily tasks :D)

MonsterMMORPG

Feb 13, 2023

Awesome thank you so much. Looking forward to it.

taesiri

Feb 14, 2023

@MonsterMMORPG Pushed an update.

MonsterMMORPG

Feb 14, 2023

Awesome ty so much will test asap

MonsterMMORPG

Feb 18, 2023

•

edited Feb 18, 2023

@taesiri thank you so much again.
Since I am very new to Python, I am having this error.
I made a Colab and I want to run the code without the Gradio interface.

I am running this command but can't show and save the 3 returned images.

Here is the Colab link: https://colab.research.google.com/drive/1Eain9Tri7HUa90qUBn3kuEQy9D6A-w-e?usp=sharing

Could you help me show the 3 returned images and save them?

input_image = Image.open("/content/a.jpg")
input_prompt = "clothes"
thresholdVal=1.0
alpha_val=0.5
draw_rectangles = False
outputs = process_image(input_image, input_prompt, thresholdVal, alpha_val, draw_rectangles)

outputs[0].show()  # only this works
outputs[1].show()
outputs[2].show()

taesiri

Feb 18, 2023

@MonsterMMORPG

The first returned value is a matplotlib figure, the second is a NumPy array, and the last is a PIL image containing the region of the image you are interested in. You cannot call .show() on a NumPy array.

If you want to save the figure: outputs[0].savefig('fig.png')
If you want to save the mask: Image.fromarray(np.uint8(outputs[1] * 255), "L").save('mask.png')
If you want to save the region of interest: outputs[2].save('clothes.png')

If you want to show them side by side:

import io

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# Converting matplotlib figure to PIL Image, which is completely unnecessary!
buf = io.BytesIO()
outputs[0].savefig(buf, format='png', bbox_inches='tight', pad_inches=0)
image_data = buf.getvalue()
pil_image = Image.open(io.BytesIO(image_data))

fig, axes = plt.subplots(1, 3, figsize=(10, 4))

axes[0].imshow(pil_image)
axes[1].imshow(Image.fromarray(np.uint8(outputs[1] * 255), "L"), cmap='jet')
axes[2].imshow(outputs[2])

plt.show()

Hope this helps.

MonsterMMORPG

Feb 19, 2023

•

edited Feb 19, 2023

@taesiri working awesome, thank you so much.

Can't we give a negative mask word, e.g. "do not take face" as a negative prompt?

Also, can we do an OR?

I mean like clothes or pants or shirts or boots, etc.

Here is an example:

clothes

pants

None of the below works:
clothes or pants
clothes , pants
clothes pants

MonsterMMORPG

Feb 21, 2023

@taesiri would something like this be possible:

merging multiple masks,
like clothes, pants, shirts, and perhaps zeroing out the negatives such as head, face

like mask = mask_clothes + mask_pants + mask_shirts - mask_face - mask_head
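A literal reading of that formula could be sketched like this. It is only an illustration of the arithmetic, not what any existing Space does; the function name `combine_masks` is made up, and each mask is assumed to be a 2-D array of probabilities in [0, 1] with the same shape.

```python
import numpy as np


def combine_masks(positive, negative=(), threshold=0.5):
    """Add the positive masks, subtract the negative ones, and threshold the result.

    Each mask is assumed to be a 2-D array of probabilities in [0, 1];
    all masks must share the same shape.
    """
    combined = np.sum(np.stack(positive), axis=0)
    if len(negative) > 0:
        combined = combined - np.sum(np.stack(negative), axis=0)
    # Sums can leave the [0, 1] range, so clip before thresholding.
    combined = np.clip(combined, 0.0, 1.0)
    return combined >= threshold
```

With this, `combine_masks([mask_clothes, mask_pants, mask_shirts], [mask_face, mask_head])` would give a boolean mask that keeps a pixel only where the positive prompts outweigh the negative ones.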

taesiri

Feb 21, 2023

@MonsterMMORPG This is actually a very cool idea, but I am swamped with my outgoing tasks for the next couple of days. Maybe I will have some time for this in the next week.

MonsterMMORPG

Feb 21, 2023

awesome looking forward to that.

MonsterMMORPG

Feb 25, 2023

Hello again. Any chance to look at this so far?

taesiri

Feb 26, 2023

@MonsterMMORPG You can try it here: https://huggingface.co/spaces/taesiri/CLIPSeg2

Not very efficient at the moment, but it does what you need. When entering prompts, use a comma to separate multiple objects.

MonsterMMORPG

Feb 27, 2023

•

edited Feb 27, 2023

Awesome thank you so so much. You are amazing.

It works great. But we can't see the confidence as before, I presume?

I edited the previous code as below:

input_image = Image.open(file_name)
positive_prompts = "clothes, shirts, pants"
negative_prompts = "face, head"
thresholdVal=0.5
alpha_val=0.5
draw_rectangles = False
outputs = extract_image(positive_prompts, negative_prompts, input_image, thresholdVal)
outputs[0].show()
outputs[1].show()
outputs[0].save(folder_path+"/"+names_no_ext[irCounter]+'_clothes.png')
outputs[1].save(folder_path+"/"+names_no_ext[irCounter]+'_clothes_mask.png')

here results

input

outputs

taesiri

Mar 1, 2023

@MonsterMMORPG I removed unnecessary visualization as each word has its own confidence information and I assumed that is not required for your task :-)

MonsterMMORPG

Mar 2, 2023

Yep, not necessary. Your help was tremendous. Do you think a better, more accurate model for this task will be released soon?

taesiri

Mar 2, 2023

@MonsterMMORPG This really depends on your use case; you might need to fine-tune some of these models. Have you tried other models as well, like MaskCut?

MonsterMMORPG

Mar 5, 2023

•

edited Mar 5, 2023

Thanks a lot for the reply. I didn't know MaskCut, but CLIPSeg looks better for our task.

What would you suggest for calculating the similarity of clothing? Let's say you show garment A, and we return clothes similar to it. So this is an image similarity calculation, I presume.

Sorry about the hidden comments. They were duplicate postings due to a Hugging Face error.

MonsterMMORPG

Mar 5, 2023

This comment has been hidden

taesiri

Mar 9, 2023

@MonsterMMORPG What have you tried so far? I reckon simple cosine similarity of CLIP image embeddings would give decent results out of the box.
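As a rough sketch of what that could look like once the embeddings exist: the snippet below ranks a gallery of embedding vectors by cosine similarity to a query vector. The function name `rank_by_cosine_similarity` is hypothetical, and the embeddings are assumed to have been computed beforehand (for instance with `CLIPModel.get_image_features` from transformers) and converted to NumPy arrays.

```python
import numpy as np


def rank_by_cosine_similarity(query, gallery):
    """Return (indices, scores) of gallery rows sorted from most to least similar.

    `query` is a 1-D embedding and `gallery` a 2-D array with one embedding per row.
    """
    query = np.asarray(query, dtype=np.float64)
    gallery = np.asarray(gallery, dtype=np.float64)
    # Normalise to unit length so the dot product equals the cosine similarity.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    order = np.argsort(-scores)
    return order, scores[order]
```

The first index returned is the gallery item closest to the query. Running this on embeddings of the extracted clothing regions rather than the full photos is one option to try, though whether it helps would need testing.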

MonsterMMORPG

Mar 10, 2023

This was also on my mind.

How can I calculate this? Unfortunately I am very bad at Python. Let's say I want to make an addition to this script, which is your script, so it should be easy for you :)

https://gist.github.com/FurkanGozukara/09dd8a80d72546bd51ef73b2171e8338

Also, do you have any ideas?

taesiri

Mar 10, 2023

@MonsterMMORPG English is the most popular programming language these days, you can try ChatGPT + Copilot and do anything you like 😁😁

MonsterMMORPG

Mar 11, 2023

•

edited Mar 11, 2023

You are absolutely right.

I made a simple code block but I am getting an error, and I can't figure out where the error is.

I also need it to be able to process any image dimension.

import torch
import torchvision.transforms as transforms
import urllib.request
from transformers import CLIPProcessor, CLIPModel, CLIPTokenizer
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_ID).to(device)

preprocess = CLIPProcessor.from_pretrained(model_ID)

def load_and_preprocess_image(image_path):
    image = Image.open(image_path)

    image = preprocess(image).unsqueeze(0).to(device)

    return image

image_a = load_and_preprocess_image('/content/a.png')
image_b = load_and_preprocess_image('/content/b.png')

with torch.no_grad():
    embedding_a = model.encode_image(image_a)
    embedding_b = model.encode_image(image_b)

similarity_score = torch.nn.functional.cosine_similarity(embedding_a, embedding_b)

print('Similarity score:', similarity_score.item())
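Two things in the snippet above look like likely sources of the error. `CLIPProcessor` returns a dict-like batch rather than a tensor, so `.unsqueeze(0)` is not available on it, and the transformers `CLIPModel` has no `encode_image` method (that name comes from the separate OpenAI `clip` package); the transformers equivalent is `get_image_features`. A possible corrected sketch follows. The helper names `embed_image` and `image_similarity` are made up here, and it assumes a transformers version in which `get_image_features` returns a plain tensor.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # same checkpoint as in the post above


def embed_image(model, processor, image, device="cpu"):
    """Return the CLIP embedding of one PIL image as a (1, dim) tensor."""
    # The processor resizes and center-crops, so the input can have any dimensions.
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    pixel_values = inputs["pixel_values"].to(device)
    with torch.no_grad():
        # Assumption: this call returns a tensor, not a model-output object.
        return model.get_image_features(pixel_values=pixel_values)


def image_similarity(path_a, path_b):
    """Load CLIP (weights are downloaded on first use) and compare two image files."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = CLIPModel.from_pretrained(MODEL_ID).to(device).eval()
    processor = CLIPProcessor.from_pretrained(MODEL_ID)
    emb_a = embed_image(model, processor, Image.open(path_a), device)
    emb_b = embed_image(model, processor, Image.open(path_b), device)
    return torch.nn.functional.cosine_similarity(emb_a, emb_b).item()
```

Usage would then be `print('Similarity score:', image_similarity('/content/a.png', '/content/b.png'))`. Converting to RGB first avoids trouble with PNGs that carry an alpha channel.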