Update code snippet #11
by qubvel-hf (HF staff) - opened

README.md CHANGED
@@ -48,24 +48,37 @@ The SAM model is made up of 3 modules:

## Prompted-Mask-Generation

```python
import torch
import requests
from PIL import Image
from transformers import SamModel, SamProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# load model and processor
model = SamModel.from_pretrained("facebook/sam-vit-huge").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

# prepare model inputs
img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D localization of a window

inputs = processor(raw_image, input_points=input_points, return_tensors="pt")
inputs = inputs.to(device)

with torch.no_grad():
    outputs = model(**inputs)

# post-process model results
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
scores = outputs.iou_scores
print(scores)
# tensor([[[0.9057, 0.9563, 0.9669]]], device='cuda:0')
```
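
As a follow-up, here is a minimal sketch (not part of the original snippet) of how one might keep only the highest-scoring of the three candidate masks for the single point prompt, assuming the output shapes produced by the code above:

```python
# Sketch: pick the best candidate mask for the first (and only) image and point
# prompt, ranking by the model's predicted IoU scores.
best_idx = scores[0, 0].argmax().item()
best_mask = masks[0][0, best_idx]  # boolean tensor of shape (original_height, original_width)
print(best_mask.shape, best_mask.sum().item())  # mask size and number of foreground pixels
```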
Among other arguments to generate masks, you can pass 2D locations on the approximate position of your object of interest, a bounding box wrapping the object of interest (the format should be the x, y coordinates of the top-left and bottom-right points of the bounding box), or a segmentation mask. At the time of writing, passing text as input is not supported by the official model according to [the official repository](https://github.com/facebookresearch/segment-anything/issues/4#issuecomment-1497626844).
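For instance, a bounding-box prompt can be passed through the processor's `input_boxes` argument. The sketch below reuses `model`, `processor`, `raw_image`, and `device` from the snippet above; the box coordinates are illustrative placeholders, not values taken from the original README:

```python
# Hypothetical example: prompt SAM with a bounding box instead of a point.
# Boxes are given as [x_min, y_min, x_max, y_max] (top-left / bottom-right corners);
# the values below are placeholders chosen only for illustration.
input_boxes = [[[75, 275, 1725, 850]]]

inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# post-process exactly as with point prompts
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
```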
For more details, refer to this notebook, which shows a walkthrough of how to use the model, with a visual example!