---
license: apache-2.0
pipeline_tag: mask-generation
---

# NanoSAM: Accelerated Segment Anything Model for Edge Deployment

- [GitHub](https://github.com/binh234/nanosam)
- [Demo](https://huggingface.co/spaces/dragonSwing/nanosam)

## Pretrained Models

NanoSAM performance on edge devices. Latency/throughput is measured on NVIDIA Jetson Xavier NX and NVIDIA T4 GPU with TensorRT (fp16); data transfer time is included.

| Image Encoder   |  CPU  | Jetson Xavier NX |  T4   | Model size |                                                 Download                                                 |
| --------------- | :---: | :--------------: | :---: | :--------: | :------------------------------------------------------------------------------------------------------: |
| PPHGV2-B1       | 110ms |      9.6ms       | 2.4ms |   12.7MB   | [Link](https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b1_ln_nonorm_image_encoder.onnx) |
| PPHGV2-B2       | 200ms |      12.4ms      | 3.2ms |   29.5MB   | [Link](https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b1_ln_nonorm_image_encoder.onnx) |
| PPHGV2-B4       | 300ms |      17.3ms      | 4.1ms |   61.4MB   | [Link](https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b1_ln_nonorm_image_encoder.onnx) |
| ResNet18        | 500ms |      22.4ms      | 5.8ms |   63.2MB   |      [Link](https://drive.google.com/file/d/14-SsvoaTl-esC3JOzomHDnI9OGgdO2OR/view?usp=drive_link)       |
| EfficientViT-L0 |  1s   |      31.6ms      |  6ms  |  117.5MB   |                                                    -                                                     |
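
For the encoders hosted on the Hugging Face Hub, the ONNX file can also be fetched programmatically. Below is a minimal sketch using `huggingface_hub`; the repository ID and the PPHGV2-B1 filename are taken from the link in the table above, while the local `data/` directory is just an illustrative choice:

```python3
from huggingface_hub import hf_hub_download

# Download the PPHGV2-B1 image encoder from the Hugging Face Hub.
# repo_id and filename match the PPHGV2-B1 link above; local_dir is
# an arbitrary example location.
encoder_path = hf_hub_download(
    repo_id="dragonSwing/nanosam",
    filename="sam_hgv2_b1_ln_nonorm_image_encoder.onnx",
    local_dir="data",
)
print(encoder_path)
```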

Zero-shot instance segmentation results on the COCO 2017 validation dataset:

| Image Encoder   | mAP<sup>mask<br>50-95 | mIoU (all) | mIoU (large) | mIoU (medium) | mIoU (small) |
| --------------- | :-------------------: | :--------: | :----------: | :-----------: | :----------: |
| ResNet18        |           -           |    70.6    |     79.6     |     73.8      |     62.4     |
| MobileSAM       |           -           |    72.8    |     80.4     |     75.9      |     65.8     |
| PPHGV2-B1       |         41.2          |    75.6    |     81.2     |     77.4      |     70.8     |
| PPHGV2-B2       |         42.6          |    76.5    |     82.2     |     78.5      |     71.5     |
| PPHGV2-B4       |         44.0          |    77.3    |     83.0     |     79.7      |     72.1     |
| EfficientViT-L0 |         45.6          |    78.6    |     83.7     |     81.0      |     73.3     |

## Usage

```python3
import numpy as np
import PIL.Image

from nanosam.utils.predictor import Predictor

# ONNX image encoder configuration (PPHGV2-B4 encoder, run on CPU).
image_encoder_cfg = {
    "path": "data/sam_hgv2_b4_ln_nonorm_image_encoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
    "normalize_input": False,
}
# ONNX mask decoder configuration.
mask_decoder_cfg = {
    "path": "data/efficientvit_l0_mask_decoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
}
predictor = Predictor(image_encoder_cfg, mask_decoder_cfg)

image = PIL.Image.open("assets/dogs.jpg")

predictor.set_image(image)

# Predict a mask from a single foreground point (x, y) in pixel coordinates.
x, y = 100, 100  # example point
mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))
```

The point labels may be one of the following:

| Point Label | Description               |
| :---------: | ------------------------- |
|      0      | Background point          |
|      1      | Foreground point          |
|      2      | Bounding box top-left     |
|      3      | Bounding box bottom-right |
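
For example, a bounding box prompt is passed as its top-left and bottom-right corners with labels 2 and 3. The snippet below is a minimal sketch that reuses the `predictor` from the usage example above; the box coordinates are illustrative:

```python3
import numpy as np

# Bounding box prompt: top-left corner (label 2) and bottom-right corner (label 3).
# Coordinates are illustrative pixel values for the example image.
points = np.array([[100, 100], [400, 400]])
point_labels = np.array([2, 3])

mask, _, _ = predictor.predict(points, point_labels)
```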