## Model description

This model is a fine-tuned version of [apple/mobilevitv2-1.0-imagenet1k-256](https://huggingface.co/apple/mobilevitv2-1.0-imagenet1k-256), trained for sketch image recognition on the [Xenova/quickdraw-small](https://huggingface.co/datasets/Xenova/quickdraw-small) dataset.

## How to use?

```python
import numpy as np
import requests
import torch
from PIL import Image
from transformers import MobileViTImageProcessor, MobileViTV2ForImageClassification

url = "https://static.thenounproject.com/png/2024184-200.png"
response = requests.get(url, stream=True)
response.raise_for_status()  # Fail early if the download did not succeed

# Convert to grayscale so the input has a single channel
image = Image.open(response.raw).convert("L")

processor = MobileViTImageProcessor.from_pretrained("laszlokiss27/doodle-dash2")
model = MobileViTV2ForImageClassification.from_pretrained("laszlokiss27/doodle-dash2")

# Convert the PIL image to a float tensor and add a channel dimension: (1, H, W)
image_tensor = torch.unsqueeze(torch.tensor(np.array(image)), 0).float()
# Add a batch dimension: (1, 1, H, W)
image_tensor = image_tensor.unsqueeze(0)

# Preprocess the tensor into model inputs
inputs = processor(images=image_tensor, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Get the predicted class
predicted_class_idx = logits.argmax(-1).item()
predicted_class = model.config.id2label[predicted_class_idx]
print("Predicted class:", predicted_class)
```
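
The snippet above prints only the single most likely class. For ambiguous sketches it can help to see a few runner-up guesses as well. The following is a minimal sketch of one way to do that, not part of the model's API: `top_k_labels` is a hypothetical helper name, and the toy scores and labels stand in for `logits[0].tolist()` and `model.config.id2label` from the code above.

```python
import math


def top_k_labels(scores, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy values (assumptions for illustration only); with the real model,
# pass logits[0].tolist() and model.config.id2label instead.
example_scores = [2.0, 0.5, 1.0]
example_labels = {0: "cat", 1: "dog", 2: "bird"}

for label, prob in top_k_labels(example_scores, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

The helper takes a plain list of floats rather than a tensor, so it can be tried without loading the model.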