ViT Base Violence Detection
Model Description
This is a Vision Transformer (ViT) model fine-tuned for violence detection. It is based on google/vit-base-patch16-224-in21k and was fine-tuned on the Real Life Violence Situations dataset from Kaggle to classify images as violent or non-violent.
Intended Use
The model is intended for applications that need to detect violent content in images, such as:
- Content moderation
- Surveillance
- Parental control software
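For use cases like content moderation, a confidence-based policy is often more practical than a single hard label. The sketch below assumes the model output has already been converted into a probability for the violent class. The function name `moderation_action` and both thresholds are hypothetical and would need tuning on your own data.

```python
def moderation_action(p_violent: float,
                      block_at: float = 0.90,
                      review_at: float = 0.50) -> str:
    """Map a violent-class probability to a moderation decision.

    Both thresholds are illustrative assumptions, not values
    published with this model.
    """
    if not 0.0 <= p_violent <= 1.0:
        raise ValueError("p_violent must be a probability in [0, 1]")
    if p_violent >= block_at:
        return "block"          # confident enough to act automatically
    if p_violent >= review_at:
        return "human_review"   # uncertain: route to a human moderator
    return "allow"


for p in (0.97, 0.62, 0.08):
    print(f"p_violent={p:.2f} -> {moderation_action(p)}")
```

A middle "human_review" band is one possible design choice: it keeps automatic decisions for confident predictions only and leaves borderline images to a person.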
Model Accuracy
Test accuracy for ViT Base: 98.80%. Loss: 0.2004.
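To make the two figures concrete: accuracy is the fraction of test images classified correctly, while the loss is presumably a mean cross-entropy. The card does not state which loss was used, so treat that as an assumption. A minimal sketch on made-up probabilities, not the actual test set:

```python
import math


def accuracy(pred_labels, true_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


def mean_cross_entropy(probs, true_labels):
    """Average negative log-probability assigned to the true class.

    probs[i][k] is the predicted probability of class k for sample i.
    """
    total = -sum(math.log(p[t]) for p, t in zip(probs, true_labels))
    return total / len(true_labels)


# Toy data (hypothetical): four samples, two classes
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
true_labels = [0, 1, 1, 1]

# Predicted label = index of the highest probability
pred_labels = [max(range(len(p)), key=p.__getitem__) for p in probs]

print("accuracy:", accuracy(pred_labels, true_labels))
print("mean cross-entropy:", round(mean_cross_entropy(probs, true_labels), 4))
```

Because cross-entropy penalizes confident mistakes heavily, a model can reach high accuracy while still reporting a non-trivial loss.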
How to Use
Here is an example of how to use this model for image classification:
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor
# Load the model and its image processor
# (ViTImageProcessor replaces the deprecated ViTFeatureExtractor)
model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
processor = ViTImageProcessor.from_pretrained('jaranohaal/vit-base-violence-detection')
# Load an image and make sure it has three RGB channels
image = Image.open('image.jpg').convert('RGB')
# Preprocess the image
inputs = processor(images=image, return_tensors="pt")
# Perform inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()
# Print the predicted class
print("Predicted class:", model.config.id2label[predicted_class_idx])
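The predicted label alone hides how confident the model is. As a sketch, the logits can be converted to per-class probabilities with a softmax. The helper below works on a plain Python list (for example `logits[0].tolist()` from the snippet above). The names `softmax` and `label_probabilities` are illustrative, and the labels in the demo mapping are placeholders rather than this model's actual `id2label`. On tensors, `torch.softmax(logits, dim=-1)` computes the same values.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(logits, id2label):
    """Pair each class label with its softmax probability."""
    return {id2label[i]: p for i, p in enumerate(softmax(logits))}


# Placeholder mapping and logits for demonstration only
demo_id2label = {0: "class_0", 1: "class_1"}
probs = label_probabilities([0.0, 2.0], demo_id2label)

for label, p in probs.items():
    print(f"{label}: {p:.4f}")
```

With the real model, pass `model.config.id2label` instead of the placeholder mapping.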