ViT Base Violence Detection

Model Description

This is a Vision Transformer (ViT) model fine-tuned for violence detection. The model is based on google/vit-base-patch16-224-in21k and has been trained on the Real Life Violence Situations dataset from Kaggle to classify images into violent or non-violent categories.

Intended Use

The model is intended for use in applications where detecting violent content in images is necessary. This can include:

  • Content moderation
  • Surveillance
  • Parental control software

Model accuracy

Test accuracy for Vit Base = 98.80% Loss = 0.20038144290447235

How to Use

Here is an example of how to use this model for image classification:

import torch
from transformers import ViTForImageClassification, ViTFeatureExtractor
from PIL import Image

# Load the model and feature extractor
model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
feature_extractor = ViTFeatureExtractor.from_pretrained('jaranohaal/vit-base-violence-detection')

# Load an image
image = Image.open('image.jpg')

# Preprocess the image
inputs = feature_extractor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()

# Print the predicted class
print("Predicted class:", model.config.id2label[predicted_class_idx])
Downloads last month
653
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.