---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
tags:
- deep-fake
- detection
- ViT
base_model:
- google/vit-base-patch16-224-in21k
---

![df[ViT].gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Xbuv-x40-l3QjzWu5Yj2F.gif)

# **Deep-Fake-Detector-Model**

# **Overview**
The **Deep-Fake-Detector-Model** is a deep learning model for detecting deepfake images. It leverages the **Vision Transformer (ViT)** architecture, specifically the `google/vit-base-patch16-224-in21k` model, fine-tuned on a dataset of real and deepfake images. The model classifies images as either "Real" or "Fake", making it a useful tool for flagging manipulated media.

### **Key Features**
- **Architecture**: Vision Transformer (ViT) - `google/vit-base-patch16-224-in21k`.
- **Input**: RGB images resized to 224x224 pixels.
- **Output**: Binary classification ("Real" or "Fake").
- **Training Dataset**: A curated dataset of real and deepfake images (e.g., `Hemg/deepfake-and-real-images`).
- **Fine-Tuning**: The model is fine-tuned using Hugging Face's `Trainer` API with advanced data augmentation techniques.
- **Performance**: Evaluated with accuracy, F1 score, and a confusion matrix; see the classification report under **Performance Metrics** below.

# **Model Architecture**
The model is based on the **Vision Transformer (ViT)**, which treats images as sequences of patches and applies a transformer encoder to learn spatial relationships. Key components include:
- **Patch Embedding**: Divides the input image into fixed-size patches (16x16 pixels).
- **Transformer Encoder**: Processes patch embeddings using multi-head self-attention mechanisms.
- **Classification Head**: A fully connected layer for binary classification (see the inspection sketch below).
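
These pieces can be inspected directly with `transformers`. The snippet below is an illustrative sketch only: it instantiates the base checkpoint with a fresh two-label head, and the `id2label` mapping shown is an assumption rather than the exact mapping stored in this repository.

```python
from transformers import ViTForImageClassification

# Illustrative: base ViT backbone with a freshly initialized 2-label head.
# The Real/Fake index order below is an assumption.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=2,
    id2label={0: "Real", 1: "Fake"},
    label2id={"Real": 0, "Fake": 1},
)

print(model.config.image_size)  # 224 -> expected input resolution
print(model.config.patch_size)  # 16  -> 16x16 patches, i.e. 14x14 = 196 patches per image
print(model.classifier)         # Linear(in_features=768, out_features=2)
```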

# **Training Details**
- **Optimizer**: AdamW with a learning rate of `1e-6`.
- **Batch Size**: 32 for training, 8 for evaluation.
- **Epochs**: 2.
- **Data Augmentation**:
  - Random rotation (±90 degrees).
  - Random sharpness adjustment.
  - Random resizing and cropping.
- **Loss Function**: Cross-Entropy Loss.
- **Evaluation Metrics**: Accuracy, F1 Score, and Confusion Matrix (a minimal training sketch using these settings follows below).
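
The exact training script is not reproduced here; the sketch below is a hypothetical reconstruction of the setup described above using the `Trainer` API. The dataset split names, the column names (`image`, `label`), and the augmentation parameters are assumptions.

```python
import torch
from datasets import load_dataset
from torchvision.transforms import (
    Compose, Normalize, RandomAdjustSharpness, RandomResizedCrop,
    RandomRotation, ToTensor,
)
from transformers import (
    Trainer, TrainingArguments, ViTForImageClassification, ViTImageProcessor,
)

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

# Augmentations mirroring the list above: rotation, sharpness, resize/crop.
train_transforms = Compose([
    RandomRotation(90),
    RandomAdjustSharpness(2),
    RandomResizedCrop(224),
    ToTensor(),
    Normalize(mean=processor.image_mean, std=processor.image_std),
])

# Column names "image"/"label" and split names "train"/"test" are assumptions.
dataset = load_dataset("Hemg/deepfake-and-real-images")

def apply_transforms(batch):
    batch["pixel_values"] = [train_transforms(img.convert("RGB")) for img in batch["image"]]
    return batch

dataset = dataset.with_transform(apply_transforms)

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["label"] for ex in examples]),
    }

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=2
)

args = TrainingArguments(
    output_dir="deep-fake-detector",
    learning_rate=1e-6,              # Trainer uses AdamW by default
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    remove_unused_columns=False,     # keep the raw "image" column for the on-the-fly transform
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=collate_fn,
)
trainer.train()
```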

# **Inference with Hugging Face Pipeline**
```python
from transformers import pipeline

# Load the model (device=0 uses the first GPU; set device=-1 for CPU)
pipe = pipeline("image-classification", model="prithivMLmods/Deep-Fake-Detector-Model", device=0)

# Predict on an image
result = pipe("path_to_image.jpg")
print(result)
```
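
The pipeline returns a list of `{'label', 'score'}` dictionaries sorted by score, so the first entry is the predicted class.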

# **Inference with PyTorch**
```python
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load the model and processor
model = ViTForImageClassification.from_pretrained("prithivMLmods/Deep-Fake-Detector-Model")
processor = ViTImageProcessor.from_pretrained("prithivMLmods/Deep-Fake-Detector-Model")

# Load and preprocess the image
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

# Map class index to label
label = model.config.id2label[predicted_class]
print(f"Predicted Label: {label}")
```
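
If a confidence score is needed alongside the label, the logits from the snippet above can be converted to probabilities:

```python
# Continues from the snippet above: turn logits into per-class probabilities
probs = torch.softmax(logits, dim=1)[0]
for idx, p in enumerate(probs):
    print(f"{model.config.id2label[idx]}: {p.item():.4f}")
```
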
# **Performance Metrics**
```
Classification report:

              precision    recall  f1-score   support

        Real     0.6276    0.9823    0.7659     38054
        Fake     0.9594    0.4176    0.5819     38080

    accuracy                         0.6999     76134
   macro avg     0.7935    0.7000    0.6739     76134
weighted avg     0.7936    0.6999    0.6739     76134
```
- **Confusion Matrix** (rows: true class, columns: predicted class, ordered `Real`, `Fake`):
  ```
  [[Real predicted as Real,  Real predicted as Fake],
   [Fake predicted as Real,  Fake predicted as Fake]]
  ```
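
For reference, a report and matrix in this format can be produced with scikit-learn; the labels below are placeholders to illustrate the format, not the actual evaluation outputs:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder predictions purely to illustrate the reporting format
y_true = ["Real", "Real", "Fake", "Fake"]
y_pred = ["Real", "Fake", "Fake", "Fake"]

print(classification_report(y_true, y_pred, labels=["Real", "Fake"], digits=4))
print(confusion_matrix(y_true, y_pred, labels=["Real", "Fake"]))
```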

# **Dataset**
The model is fine-tuned on the `Hemg/deepfake-and-real-images` dataset, which contains:
- **Real Images**: Authentic images of human faces.
- **Fake Images**: Deepfake images generated using advanced AI techniques.
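
The dataset can be loaded directly from the Hugging Face Hub; printing it shows the available splits and columns (assumed here to include an image and a label column):

```python
from datasets import load_dataset

# Pull the fine-tuning dataset from the Hub
dataset = load_dataset("Hemg/deepfake-and-real-images")
print(dataset)  # shows the available splits, column names, and row counts
```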

# **Limitations**
- The model is trained on a specific dataset and may not generalize well to other deepfake datasets or domains.
- Performance may degrade on low-resolution or heavily compressed images.
- The model is designed for image classification and does not detect deepfake videos directly.

# **Ethical Considerations**

- **Misuse**: This model should not be used for malicious purposes, such as creating or spreading deepfakes.
- **Bias**: The model may inherit biases from the training dataset. Care should be taken to ensure fairness and inclusivity.
- **Transparency**: Users should be informed when deepfake detection tools are used to analyze their content.

# **Future Work**
- Extend the model to detect deepfake videos.
- Improve generalization by training on larger and more diverse datasets.
- Incorporate explainability techniques to provide insights into model predictions.

# **Citation**

```bibtex
@misc{Deep-Fake-Detector-Model,
  author = {prithivMLmods},
  title = {Deep-Fake-Detector-Model},
  year = {2024},
  note = {Last updated: 31 January 2025}
}
```