prithivMLmods committed
Commit 43bcb30 · verified · 1 Parent(s): 5a976fa

Create README.md

Files changed (1)
  1. README.md +143 -0
README.md ADDED
@@ -0,0 +1,143 @@
1
+ ---
2
+ license: apache-2.0
3
+ pipeline_tag: image-classification
4
+ library_name: transformers
5
+ tags:
6
+ - deep-fake
7
+ - ViT
8
+ - detection
9
+ - Image
10
+ - transformers-4.49.0.dev0
11
+ - precision-92.12
12
+ - v2
13
+ base_model:
14
+ - google/vit-base-patch16-224-in21k
15
+ ---
16
+
17
+ ![fake q.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/PVkTbLOEBr-qNkTws3UsD.gif)
18
+
19
+ # **Deep-Fake-Detector-v2-Model**
20
+
21
+ # **Overview**
22
+
23
+ The **Deep-Fake-Detector-v2-Model** is a deep learning model designed to detect deepfake images. It builds on the **Vision Transformer (ViT)** architecture, specifically the `google/vit-base-patch16-224-in21k` checkpoint, fine-tuned on a dataset of real and deepfake images. The model classifies each image as either "Realism" or "Deepfake" and reaches 92.12% accuracy on the 56,001 evaluation images in the classification report below.
24
+
25
+ ```
26
+ Classification report:
27
+
28
+               precision    recall  f1-score   support
29
+
30
+      Realism     0.9683    0.8708    0.9170     28001
31
+     Deepfake     0.8826    0.9715    0.9249     28000
32
+
33
+     accuracy                         0.9212     56001
34
+    macro avg     0.9255    0.9212    0.9210     56001
35
+ weighted avg     0.9255    0.9212    0.9210     56001
36
+ ```
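
As a quick sanity check, the per-class F1 scores in the report can be reproduced from the listed precision and recall, since F1 is their harmonic mean. The following is a minimal sketch in plain Python; the helper name `f1_from_pr` is illustrative and not part of the model repository. Because the inputs are already rounded to four decimals, the recomputed values agree with the report only to roughly three or four decimal places.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall copied from the classification report above.
report = {
    "Realism": (0.9683, 0.8708),
    "Deepfake": (0.8826, 0.9715),
}

f1_by_class = {name: f1_from_pr(p, r) for name, (p, r) in report.items()}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

for name, value in f1_by_class.items():
    print(f"{name}: {value:.4f}")
print(f"macro avg: {macro_f1:.4f}")
```
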
37
+
38
+ **Confusion Matrix**:
39
+ ```
40
+ [[True Positives, False Negatives],
41
+ [False Positives, True Negatives]]
42
+ ```
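
To make the layout above concrete, here is a small sketch that builds a matrix in the same `[[TP, FN], [FP, TN]]` order from two label lists. The README does not say which class is treated as "positive"; the sketch assumes "Realism" purely for illustration, and the labels are toy data, not model predictions.

```python
def confusion_matrix_2x2(y_true, y_pred, positive):
    """Return [[TP, FN], [FP, TN]] for a binary problem."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive sample, predicted positive
            else:
                fn += 1  # positive sample, predicted negative
        else:
            if pred == positive:
                fp += 1  # negative sample, predicted positive
            else:
                tn += 1  # negative sample, predicted negative
    return [[tp, fn], [fp, tn]]

# Toy labels for illustration only; these are not the model's predictions.
y_true = ["Realism", "Realism", "Realism", "Deepfake", "Deepfake", "Deepfake"]
y_pred = ["Realism", "Realism", "Deepfake", "Deepfake", "Deepfake", "Realism"]

matrix = confusion_matrix_2x2(y_true, y_pred, positive="Realism")
print(matrix)
```

Each row corresponds to a true class and each column to a predicted class, so the diagonal holds the correctly classified samples.
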
43
+
44
+ ![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/VLX0QDcKkSLIJ9c5LX-wt.png)
45
+
46
+ **<span style="color:red;">Update:</span>** The previous model checkpoint was trained on a smaller classification dataset. Although it scored well in evaluation, its performance on real-world images was only average because the training set had limited variation. This update was trained on a larger dataset to improve the detection of fake images.
47
+
48
+ | Repository | Link |
49
+ |------------|------|
50
+ | Deep Fake Detector v2 Model | [GitHub Repository](https://github.com/PRITHIVSAKTHIUR/Deep-Fake-Detector-Model) |
51
+
52
+ # **Key Features**
53
+ - **Architecture**: Vision Transformer (ViT) - `google/vit-base-patch16-224-in21k`.
54
+ - **Input**: RGB images resized to 224x224 pixels.
55
+ - **Output**: Binary classification ("Realism" or "Deepfake").
56
+ - **Training Dataset**: A curated dataset of real and deepfake images.
57
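+ - **Fine-Tuning**: The model is fine-tuned using Hugging Face's `Trainer` API with data augmentation (random rotation, random sharpness adjustment, and random resizing and cropping).
58
+ - **Performance**: Achieves 92.12% accuracy and a macro-averaged F1 score of 0.9210 on the 56,001 evaluation images in the classification report.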
59
+
60
+ # **Model Architecture**
61
+ The model is based on the **Vision Transformer (ViT)**, which treats images as sequences of patches and applies a transformer encoder to learn spatial relationships. Key components include:
62
+ - **Patch Embedding**: Divides the input image into fixed-size patches (16x16 pixels).
63
+ - **Transformer Encoder**: Processes patch embeddings using multi-head self-attention mechanisms.
64
+ - **Classification Head**: A fully connected layer for binary classification.
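
The patch arithmetic behind these components can be worked through directly. This is a short sketch using only the numbers implied by the checkpoint name (224x224 input, 16x16 patches) and RGB input; the variable names are illustrative.

```python
IMAGE_SIZE = 224   # input resolution, from the checkpoint name
PATCH_SIZE = 16    # patch edge length, from the checkpoint name
CHANNELS = 3       # RGB input

patches_per_side = IMAGE_SIZE // PATCH_SIZE          # 14 patches along each axis
num_patches = patches_per_side ** 2                  # 14 * 14 = 196 patches
values_per_patch = PATCH_SIZE * PATCH_SIZE * CHANNELS  # 16 * 16 * 3 = 768 values
sequence_length = num_patches + 1                    # +1 for the [CLS] token

print(patches_per_side, num_patches, values_per_patch, sequence_length)
# prints: 14 196 768 197
```

The encoder therefore processes a sequence of 197 tokens per image, and the classification head reads the final state of the `[CLS]` token.
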
65
+
66
+ # **Training Details**
67
+ - **Optimizer**: AdamW with a learning rate of `1e-6`.
68
+ - **Batch Size**: 32 for training, 8 for evaluation.
69
+ - **Epochs**: 2.
70
+ - **Data Augmentation**:
71
+   - Random rotation (±90 degrees).
72
+   - Random sharpness adjustment.
73
+   - Random resizing and cropping.
74
+ - **Loss Function**: Cross-Entropy Loss.
75
+ - **Evaluation Metrics**: Accuracy, F1 Score, and Confusion Matrix.
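
The hyperparameters above could be collected as keyword arguments for `transformers.TrainingArguments`, roughly as sketched below. The key names match real `TrainingArguments` parameters, but treating the batch sizes as per-device values is an assumption, since the README does not say. The training-set size is not stated either, so the `optimizer_steps` helper and the sample count passed to it are hypothetical.

```python
import math

# Values taken from the training details above. AdamW is the Trainer default
# optimizer, so it needs no explicit setting here.
training_kwargs = {
    "learning_rate": 1e-6,
    "per_device_train_batch_size": 32,  # assumption: listed size is per device
    "per_device_eval_batch_size": 8,    # assumption: listed size is per device
    "num_train_epochs": 2,
}

def optimizer_steps(num_samples: int, kwargs: dict = training_kwargs) -> int:
    """Total optimizer steps on a single device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_samples / kwargs["per_device_train_batch_size"])
    return steps_per_epoch * kwargs["num_train_epochs"]

# Hypothetical training-set size, for illustration only.
print(optimizer_steps(100_000))
```

With `transformers` installed, these values would be passed as `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left to the user.
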
76
+
77
+ # **Inference with Hugging Face Pipeline**
78
+ ```python
79
+ from transformers import pipeline
80
+
81
+ # Load the model (device=0 selects the first GPU; pass device=-1 to run on CPU)
82
+ pipe = pipeline('image-classification', model="prithivMLmods/Deep-Fake-Detector-v2-Model", device=0)
83
+
84
+ # Predict on an image
85
+ result = pipe("path_to_image.jpg")
86
+ print(result)
87
+ ```
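
The image-classification pipeline returns a list of dictionaries with `label` and `score` keys. The sketch below shows one way to pick the top prediction and ignore low-confidence results. The helper `top_prediction`, the 0.5 default threshold, and the sample scores are all illustrative assumptions, not output from this model.

```python
def top_prediction(results, threshold=0.5):
    """Return (label, score) for the best entry, or None if it is below threshold."""
    best = max(results, key=lambda item: item["score"])
    if best["score"] < threshold:
        return None
    return best["label"], best["score"]

# Hypothetical pipeline output, shaped like the real one: a list of
# {"label": ..., "score": ...} dicts. The scores are made up.
sample_result = [
    {"label": "Deepfake", "score": 0.91},
    {"label": "Realism", "score": 0.09},
]

print(top_prediction(sample_result))                  # best entry clears 0.5
print(top_prediction(sample_result, threshold=0.95))  # best entry is below 0.95
```
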
88
+
89
+ # **Inference with PyTorch**
90
+ ```python
91
+ from transformers import ViTForImageClassification, ViTImageProcessor
92
+ from PIL import Image
93
+ import torch
94
+
95
+ # Load the model and processor
96
+ model = ViTForImageClassification.from_pretrained("prithivMLmods/Deep-Fake-Detector-v2-Model")
97
+ processor = ViTImageProcessor.from_pretrained("prithivMLmods/Deep-Fake-Detector-v2-Model")
98
+
99
+ # Load and preprocess the image
100
+ image = Image.open("path_to_image.jpg").convert("RGB")
101
+ inputs = processor(images=image, return_tensors="pt")
102
+
103
+ # Perform inference
104
+ with torch.no_grad():
105
+     outputs = model(**inputs)
106
+     logits = outputs.logits
107
+     predicted_class = torch.argmax(logits, dim=1).item()
108
+
109
+ # Map class index to label
110
+ label = model.config.id2label[predicted_class]
111
+ print(f"Predicted Label: {label}")
112
+ ```
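
The `argmax` above yields only the winning class. To also report a confidence value, the logits can be passed through a softmax. The sketch below does this in plain Python so the arithmetic is visible; the logits are hypothetical, and the `id2label` order shown is an assumption (the real mapping comes from `model.config.id2label`). On tensors, `torch.softmax(logits, dim=1)` performs the same computation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values for one image; the index order is assumed.
id2label = {0: "Realism", 1: "Deepfake"}
logits = [-1.2, 2.3]

probs = softmax(logits)
for idx, p in enumerate(probs):
    print(f"{id2label[idx]}: {p:.4f}")
```
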
113
+ # **Dataset**
114
+ The model is fine-tuned on a dataset that contains:
115
+ - **Real Images**: Authentic images of human faces.
116
+ - **Fake Images**: Deepfake images generated using advanced AI techniques.
117
+
118
+ # **Limitations**
119
+ - The model is trained on a specific dataset and may not generalize well to other deepfake datasets or domains.
120
+ - Performance may degrade on low-resolution or heavily compressed images.
121
+ - The model is designed for image classification and does not detect deepfake videos directly.
122
+
123
+ # **Ethical Considerations**
124
+
125
+ - **Misuse**: This model should not be used for malicious purposes, such as creating or spreading deepfakes.
126
+ - **Bias**: The model may inherit biases from the training dataset. Care should be taken to ensure fairness and inclusivity.
127
+ - **Transparency**: Users should be informed when deepfake detection tools are used to analyze their content.
128
+
129
+ # **Future Work**
130
+ - Extend the model to detect deepfake videos.
131
+ - Improve generalization by training on larger and more diverse datasets.
132
+ - Incorporate explainability techniques to provide insights into model predictions.
133
+
134
+ # **Citation**
135
+
136
+ ```bibtex
137
+ @misc{Deep-Fake-Detector-v2-Model,
138
+   author = {prithivMLmods},
139
+   title = {Deep-Fake-Detector-v2-Model},
140
+   initial = {21 Mar 2024},
141
+   second_updated = {31 Jan 2025},
142
+   latest_updated = {02 Feb 2025}
143
+ }