pabberpe committed on
Commit
54d4d55
·
1 Parent(s): c623f55

Update Model

Files changed (4)
  1. README.md +85 -49
  2. SuSy.pt +2 -2
  3. config.json +3 -0
  4. susy_logo.jpeg +0 -0
README.md CHANGED
@@ -15,6 +15,11 @@ pipeline_tag: image-classification
15
 
16
  # SuSy - Synthetic Image Detector
17
 
 
 
 
 
 
18
  ## Model Details
19
 
20
  <!-- Provide a longer summary of what this model is. -->
@@ -33,7 +38,7 @@ The model can be used as a detector by either taking the class with the highest
33
 
34
  ### Model Description
35
 
36
- - **Developed by:** Pablo Bernabeu, Enrique Lopez and Dario Garcia-Gasulla from [HPAI](https://hpai.bsc.es/)
37
  - **Model type:** Spatial-Based Synthetic Image Detection and Recognition Convolutional Neural Network
38
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
39
 
@@ -58,8 +63,12 @@ Out-of-scope uses include the following:
58
  * Detection of manually edited images using traditional tools.
59
  * Detection of images automatically downscaled and/or upscaled. These are considered as non-synthetic samples in the model training phase.
60
  * Detection of inpainted images.
61
- * Detection of synthetic vs manually crafted illustrations. The model is trained only on photorealistic samples.
62
- * Attribution of synthetic images to their generative model if the model was not included in the training data. Although some generalization capabilities are expected, reliability in this case cannot be estimated.
 
 
 
 
63
 
64
  ## Bias, Risks, and Limitations
65
 
@@ -71,7 +80,7 @@ The model may be biased in the following ways:
71
 
72
  The model has the following technical limitations:
73
 
74
- * The performance of the model may be affected by transformations and edits applied to the images. While the model was trained with some alterations (JPEG compression, downscaling, and downscaling+upscaling), other alterations could reduce the model's accuracy.
75
  * The model will not be able to attribute synthetic images to their generative model if the model was not included in the training data.
76
  * The model is trained on patches with high gray-level contrast. For images composed entirely of low-contrast regions, the model may not work as expected.
77
 
@@ -113,20 +122,22 @@ See `test_image.py` and `test_patch.py` for other examples on how to use the mod
113
 
114
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
115
 
116
- | Dataset | Year | Test | Train |
117
- |:-----------------:|:----:|------:|------:|
118
- | COCO | 2017 | 1,234 | 4,201 |
119
- | dalle-3-images | 2023 | 330 | 1,317 |
120
- | diffusiondb | 2022 | 1,234 | 4,201 |
121
- | midjourney-images | 2023 | 246 | 980 |
122
- | midjourney-tti | 2022 | 906 | 3,624 |
123
- | realisticSDXL | 2023 | 1,234 | 4,201 |
 
 
124
 
125
  #### Authentic Images
126
 
127
  - [COCO](https://cocodataset.org/)
128
 
129
- We use a random subset of the COCO dataset, containing 5,435 images, for the authentic images in our training dataset. The partitions are made respecting the original COCO splits, with 4,201 images in the training partition and 1,234 in the test partition.
130
 
131
  #### Synthetic Images
132
 
@@ -136,12 +147,14 @@ We use a random subset of the COCO dataset, containing 5,435 images, for the aut
136
  - [midjourney-texttoimage](https://www.kaggle.com/datasets/succinctlyai/midjourney-texttoimage)
137
  - [realistic-SDXL](https://huggingface.co/datasets/DucHaiten/DucHaiten-realistic-SDXL)
138
 
139
- For the diffusiondb dataset, we use a random subset of 5,435 images, with 4,201 in the training partition and 1,234 in the test partition. We use only the realistic images from the realisticSDXL dataset, with images in the realistic-2.2 split in our training data and the realistic-1 split for our test partition. The remaining datasets are used in their entirety, with 80% of the images in the training partition and 20% in the test partition.
140
 
141
  ### Training Procedure
142
 
143
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
144
 
 
 
145
  #### Preprocessing
146
 
147
  **Patch Extraction**
@@ -150,12 +163,14 @@ To prepare the training data, we extract 240x240 patches from the images, minimi
150
 
151
  **Data Augmentation**
152
 
153
- | Technique | Probability | Other Parameters |
154
- |--------------------------|:-----------:|---------------------------------------------------------------------------------------------------------|
155
- | HorizontalFlip | 0.35 | - |
156
- | RandomBrightnessContrast | 0.50 | brightness\_limit=0.2 contrast\_limit=0.2 |
157
- | RandomGamma | 0.50 | gamma\_limit=(80, 120) |
158
- | CoarseDropout | 0.50 | min\_holes=1, max\_holes=3 min\_height=64, max\_height=100, min\_width=64, max\_width=100 fill\_value=0 |
 
 
159
 
160
 
161
  #### Training Hyperparameters
@@ -168,13 +183,15 @@ To prepare the training data, we extract 240x240 patches from the images, minimi
168
  - Factor: 0.1
169
  - Patience: 4
170
  - Batch Size: 128
171
- - Epochs: 50
172
- - Early Stopping: 8
173
 
174
  ## Evaluation
175
 
176
  <!-- This section describes the evaluation protocols and provides the results. -->
177
 
 
 
178
  ### Testing Data, Factors & Metrics
179
 
180
  #### Testing Data
@@ -182,54 +199,64 @@ To prepare the training data, we extract 240x240 patches from the images, minimi
182
  <!-- This should link to a Dataset Card if possible. -->
183
 
184
  - Test Split of our Training Dataset
 
185
  - Synthetic Images in the Wild: Dataset containing 210 Authentic and Synthetic Images obtained from Social Media Platforms
186
  - [Flickr 30k Dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset)
 
 
187
 
188
  #### Metrics
189
 
190
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
191
 
192
- - Accuracy: The proportion of correctly classified images.
193
- - F1 Score: The harmonic mean of precision and recall.
194
 
195
  ### Results
196
 
197
  <!-- This section provides the results of the evaluation. -->
198
 
199
- #### Test Split
200
-
201
- Task | Detection F1 Score | Recognition F1 Score
202
- --- | --- | ---
203
- Original Images | 0.9867 | 0.9041
204
- JPEG Compressed Images | 0.9918 | 0.9141
205
- Downscaled Images | 0.9761 | 0.7470
206
- Downscaled+Upscaled Images | 0.9868 | 0.8266
207
-
208
- #### Synthetic Images in the Wild
209
-
210
- 79.55% Detection Accuracy
211
-
212
- #### Flickr30k Dataset
213
-
214
- 99.19% Detection Accuracy
 
 
 
 
 
 
 
 
215
 
216
  ### Summary
217
 
218
- The model performs well on the test split, with high detection and recognition F1 scores. It is robust to JPEG compression in both tasks; on downscaled and rescaled images, recognition performance suffers while detection remains stable.
219
 
220
- The model is also evaluated on our Synthetic Images in the Wild dataset, which contains 220 images obtained from social media platforms, with 121 real images and 99 AI-generated images. The difficulty of this dataset lies in the fact that the images are uploaded to social media by a wide range of users, so they may differ in resolution, lighting conditions and quality, and they may have been edited or compressed. Regarding the synthetic images, the generation process is unknown, so the model has to generalize to unseen generative models. The dataset was tested by 10 human evaluators, who achieved an average detection accuracy of 72.22% and a best detection accuracy of 78.73%. The model achieves a detection accuracy of 79.55% on this dataset at its best threshold.
221
 
222
- Finally, the model shows excellent performance in the Flickr30k dataset. This dataset contains authentic images, so it serves the purpose of testing the number of false positives generated by the model. The model achieves a detection accuracy of 99.19% in this dataset at its best threshold.
223
 
224
  ## Environmental Impact
225
 
226
  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
227
 
228
- - **Hardware Type:** 2xH100
229
- - **Hours used:** 15
230
  - **Hardware Provider:** Barcelona Supercomputing Center (BSC)
231
  - **Compute Region:** Spain
232
- - **Carbon Emitted:** 2.11kg
233
 
234
  ## Citation
235
 
@@ -238,9 +265,18 @@ Finally, the model shows excellent performance in the Flickr30k dataset. This da
238
  **BibTeX:**
239
 
240
  ```bibtex
241
- @thesis{bernabeu2024stair,
 
 
 
 
 
 
 
 
 
242
  title={Detecting and Attributing AI-Generated Images with Machine Learning},
243
- author={Bernabeu Pérez, Pablo},
244
  school={UPC, Facultat d'Informàtica de Barcelona, Departament de Ciències de la Computació},
245
  year={2024},
246
  month={06}
@@ -249,7 +285,7 @@ Finally, the model shows excellent performance in the Flickr30k dataset. This da
249
 
250
  ## Model Card Authors
251
 
252
- [Pablo Bernabeu](https://huggingface.co/pabberpe) and [Dario Garcia-Gasulla](https://huggingface.co/dariog)
253
 
254
  ## Model Card Contact
255
 
 
15
 
16
  # SuSy - Synthetic Image Detector
17
 
18
+ ![susy-logo](susy_logo.jpeg)
19
+
20
+ - **Repository:** https://github.com/HPAI-BSC/SuSy
21
+ - **Dataset:** https://huggingface.co/datasets/HPAI-BSC/SuSy-Dataset
22
+
23
  ## Model Details
24
 
25
  <!-- Provide a longer summary of what this model is. -->
 
38
 
39
  ### Model Description
40
 
41
+ - **Developed by:** [Pablo Bernabeu Perez](https://huggingface.co/pabberpe), [Enrique Lopez Cuena](https://huggingface.co/Cuena) and [Dario Garcia Gasulla](https://huggingface.co/dariog) from [HPAI](https://hpai.bsc.es/)
42
  - **Model type:** Spatial-Based Synthetic Image Detection and Recognition Convolutional Neural Network
43
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
44
 
 
63
  * Detection of manually edited images using traditional tools.
64
  * Detection of images automatically downscaled and/or upscaled. These are considered as non-synthetic samples in the model training phase.
65
  * Detection of inpainted images.
66
+ * Detection of synthetic vs manually crafted illustrations. The model is trained mainly on photorealistic samples.
67
+ * Attribution of synthetic images to their generative model if the model was not included in the training data. Although some generalization capabilities are expected, reliability in this case cannot be estimated.
68
+
69
+ ### Forbidden Uses
70
+
71
+ This model may not be used to train generative models or tools aimed at purposefully deceiving the model or creating misleading content.
72
 
73
  ## Bias, Risks, and Limitations
74
 
 
80
 
81
  The model has the following technical limitations:
82
 
83
+ * The performance of the model may be affected by transformations and edits applied to the images. While the model was trained with some alterations (blur, brightness, compression and gamma), other alterations could reduce the model's accuracy.
84
  * The model will not be able to attribute synthetic images to their generative model if the model was not included in the training data.
85
  * The model is trained on patches with high gray-level contrast. For images composed entirely of low-contrast regions, the model may not work as expected.
86
 
 
122
 
123
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
124
 
125
+ The dataset is available at: https://huggingface.co/datasets/HPAI-BSC/SuSy-Dataset
126
+
127
+ | Dataset | Year | Train | Validation | Test |
128
+ |-------------------|------|-------|------------|-------|
129
+ | COCO | 2017 | 2,967 | 1,234 | 1,234 |
130
+ | dalle-3-images | 2023 | 987 | 330 | 330 |
131
+ | diffusiondb | 2022 | 2,967 | 1,234 | 1,234 |
132
+ | realisticSDXL | 2023 | 2,967 | 1,234 | 1,234 |
133
+ | midjourney-tti | 2022 | 2,718 | 906 | 906 |
134
+ | midjourney-images | 2023 | 1,845 | 617 | 617 |
135
 
136
  #### Authentic Images
137
 
138
  - [COCO](https://cocodataset.org/)
139
 
140
+ We use a random subset of the COCO dataset, containing 5,435 images, for the authentic images in our training dataset. The partitions respect the original COCO splits, with 2,967 images in the training partition and 1,234 in each of the validation and test partitions.
141
 
142
  #### Synthetic Images
143
 
 
147
  - [midjourney-texttoimage](https://www.kaggle.com/datasets/succinctlyai/midjourney-texttoimage)
148
  - [realistic-SDXL](https://huggingface.co/datasets/DucHaiten/DucHaiten-realistic-SDXL)
149
 
150
+ For the diffusiondb dataset, we use a random subset of 5,435 images, with 2,967 in the training partition and 1,234 in each of the validation and test partitions. We use only the realistic images from the realisticSDXL dataset, with images in the realistic-2.2 split in our training data and the realistic-1 split for our test partition. The remaining datasets are used in their entirety, with 60% of the images in the training partition, 20% in the validation partition and 20% in the test partition.
151
 
152
  ### Training Procedure
153
 
154
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
155
 
156
+ The training code is available at: https://github.com/HPAI-BSC/SuSy
157
+
158
  #### Preprocessing
159
 
160
  **Patch Extraction**
 
163
 
164
  **Data Augmentation**
165
 
166
+ | Technique | Probability | Other Parameters |
167
+ |--------------------------|:-----------:|-------------------------------------------|
168
+ | HorizontalFlip | 0.50 | - |
169
+ | RandomBrightnessContrast | 0.20 | brightness\_limit=0.2 contrast\_limit=0.2 |
170
+ | RandomGamma | 0.20 | gamma\_limit=(80, 120) |
171
+ | AdvancedBlur | 0.20 | |
172
+ | GaussianBlur | 0.20 | |
173
+ | JPEGCompression | 0.20 | quality\_lower=75 quality\_upper=100 |
174
 
175
 
176
  #### Training Hyperparameters
 
183
  - Factor: 0.1
184
  - Patience: 4
185
  - Batch Size: 128
186
+ - Epochs: 10
187
+ - Early Stopping: 2
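
As a rough illustration of how the epoch limit and early stopping values interact, the sketch below stops training once the validation loss has failed to improve for a given number of consecutive epochs. It assumes that "Early Stopping: 2" means a patience of 2 epochs measured on validation loss; the card does not state the monitored quantity, and the function `epochs_before_stop` is hypothetical, not taken from the SuSy training code.

```python
def epochs_before_stop(val_losses, max_epochs=10, patience=2):
    """Return how many epochs run before early stopping triggers.

    val_losses stands in for the validation loss measured after each epoch.
    Assumption: training stops after `patience` consecutive epochs
    without a new best validation loss.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


# Hypothetical loss curve: improvement stalls after epoch 3.
print(epochs_before_stop([0.9, 0.7, 0.6, 0.65, 0.62, 0.5]))  # 5
```

With a patience of 2, the run above ends at epoch 5 even though epoch 6 would have improved, which is the usual trade-off of a short patience.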
188
 
189
  ## Evaluation
190
 
191
  <!-- This section describes the evaluation protocols and provides the results. -->
192
 
193
+ The evaluation code is available at: https://github.com/HPAI-BSC/SuSy
194
+
195
  ### Testing Data, Factors & Metrics
196
 
197
  #### Testing Data
 
199
  <!-- This should link to a Dataset Card if possible. -->
200
 
201
  - Test Split of our Training Dataset
202
+ - Synthetic Images generated with [Stable Diffusion 3 Medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium) and [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) using prompts from [Gustavosta/Stable-Diffusion-Prompts](https://huggingface.co/datasets/Gustavosta/Stable-Diffusion-Prompts)
203
  - Synthetic Images in the Wild: Dataset containing 210 Authentic and Synthetic Images obtained from Social Media Platforms
204
  - [Flickr 30k Dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset)
205
+ - [Google Landmarks v2](https://github.com/cvdfoundation/google-landmark)
206
+ - [Synthbuster](https://zenodo.org/records/10066460)
207
 
208
  #### Metrics
209
 
210
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
211
 
212
+ - Recall: The proportion of correctly classified positive instances out of all actual positive instances in a dataset.
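
As a minimal sketch of the recall metric defined above (not the authors' evaluation code; the function name `recall` and the toy labels are hypothetical), recall for one class can be computed from true and predicted labels as follows:

```python
def recall(y_true, y_pred, positive):
    """Fraction of instances whose true label is `positive`
    that were also predicted as `positive`."""
    actual = [(t, p) for t, p in zip(y_true, y_pred) if t == positive]
    if not actual:
        raise ValueError("no instances of the positive class")
    hits = sum(1 for _, p in actual if p == positive)
    return hits / len(actual)


# Toy labels (hypothetical), not taken from the evaluation datasets.
y_true = ["synthetic", "synthetic", "synthetic", "synthetic", "authentic"]
y_pred = ["synthetic", "authentic", "synthetic", "synthetic", "authentic"]
print(recall(y_true, y_pred, positive="synthetic"))  # 0.75
```

For a dataset that contains only one class, this value equals the fraction of its images classified correctly.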
 
213
 
214
  ### Results
215
 
216
  <!-- This section provides the results of the evaluation. -->
217
 
218
+ #### Authentic Sources
219
+
220
+ | Dataset | Model | Year | Recall |
221
+ |---------------------|-------|------|--------|
222
+ | Flickr30k | - | 2014 | 90.53 |
223
+ | Google Landmarks v2 | - | 2020 | 64.54 |
224
+ | In-the-wild | - | 2024 | 33.06 |
225
+
226
+ #### Synthetic Sources
227
+
228
+ | Dataset | Model | Year | Recall |
229
+ |-------------|---------------------------|------|--------|
230
+ | Synthbuster | Glide | 2021 | 53.50 |
231
+ | Synthbuster | Stable Diffusion 1.3 | 2022 | 87.00 |
232
+ | Synthbuster | Stable Diffusion 1.4 | 2022 | 87.10 |
233
+ | Synthbuster | Stable Diffusion 2 | 2022 | 68.40 |
234
+ | Synthbuster | DALL-E 2 | 2022 | 20.70 |
235
+ | Synthbuster | MidJourney V5 | 2023 | 73.10 |
236
+ | Synthbuster | Stable Diffusion XL | 2023 | 79.50 |
237
+ | Synthbuster | Firefly | 2023 | 40.90 |
238
+ | Synthbuster | DALL-E 3 | 2023 | 88.60 |
239
+ | Authors | Stable Diffusion 3 Medium | 2024 | 93.23 |
240
+ | Authors | Flux.1-dev | 2024 | 96.46 |
241
+ | In-the-wild | Mixed/Unknown | 2024 | 89.90 |
242
 
243
  ### Summary
244
 
245
+ The results for authentic image datasets reveal varying detection performance across different sources. Recall rates range from 33.06% for the In-the-wild dataset to 90.53% for the Flickr30k dataset. The Google Landmarks v2 dataset shows an intermediate recall rate of 64.54%. These results indicate a significant disparity in the detectability of authentic images across different datasets, with the In-the-wild dataset presenting the most challenging case for SuSy.
246
 
247
+ The results for synthetic image datasets show varying detection performance across different image generation models. Recall rates range from 20.70% for DALL-E 2 (2022) to 96.46% for Flux.1-dev (2024). Stable Diffusion models generally exhibit high detectability, with versions 1.3 and 1.4 (2022) reaching recall rates of at least 87%. More recent models tested by the authors, such as Stable Diffusion 3 Medium (2024) and Flux.1-dev (2024), are even more detectable, with recall rates above 93%. The in-the-wild mixed/unknown synthetic dataset from 2024 shows a high recall of 89.90%, indicating effective detection across various unknown generation methods. These results suggest an overall trend of improving detection capabilities for synthetic images, with newer generation models generally being more easily detectable.
248
 
249
+ Note that these metrics were computed using only the center patch of each image, rather than the patch voting mechanisms described previously. This strategy allows a fairer comparison with other state-of-the-art methods, although it hinders the performance of SuSy.
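
To illustrate the center-patch strategy, the sketch below computes the crop box of a centered square patch. It assumes that the 240x240 patch size used for training-patch extraction also applies at evaluation time; the helper `center_patch_box` is hypothetical and not part of the SuSy code.

```python
def center_patch_box(width, height, patch_size=240):
    """Return (left, top, right, bottom) of the centered square patch."""
    if width < patch_size or height < patch_size:
        raise ValueError("image is smaller than the patch size")
    left = (width - patch_size) // 2
    top = (height - patch_size) // 2
    return left, top, left + patch_size, top + patch_size


print(center_patch_box(1024, 768))  # (392, 264, 632, 504)
```

The returned tuple follows the (left, upper, right, lower) box convention accepted by Pillow's `Image.crop`, so it could be passed to that method directly.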
250
 
251
  ## Environmental Impact
252
 
253
  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
254
 
255
+ - **Hardware Type:** H100
256
+ - **Hours used:** 16
257
  - **Hardware Provider:** Barcelona Supercomputing Center (BSC)
258
  - **Compute Region:** Spain
259
+ - **Carbon Emitted:** 0.63kg
260
 
261
  ## Citation
262
 
 
265
  **BibTeX:**
266
 
267
  ```bibtex
268
+ @misc{bernabeu2024susy,
269
+ title={Present and Future Generalization of Synthetic Image Detectors},
270
+ author={Bernabeu Perez, Pablo and Lopez Cuena, Enrique and Garcia Gasulla, Dario},
271
+ year={2024},
272
+ month={09}
273
+ }
274
+ ```
275
+
276
+ ```bibtex
277
+ @thesis{bernabeu2024aidetection,
278
  title={Detecting and Attributing AI-Generated Images with Machine Learning},
279
+ author={Bernabeu Perez, Pablo},
280
  school={UPC, Facultat d'Informàtica de Barcelona, Departament de Ciències de la Computació},
281
  year={2024},
282
  month={06}
 
285
 
286
  ## Model Card Authors
287
 
288
+ [Pablo Bernabeu Perez](https://huggingface.co/pabberpe) and [Dario Garcia Gasulla](https://huggingface.co/dariog)
289
 
290
  ## Model Card Contact
291
 
SuSy.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0f2916d1fe0967380340860a0d046f625fd4bb34359157bc318707770a002ce9
3
- size 50810328
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fa10fae300ee2742c7a373b6c3332c2595b461954b8f5616d2d382ef2751020e
3
+ size 50810392
config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ {
2
+ "description": "This JSON file does not contain any functional data. Its presence allows Hugging Face to monitor downloads for this repository."
3
+ }
susy_logo.jpeg ADDED