Mohd Zeeshan committed
Commit ccf05b6
1 Parent(s): dbb9bb2

Uploaded the PyTorch model weights


# FaKe-ViT-B/16: Robust and Fast AI-Generated Image Detection using Vision Transformer (ViT-B/16)

FaKe-ViT-B/16 is a fine-tuned ViT-Base model for **classifying AI-generated/fake images versus real images.**

This is a **90M-parameter transformer model** that is **robust (88% accuracy on the test set)**, **generalizes well** to images from **newer diffusion models**, and offers **fast inference (~5.4 s/image).**

The intuition behind using ViT for this task is the Transformer architecture's ability to capture **global features and global context**, much like Transformer language models such as BERT. We are not detecting any specific object; instead, the model looks for the small nuances/differences between real images and the fake images produced by these diffusion models.
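
For convenience, here is a minimal inference sketch. It assumes the uploaded `FaKe-ViT-B16.pth` is a state dict for torchvision's `vit_b_16` with a 2-class head, standard ImageNet preprocessing, and an illustrative fake/real label order; check the demo Space below for the actual pipeline.

```python
import torch
from torchvision import models, transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: the checkpoint is a state dict for torchvision's ViT-B/16
# with the classification head replaced by a 2-class linear layer.
model = models.vit_b_16(weights=None)
model.heads.head = torch.nn.Linear(model.heads.head.in_features, 2)
state_dict = torch.load("FaKe-ViT-B16.pth", map_location=device)
model.load_state_dict(state_dict)
model.to(device).eval()

# Standard ViT-B/16 preprocessing: 224x224 input, ImageNet normalization (assumed).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")
with torch.inference_mode():
    logits = model(preprocess(image).unsqueeze(0).to(device))
    pred = logits.argmax(dim=-1).item()

# Hypothetical label mapping: 0 = AI-generated/fake, 1 = real.
print("AI-generated/fake" if pred == 0 else "real")
```
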

**Here's the demo:** https://huggingface.co/spaces/Zappy586/Fake-ViT

And here's the **Colab notebook** where I tried to train the model from scratch by replicating the paper: https://github.com/zappy586/FAKE-ViT/blob/main/ViT_Paper_replication.ipynb

The **original ViT paper**: https://arxiv.org/abs/2010.11929

Files changed (1)

  1. FaKe-ViT-B16.pth +3 -0

FaKe-ViT-B16.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a2d9f5edce776c627c3797b1f1a6be5d243a188ce39b9546da2ee031b363c30
+ size 343286022
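
If you download the checkpoint manually, you can sanity-check it against the size and SHA-256 recorded in the LFS pointer above; a short sketch:

```python
import hashlib
import os

# Expected values taken from the Git LFS pointer in this commit.
path = "FaKe-ViT-B16.pth"
expected_size = 343286022
expected_sha256 = "3a2d9f5edce776c627c3797b1f1a6be5d243a188ce39b9546da2ee031b363c30"

# Hash the file in 1 MiB chunks to avoid loading it all into memory.
sha = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

assert os.path.getsize(path) == expected_size, "size mismatch"
assert sha.hexdigest() == expected_sha256, "sha256 mismatch"
print("Checkpoint matches the LFS pointer.")
```
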