This model is a fine-tuned version of facebook/deit-tiny-distilled-patch16-224 on the docornot dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0000
- Accuracy: 1.0
CO2 emissions
This model was trained on an M1, and training emitted an estimated 0.322 g of CO2 (measured with CodeCarbon).
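For readers who want to take a similar measurement, here is a minimal sketch of wrapping a training run in a CodeCarbon tracker. It is not necessarily how this model's training was instrumented. `measure_emissions`, `kg_to_grams`, and `train_fn` are hypothetical names introduced for illustration; `EmissionsTracker` with `start()` and `stop()` is CodeCarbon's tracker class, and `stop()` reports kilograms of CO2-equivalent.

```python
def kg_to_grams(kilograms):
    """Convert CodeCarbon's kilogram figure to grams."""
    return kilograms * 1000.0


def measure_emissions(train_fn):
    """Run train_fn under a CodeCarbon tracker and return estimated grams of CO2-eq.

    train_fn is a hypothetical zero-argument callable that performs the training.
    """
    # Imported lazily so the conversion helper works without the package installed.
    from codecarbon import EmissionsTracker  # requires `pip install codecarbon`

    tracker = EmissionsTracker()
    tracker.start()
    try:
        train_fn()
    finally:
        # stop() returns the estimated emissions in kilograms of CO2-equivalent.
        emissions_kg = tracker.stop()
    return kg_to_grams(emissions_kg)
```

With this helper, a reading of 0.000322 kg from the tracker corresponds to the 0.322 g reported above.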
Model description
This model is a distilled Vision Transformer (ViT). Images are presented to the model as a sequence of fixed-size patches (16x16 pixels), which are linearly embedded.
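The patching step can be illustrated with a small, dependency-free sketch. The 224x224 input size is an assumption read from the base model name (`patch16-224`), and three RGB channels are also assumed. The code only splits a dummy image into flattened patches and counts them; the learned linear projection that follows in the real model is not reproduced here.

```python
# Sketch of ViT-style patching; sizes are assumptions, not values read from the model config.
IMAGE_SIZE = 224  # assumed from the base model name "...-patch16-224"
PATCH_SIZE = 16   # patch resolution stated in the model description
CHANNELS = 3      # assumed RGB input


def split_into_patches(image, patch_size):
    """Split an H x W x C nested-list image into flattened patches, row by row."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# A dummy all-zero image stands in for real pixel data.
image = [[[0.0] * CHANNELS for _ in range(IMAGE_SIZE)] for _ in range(IMAGE_SIZE)]
patches = split_into_patches(image, PATCH_SIZE)

num_patches = len(patches)   # (224 / 16) ** 2 = 196
patch_dim = len(patches[0])  # 16 * 16 * 3 = 768
# A distilled DeiT adds a class token and a distillation token to the patch sequence.
sequence_length = num_patches + 2
print(num_patches, patch_dim, sequence_length)  # prints: 196 768 198
```

Each 768-value patch vector is what the model's linear embedding layer would then project to its hidden size.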
Intended uses & limitations
You can use this model to classify whether an image is a picture or a document.
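A minimal usage sketch with the Transformers `pipeline` API follows. It assumes the model is published under the ID `Mozilla/docornot` and that `transformers`, `torch`, and `pillow` are installed. `top_label` and `classify_image` are hypothetical helper names, and the label strings the model returns are not assumed here.

```python
def top_label(predictions):
    """Return the highest-scoring label from a list of {"label", "score"} dicts."""
    return max(predictions, key=lambda p: p["score"])["label"]


def classify_image(image_path, model_id="Mozilla/docornot"):
    """Classify one image file and return the most likely label.

    model_id is an assumption about where the model is hosted.
    """
    # Imported lazily so top_label can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    # The image-classification pipeline returns a list of {"label", "score"} dicts.
    return top_label(classifier(image_path))


# Example call ("example.png" is a hypothetical path):
# print(classify_image("example.png"))
```

Building the pipeline once and reusing it across many images would avoid reloading the model on every call; the single-call form above is kept short for clarity.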
Training procedure
Source code used to generate this model: https://github.com/mozilla/docornot
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
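To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at a given step under a linear decay to zero. It assumes no warmup (none is listed) and a total of 1600 optimizer steps, the step count reported in the training results; `linear_lr` is a hypothetical helper, not the training code itself.

```python
BASE_LR = 5e-05     # learning_rate from the hyperparameter list
TOTAL_STEPS = 1600  # assumed: step count reported in the training results
WARMUP_STEPS = 0    # assumed: no warmup is listed
BATCH_SIZE = 8      # train_batch_size from the hyperparameter list


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 800, 1600):
    print(step, linear_lr(step))

# Assuming a single device and no gradient accumulation,
# one epoch of 1600 steps at batch size 8 covers 12800 training examples.
examples_seen = TOTAL_STEPS * BATCH_SIZE
```

Under these assumptions the rate starts at 5e-05, is roughly halved at step 800, and reaches zero at step 1600.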
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.0 | 1.0 | 1600 | 0.0000 | 1.0 |
Framework versions
- Transformers 4.39.2
- Pytorch 2.2.2
- Datasets 2.18.0
- Tokenizers 0.15.2
Model tree for Mozilla/docornot
Base model
facebook/deit-tiny-distilled-patch16-224