ViT distilled to MobileNet

This model is a MobileNetV2 student distilled from a ViT teacher. The teacher is merve/beans-vit-224, which is google/vit-base-patch16-224-in21k fine-tuned on the beans dataset. The student is a randomly initialized MobileNetV2. It achieves the following results on the evaluation set:

Loss: 0.5922
Accuracy: 0.7266
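
The card does not state which distillation objective was used. As a minimal, framework-free sketch of one common formulation (soft-target distillation: a temperature-scaled KL term against the teacher plus cross-entropy on the hard label), the per-example loss could look like the following. The `temperature` and `alpha` values, and the function names, are assumptions for illustration and do not come from this card.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Soft-target distillation loss for a single example.

    temperature and alpha are illustrative defaults (assumptions),
    not the values used to train this model.
    """
    # Softened log-probabilities from teacher and student.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) on the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Standard cross-entropy of the student against the hard label.
    ce = -log_softmax(student_logits)[label]

    # The T^2 factor keeps the soft-target term's gradient scale
    # comparable across temperatures.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


if __name__ == "__main__":
    # Made-up logits for one image; label 0 is the hard target.
    print(distillation_loss([0.2, -0.1, 0.0], [3.0, -1.0, -2.0], label=0))
```

When the student's logits match the teacher's, the KL term vanishes and only the weighted cross-entropy remains.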

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 25
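
To make the `linear` scheduler concrete, the sketch below computes the learning rate at a given optimizer step from the values listed above. The steps-per-epoch figure is read off the Step column of the training results (130 steps per epoch, 3250 in total). The card lists no warmup, so `warmup_steps=0` is an assumption, and `linear_lr` is a hypothetical helper rather than part of any library.

```python
BASE_LR = 5e-05
NUM_EPOCHS = 25
STEPS_PER_EPOCH = 130  # from the Step column: 130 optimizer steps per epoch
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 3250


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate after `step` optimizer steps.

    Linear warmup from 0 to base_lr over warmup_steps (assumed 0 here),
    then linear decay to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Learning rate at the start, at the end of each fifth epoch, and at the end.
    for epoch in (0, 5, 10, 15, 20, 25):
        print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
```

Under these assumptions the rate starts at the configured 5e-05 and falls in a straight line to zero by the final step.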

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.9217	1.0	130	1.0079	0.3835
0.8973	2.0	260	0.8349	0.4286
0.7912	3.0	390	0.8905	0.5414
0.7151	4.0	520	1.1400	0.4887
0.6797	5.0	650	4.5343	0.4135
0.6471	6.0	780	2.1551	0.3985
0.5989	7.0	910	0.8552	0.6090
0.6252	8.0	1040	1.7453	0.5489
0.6025	9.0	1170	0.7852	0.6466
0.5643	10.0	1300	1.4728	0.6090
0.5505	11.0	1430	1.1570	0.6015
0.5207	12.0	1560	3.2526	0.4436
0.4957	13.0	1690	0.6617	0.6541
0.4935	14.0	1820	0.7502	0.6241
0.4836	15.0	1950	1.2039	0.5338
0.4648	16.0	2080	1.0283	0.5338
0.4662	17.0	2210	0.6695	0.7293
0.4351	18.0	2340	0.8694	0.5940
0.4286	19.0	2470	1.2751	0.4737
0.4166	20.0	2600	0.8719	0.6241
0.4263	21.0	2730	0.8767	0.6015
0.4261	22.0	2860	1.2780	0.5564
0.4124	23.0	2990	1.4095	0.5940
0.4082	24.0	3120	0.9104	0.6015
0.3923	25.0	3250	0.6430	0.7068
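
Validation loss and accuracy vary considerably from epoch to epoch in the table above, so the final row is not necessarily the strongest one. The sketch below copies the epoch, validation loss, and accuracy columns from the table and picks the best epoch under each metric. The card does not say which epoch the published checkpoint corresponds to, so this is only a way of reading the table, not a description of how the checkpoint was chosen.

```python
# (epoch, validation_loss, accuracy) copied from the training results table.
results = [
    (1, 1.0079, 0.3835), (2, 0.8349, 0.4286), (3, 0.8905, 0.5414),
    (4, 1.1400, 0.4887), (5, 4.5343, 0.4135), (6, 2.1551, 0.3985),
    (7, 0.8552, 0.6090), (8, 1.7453, 0.5489), (9, 0.7852, 0.6466),
    (10, 1.4728, 0.6090), (11, 1.1570, 0.6015), (12, 3.2526, 0.4436),
    (13, 0.6617, 0.6541), (14, 0.7502, 0.6241), (15, 1.2039, 0.5338),
    (16, 1.0283, 0.5338), (17, 0.6695, 0.7293), (18, 0.8694, 0.5940),
    (19, 1.2751, 0.4737), (20, 0.8719, 0.6241), (21, 0.8767, 0.6015),
    (22, 1.2780, 0.5564), (23, 1.4095, 0.5940), (24, 0.9104, 0.6015),
    (25, 0.6430, 0.7068),
]

# Highest validation accuracy and lowest validation loss need not coincide.
best_by_accuracy = max(results, key=lambda row: row[2])
best_by_loss = min(results, key=lambda row: row[1])

if __name__ == "__main__":
    print("best accuracy:", best_by_accuracy)
    print("lowest validation loss:", best_by_loss)
```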

Framework versions

Transformers 4.34.0
Pytorch 2.0.1+cu118
Datasets 2.14.5
Tokenizers 0.14.1
