Model Card: tanganke/clip-vit-base-patch32_mnist

Model Details

  • Architecture: ViT-Base with patch size 32
  • Training Data: MNIST dataset

Training Details

The model was fine-tuned with the Adam optimizer at a constant learning rate of 1e-5 for 4000 steps (batch_size=32). Only the vision encoder is fine-tuned; the text encoder keeps its pre-trained weights.
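The setup described above could be sketched roughly as follows. This is not the author's training script: `FineTuneConfig` and `build_optimizer` are hypothetical names introduced for illustration, and treating "the vision encoder" as `clip_model.vision_model` (without the projection layer) is an assumption based on the checkpoint being published as a `CLIPVisionModel`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters copied from the training details in this card."""
    learning_rate: float = 1e-5  # constant, no scheduler
    num_steps: int = 4000
    batch_size: int = 32


def build_optimizer(clip_model, config):
    """Freeze all of CLIP except the vision encoder and return an Adam optimizer.

    Assumption: `clip_model` is a transformers.CLIPModel, so the vision encoder
    is available as `clip_model.vision_model`.
    """
    import torch  # imported lazily so the config can be inspected without PyTorch

    # Freeze every parameter first (text encoder, projections, logit scale).
    for param in clip_model.parameters():
        param.requires_grad = False
    # Re-enable gradients only for the vision encoder.
    for param in clip_model.vision_model.parameters():
        param.requires_grad = True

    trainable = [p for p in clip_model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=config.learning_rate)
```

With these settings the model sees at most 4000 × 32 = 128,000 training samples. A call such as `build_optimizer(CLIPModel.from_pretrained("openai/clip-vit-base-patch32"), FineTuneConfig())` would return the optimizer for a training loop that is not shown here.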

Evaluation Results

  • pre-trained: 0.4759327471256256
  • fine-tuned: 0.9957262277603149

Usage

Load the vision model:

from transformers import CLIPVisionModel

# Download the fine-tuned vision encoder from the Hugging Face Hub
vision_model = CLIPVisionModel.from_pretrained('tanganke/clip-vit-base-patch32_mnist')

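Substitute the vision encoder of CLIP:

from transformers import CLIPModel

# Load the original CLIP model (text encoder, projections and pre-trained vision encoder)
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
# Overwrite its vision encoder weights with the fine-tuned ones from `vision_model` loaded above
clip_model.vision_model.load_state_dict(vision_model.vision_model.state_dict())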
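
Once the vision encoder has been substituted, the combined model can be used for zero-shot digit classification. The following is a minimal sketch under stated assumptions: the prompt template, `digit_prompts`, and `classify_digit` are hypothetical and not taken from this card, and the prompts used for the reported evaluation are not known.

```python
DIGIT_PROMPT = "a photo of the digit {}."  # assumed template, not from this card


def digit_prompts(template=DIGIT_PROMPT):
    """Build one text prompt per MNIST class (digits 0 to 9)."""
    return [template.format(digit) for digit in range(10)]


def classify_digit(image, clip_model, processor):
    """Return the digit whose prompt best matches `image`.

    Assumptions: `image` is an RGB PIL image, `clip_model` is a
    transformers.CLIPModel, and `processor` is a transformers.CLIPProcessor.
    """
    import torch  # imported lazily so the prompt helper works without PyTorch

    inputs = processor(
        text=digit_prompts(), images=image, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        # logits_per_image has shape (1, 10): one similarity score per prompt.
        logits = clip_model(**inputs).logits_per_image
    return int(logits.argmax(dim=-1).item())
```

A possible call, assuming `CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")` as the processor and a hypothetical local file `digit.png`: `classify_digit(Image.open("digit.png").convert("RGB"), clip_model, processor)`. MNIST images are grayscale, so converting to RGB before preprocessing keeps the input in the three-channel format CLIP expects.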

Model size: 87.5M params (Safetensors, tensor type F32)