---
license: apache-2.0
datasets:
- bharat-raghunathan/indian-foods-dataset
metrics:
- accuracy
- precision
- recall
---

# Indian Food Classification with Vision Transformer (ViT)

## Overview
This model is a Vision Transformer (ViT) fine-tuned to classify images of Indian foods. It was trained on the [Indian Foods Dataset](https://huggingface.co/datasets/bharat-raghunathan/indian-foods-dataset), hosted on the Hugging Face Hub.

## Dataset
The Indian Foods Dataset contains 4,770 images across 15 classes of popular Indian dishes. The images are split as follows:

- Training: 3,047 images
- Validation: 762 images 
- Testing: 961 images
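
As a quick sanity check, the split sizes listed above can be summed and turned into proportions. The snippet below is only an illustration: the dictionary keys are labels chosen for this example and are not necessarily the split names used in the dataset repository.

```python
# Split sizes as reported in this card; the key names are illustrative only.
SPLITS = {"train": 3047, "validation": 762, "test": 961}

total = sum(SPLITS.values())
fractions = {name: count / total for name, count in SPLITS.items()}

print(f"total images: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The three splits add up to the 4,770 images stated above, in roughly a 64/16/20 ratio.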

## Model
The base model is `google/vit-base-patch16-224-in21k`, a ViT-Base checkpoint that uses 16×16 patches on 224×224 inputs and was pre-trained on ImageNet-21k. It was fine-tuned on the Indian Foods Dataset for 10 epochs using the AdamW optimizer with a learning rate of 2e-4.
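
The setup described above could be sketched roughly as follows. This is not the original training script: `FineTuneConfig` and `build_model_and_optimizer` are hypothetical names, and every setting the card does not mention (batch size, weight decay, learning-rate schedule, augmentation) is left at library defaults or omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters stated in this card; nothing else is assumed."""
    base_model: str = "google/vit-base-patch16-224-in21k"
    num_labels: int = 15          # one output per dish class
    epochs: int = 10
    learning_rate: float = 2e-4


def build_model_and_optimizer(cfg: FineTuneConfig):
    """Load the base checkpoint with a fresh classification head and create AdamW."""
    # Heavy dependencies are imported here so the config can be used on its own.
    import torch
    from transformers import ViTForImageClassification

    model = ViTForImageClassification.from_pretrained(
        cfg.base_model, num_labels=cfg.num_labels
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)
    return model, optimizer
```

Because the ImageNet-21k checkpoint is loaded with `num_labels=15`, the classification head is newly initialized and is learned during fine-tuning.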

## Evaluation
The model was evaluated on the test set and achieved the following metrics:

- Accuracy: 0.9667
- Precision: 0.9670
- Recall: 0.9667
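
The card does not state how precision and recall were averaged over the 15 classes. Recall matching accuracy to four decimal places is consistent with support-weighted averaging, under which the two are equal by construction. Assuming that averaging, a minimal pure-Python sketch of the three metrics looks like this (`weighted_metrics` is an illustrative helper, not the evaluation code used for this model):

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall).

    Assumes support-weighted averaging: each class's score is weighted
    by its share of the true labels.
    """
    n = len(y_true)
    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    # Classes that were never predicted contribute a precision of 0.
    precision = sum(
        (support[c] / n) * (correct[c] / predicted[c])
        for c in support
        if predicted[c] > 0
    )
    recall = sum((support[c] / n) * (correct[c] / support[c]) for c in support)
    return accuracy, precision, recall
```

Each term of the weighted recall simplifies to `correct[c] / n`, so the sum is exactly the accuracy, which is why those two numbers coincide under this assumption.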

## Usage
You can load this fine-tuned model directly from the Hugging Face Hub with the `transformers` library.
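
A minimal inference sketch using the `transformers` image-classification pipeline is given here. The card does not state this model's repository ID, so `model_id` is left as a parameter; `classify_image` and `best_prediction` are illustrative helper names, not part of any library.

```python
def classify_image(image_path, model_id, top_k=3):
    """Return the top_k predictions for one image as a list of label/score dicts."""
    # Imported inside the function so that importing this module stays lightweight.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return classifier(image_path, top_k=top_k)


def best_prediction(predictions):
    """Pick the highest-scoring (label, score) pair from the pipeline output."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]
```

For example, `best_prediction(classify_image("path/to/dish.jpg", "<user>/<repo>"))` would return the most likely dish label and its score, with `<user>/<repo>` replaced by this model's actual repository ID.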