
TrOCR Fine-Tuned for Nepali Language

Model Description

This model is a fine-tuned version of Microsoft's TrOCR for optical character recognition (OCR), trained specifically to recognize Nepali text in handwritten and printed image inputs.

Model Architecture

The model uses a VisionEncoderDecoder architecture with a DeiT-based image encoder and a BERT-based text decoder.
Training Details

  • Dataset: Fine-tuned using a Nepali dataset consisting of handwritten and printed text.
  • Objective: Generate accurate Nepali text outputs from images containing textual content.
  • Decoding: Beam search with a length penalty is used at inference time to improve the quality of generated text.
  • Beam Search Parameters:
    • num_beams = 8
    • length_penalty = 2.0
    • max_length = 47
    • no_repeat_ngram_size = 3
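The decoding parameters above can be bundled into a reusable generation configuration. A minimal sketch using Hugging Face's GenerationConfig (the `early_stopping` flag comes from the inference example below; everything else mirrors the list above):

```python
from transformers import GenerationConfig

# Generation settings matching the beam-search parameters listed above.
gen_config = GenerationConfig(
    num_beams=8,
    length_penalty=2.0,
    max_length=47,
    no_repeat_ngram_size=3,
    early_stopping=True,  # stop once all beams have finished
)
print(gen_config.num_beams)  # 8
```

The config can then be passed to generation as `model.generate(pixel_values, generation_config=gen_config)` instead of repeating the keyword arguments at every call site.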

Usage

Inference Example

To use this model for OCR tasks, you can follow the steps below:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and processor
model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")

# Load an image
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess image and generate predictions
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(
    pixel_values,
    num_beams=8,
    max_length=47,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
decoded_text = processor.tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Recognized Text:", decoded_text)

Hugging Face Hub

You can access the model and its processor on the Hugging Face Hub under the repository rockerritesh/trOCR_ne.

Features

  • OCR for Nepali: Trained to accurately recognize Nepali text in handwritten and printed formats.
  • Robust Tokenizer: Utilizes the Nepali BERT tokenizer for efficient tokenization.
  • Efficient Inference: Supports beam search and length penalties to optimize generation quality.

Fine-Tuning Details

Hyperparameters

  • Batch Size: 16
  • Learning Rate: 5e-5
  • Epochs: 5
  • Optimizer: AdamW
  • Beam Search Beams: 8
  • Max Length: 47
  • Length Penalty: 2.0
  • No Repeat N-Gram Size: 3
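As a rough illustration, the optimizer settings above map onto a standard PyTorch fine-tuning loop. This is a sketch, not the actual training code: the model and dataloader here are placeholders.

```python
import torch

# Placeholder network standing in for the VisionEncoderDecoder model.
model = torch.nn.Linear(10, 10)

# AdamW with the learning rate from the hyperparameter list above.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

num_epochs = 5    # Epochs
batch_size = 16   # Batch Size (used when building the DataLoader)

for epoch in range(num_epochs):
    # for batch in dataloader:           # batches of image/text pairs
    #     loss = model(**batch).loss     # cross-entropy on decoder outputs
    #     loss.backward()
    #     optimizer.step()
    #     optimizer.zero_grad()
    pass
```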

Model Configuration

The model was configured as follows:

Decoder

  • Activation Function: ReLU
  • Attention Heads: 8
  • Layers: 6
  • Hidden Size: 256
  • FFN Size: 1024

Encoder

  • Hidden Size: 384
  • Layers: 12
  • Attention Heads: 6
  • Image Size: 384

Dataset Details

The dataset used for fine-tuning consists of diverse handwritten and printed Nepali text from publicly available and custom datasets.

Limitations and Bias

  • The model's performance depends on the quality and diversity of the fine-tuning dataset.
  • It may not generalize well to unseen handwriting styles or printed text with unconventional fonts.

Citation

If you use this model in your research or applications, please cite:

@misc{rockerritesh-trocr-nepali,
  title={Fine-Tuned TrOCR Model for Nepali Language},
  author={Sumit Yadav},
  year={2024},
  url={https://huggingface.co/rockerritesh/trOCR_ne}
}

License

This model is released under the Apache 2.0 license.
