license: apache-2.0
language:
  - ne
metrics:
  - wer
  - cer
base_model:
  - microsoft/trocr-base-handwritten
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - trocr
  - nepali
  - ocr
  - handwritten-text
  - vision
  - text-recognition

TrOCR Fine-Tuned for Nepali Language

Model Description

This model is a fine-tuned version of Microsoft's TrOCR for optical character recognition (OCR), trained to transcribe Nepali text from images of handwritten or printed text. It uses a VisionEncoderDecoder architecture with a DeiT-based image encoder and a BERT-based text decoder.

Model Architecture

Training Details

  • Dataset: Fine-tuned using a Nepali dataset consisting of handwritten and printed text.
  • Objective: Generate accurate Nepali text outputs from images containing textual content.
  • Decoding: Text is generated with beam search and a length penalty to improve output quality.
  • Beam Search Parameters:
    • num_beams = 8
    • length_penalty = 2.0
    • max_length = 47
    • no_repeat_ngram_size = 3
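
The effect of length_penalty on beam ranking can be illustrated with a small, library-free sketch. It assumes the common formulation in which a finished beam's summed log-probability is divided by its length raised to length_penalty; the exact length used can differ between library versions, and the log-probabilities below are made up for illustration.

```python
def beam_score(token_logprobs, length_penalty=2.0):
    """Length-normalised beam score: sum(log-probs) / length ** length_penalty."""
    total = sum(token_logprobs)
    return total / (len(token_logprobs) ** length_penalty)


# Two hypothetical finished beams (values are illustrative, not model output).
short_beam = [-0.2, -0.3]              # 2 tokens
long_beam = [-0.2, -0.3, -0.4, -0.5]   # 4 tokens

for penalty in (0.0, 1.0, 2.0):
    short_score = beam_score(short_beam, penalty)
    long_score = beam_score(long_beam, penalty)
    winner = "long" if long_score > short_score else "short"
    print(f"length_penalty={penalty}: short={short_score:.4f} "
          f"long={long_score:.4f} -> {winner} beam ranks higher")
```

Because log-probabilities are negative, dividing by a larger power of the length moves the score of longer beams closer to zero. In this toy case the longer beam only wins at length_penalty = 2.0, which suggests the setting above favours longer transcriptions.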

Usage

Inference Example

To use this model for OCR tasks, you can follow the steps below:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and processor
model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")

# Load an image
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess image and generate predictions
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, num_beams=8, max_length=47, early_stopping=True)
decoded_text = processor.tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Recognized Text:", decoded_text)
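
The model metadata lists WER and CER as metrics. The sketch below shows one way to score a recognized string against a ground-truth transcription using only the standard library. The helper names (edit_distance, cer, wer) and the sample strings are illustrative and are not part of this repository. CER is computed here over Unicode code points, so Devanagari vowel signs and other combining marks count as separate characters, which is an assumption about how you want to measure errors.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated words; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Illustrative strings: the hypothesis drops one vowel sign from the second word.
reference = "नमस्ते संसार"
hypothesis = "नमस्ते संसर"
print(f"CER: {cer(reference, hypothesis):.3f}")
print(f"WER: {wer(reference, hypothesis):.3f}")
```

In practice, hypothesis would be the decoded_text produced by the inference example above, and the scores would be averaged over a held-out set.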

Hugging Face Hub

You can access the model and its processor on the Hugging Face Hub: https://huggingface.co/rockerritesh/trOCR_ne

Features

  • OCR for Nepali: Recognizes Nepali text in handwritten and printed images.
  • Robust Tokenizer: Utilizes the Nepali BERT tokenizer for efficient tokenization.
  • Configurable Decoding: Supports beam search and a length penalty to improve generation quality.

Fine-Tuning Details

Hyperparameters

| Hyperparameter | Value |
|---|---|
| Batch Size | 16 |
| Learning Rate | 5e-5 |
| Epochs | 5 |
| Optimizer | AdamW |
| Beam Search Beams | 8 |
| Max Length | 47 |
| Length Penalty | 2.0 |
| No Repeat N-Gram Size | 3 |
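
The No Repeat N-Gram Size setting can be sketched as a simple rule: at each decoding step, any token that would complete an n-gram already present in the output is excluded. The function below is a minimal, library-free illustration of that idea, not the implementation used by any particular library. The function name and the integer token IDs are hypothetical.

```python
def banned_next_tokens(generated, ngram_size=3):
    """Return the tokens that would repeat an n-gram already in `generated`."""
    if len(generated) < ngram_size:
        return set()  # no complete n-gram exists yet, so nothing can repeat
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# Hypothetical token IDs: the sequence ends with (5, 7), which was
# previously followed by 9, so 9 may not be generated next.
print(banned_next_tokens([5, 7, 9, 5, 7], ngram_size=3))  # {9}
```

With a value of 3, individual tokens and bigrams may still repeat; only exact three-token repeats are blocked.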

Model Configuration

The model was configured as follows:

Decoder

  • Activation Function: ReLU
  • Attention Heads: 8
  • Layers: 6
  • Hidden Size: 256
  • FFN Size: 1024

Encoder

  • Hidden Size: 384
  • Layers: 12
  • Attention Heads: 6
  • Image Size: 384
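
A few derived quantities follow from the configuration above. The sketch below computes the per-head attention dimension for both halves of the model and the number of image patches the encoder would see. The patch size of 16 is an assumption (it is not stated in this card); the other numbers are taken from the lists above.

```python
# Values copied from the configuration lists above.
encoder = {"hidden_size": 384, "layers": 12, "heads": 6, "image_size": 384}
decoder = {"hidden_size": 256, "layers": 6, "heads": 8, "ffn_size": 1024}

PATCH_SIZE = 16  # assumption: typical for ViT/DeiT encoders, not stated in this card

# Each attention head works on an equal slice of the hidden dimension.
encoder_head_dim = encoder["hidden_size"] // encoder["heads"]
decoder_head_dim = decoder["hidden_size"] // decoder["heads"]

# A square image is cut into a grid of non-overlapping square patches.
patches_per_side = encoder["image_size"] // PATCH_SIZE
num_patches = patches_per_side ** 2

print("Encoder head dim:", encoder_head_dim)  # 64
print("Decoder head dim:", decoder_head_dim)  # 32
print("Image patches:", num_patches)          # 576 (24 x 24), under the patch-size assumption
```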

Dataset Details

The dataset used for fine-tuning consists of diverse handwritten and printed Nepali text from publicly available and custom datasets.

Limitations and Bias

  • The model's performance depends on the quality and diversity of the fine-tuning dataset.
  • It may not generalize well to unseen handwriting styles or printed text with unconventional fonts.

Citation

If you use this model in your research or applications, please cite:

@misc{rockerritesh-trocr-nepali,
  title={Fine-Tuned TrOCR Model for Nepali Language},
  author={Sumit Yadav},
  year={2024},
  url={https://huggingface.co/rockerritesh/trOCR_ne}
}

License

This model is released under the Apache 2.0 license.