---
license: apache-2.0
language:
- ne
metrics:
- wer
- cer
base_model:
- microsoft/trocr-base-handwritten
pipeline_tag: image-to-text
library_name: transformers
tags:
- trocr
- nepali
- ocr
- handwritten-text
- vision
- text-recognition
---

# **TrOCR Fine-Tuned for Nepali Language**

## Model Description

This model is a fine-tuned version of [Microsoft's TrOCR model](https://huggingface.co/microsoft/trocr-base-handwritten) for optical character recognition (OCR), trained to transcribe handwritten and printed Nepali text from images. It uses a VisionEncoderDecoder architecture with a DeiT-based image encoder and a BERT-based text decoder.

## Model Architecture

- **Encoder**: Vision Transformer (DeiT)
- **Decoder**: BERT-like architecture adapted for text generation
- **Pretrained Base**: [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten)
- **Tokenizer**: Nepali BERT tokenizer from [Shushant/nepaliBERT](https://huggingface.co/Shushant/nepaliBERT)

## Training Details

- **Dataset**: Fine-tuned on a Nepali dataset of handwritten and printed text.
- **Objective**: Generate accurate Nepali text from images containing textual content.
- **Decoding**: Uses beam search with a length penalty to improve the quality of generated text.
- **Beam Search Parameters**:
  - `num_beams = 8`
  - `length_penalty = 2.0`
  - `max_length = 47`
  - `no_repeat_ngram_size = 3`
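
These decoding settings can be bundled into a `transformers` `GenerationConfig` and reused across calls; a minimal sketch (the values come from the parameters above, the variable name and `early_stopping` flag are illustrative choices, not from this card):

```python
from transformers import GenerationConfig

# Decoding settings listed above, collected in one reusable object
gen_config = GenerationConfig(
    num_beams=8,
    length_penalty=2.0,
    max_length=47,
    no_repeat_ngram_size=3,
    early_stopping=True,  # stop once all beams have finished
)
```

The resulting object can then be passed as `model.generate(pixel_values, generation_config=gen_config)`.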

## Usage

### Inference Example

To use this model for OCR tasks, you can follow the steps below:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and processor
model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")

# Load an image
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess the image and generate predictions
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, num_beams=8, max_length=47, early_stopping=True)
decoded_text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Recognized Text:", decoded_text)
```

### Hugging Face Hub

The model and its processor are available in a single repository on the Hugging Face Hub:

- **Model and Processor**: [rockerritesh/trOCR_ne](https://huggingface.co/rockerritesh/trOCR_ne)

### Features

- **OCR for Nepali**: Recognizes Nepali text in both handwritten and printed formats.
- **Nepali Tokenizer**: Uses the Nepali BERT tokenizer for subword tokenization of Devanagari text.
- **Configurable Decoding**: Supports beam search and length penalties to improve generation quality.

## Fine-Tuning Details

### Hyperparameters

| Hyperparameter        | Value |
|-----------------------|-------|
| Batch Size            | 16    |
| Learning Rate         | 5e-5  |
| Epochs                | 5     |
| Optimizer             | AdamW |
| Beam Search Beams     | 8     |
| Max Length            | 47    |
| Length Penalty        | 2.0   |
| No Repeat N-Gram Size | 3     |
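
As a sanity check on the schedule, the number of optimizer steps follows directly from the table; a small sketch (the dataset size of 10,000 images is a made-up placeholder, since this card does not report one):

```python
import math

num_samples = 10_000  # hypothetical dataset size -- not reported in this card
batch_size = 16       # from the table above
epochs = 5            # from the table above

steps_per_epoch = math.ceil(num_samples / batch_size)  # 625
total_steps = steps_per_epoch * epochs                 # 3125
```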

### Model Configuration

The model was configured as follows:

#### Decoder
- Activation Function: ReLU
- Attention Heads: 8
- Layers: 6
- Hidden Size: 256
- FFN Size: 1024

#### Encoder
- Hidden Size: 384
- Layers: 12
- Attention Heads: 6
- Image Size: 384
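
Expressed with `transformers` config classes, the settings above might look like the following sketch (the choice of `BertConfig` and `DeiTConfig` is an assumption for illustration; the checkpoint's own `config.json` is authoritative):

```python
from transformers import BertConfig, DeiTConfig

# Decoder settings from the list above
decoder_config = BertConfig(
    hidden_size=256,
    num_hidden_layers=6,
    num_attention_heads=8,
    intermediate_size=1024,
    hidden_act="relu",
)

# Encoder settings from the list above
encoder_config = DeiTConfig(
    hidden_size=384,
    num_hidden_layers=12,
    num_attention_heads=6,
    image_size=384,
)
```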

### Dataset Details

The dataset used for fine-tuning consists of diverse handwritten and printed Nepali text from publicly available and custom datasets.

## Limitations and Bias

- The model's performance depends on the quality and diversity of the fine-tuning dataset.
- It may not generalize well to unseen handwriting styles or printed text with unconventional fonts.
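
This card lists WER and CER as its metrics; when evaluating on your own data, character error rate (CER) can be computed from a plain Levenshtein edit distance. A minimal pure-Python sketch (the sample strings are illustrative, not from this model's evaluation set):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert / delete / substitute)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edit distance normalized by reference length
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

print(cer("नमस्ते", "नमस्ते"))  # 0.0 for a perfect match
```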

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{rockerritesh-trocr-nepali,
  title={Fine-Tuned TrOCR Model for Nepali Language},
  author={Sumit Yadav},
  year={2024},
  url={https://huggingface.co/rockerritesh/trOCR_ne}
}
```

## License

This model is released under the Apache 2.0 license.