---
license: apache-2.0
language:
- ne
metrics:
- wer
- cer
base_model:
- microsoft/trocr-base-handwritten
pipeline_tag: image-to-text
library_name: transformers
tags:
- trocr
- nepali
- ocr
- handwritten-text
- vision
- text-recognition
---

# **TrOCR Fine-Tuned for Nepali Language**

## Model Description

This model is a fine-tuned version of [Microsoft's TrOCR model](https://huggingface.co/microsoft/trocr-base-handwritten) for optical character recognition (OCR), trained to transcribe handwritten and printed Nepali text from images. It uses a VisionEncoderDecoder architecture with a DeiT-based image encoder and a BERT-based text decoder.

## Model Architecture

- **Encoder**: Vision Transformer (DeiT)
- **Decoder**: BERT-like architecture adapted for text generation
- **Pretrained Base**: [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten)
- **Tokenizer**: Nepali BERT tokenizer from [Shushant/nepaliBERT](https://huggingface.co/Shushant/nepaliBERT)

## Training Details

- **Dataset**: Fine-tuned on a Nepali dataset of handwritten and printed text.
- **Objective**: Generate accurate Nepali text from images containing textual content.
- **Decoding**: Uses beam search with a length penalty to improve the quality of generated text.
- **Beam Search Parameters**:
  - `num_beams = 8`
  - `length_penalty = 2.0`
  - `max_length = 47`
  - `no_repeat_ngram_size = 3`
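
These decoding settings can be bundled into a `transformers` `GenerationConfig` and reused across calls; a minimal sketch (the values come from the parameters above, the variable name and `early_stopping` flag are illustrative choices, not from this card):

```python
from transformers import GenerationConfig

# Decoding settings listed above, collected in one reusable object
gen_config = GenerationConfig(
    num_beams=8,
    length_penalty=2.0,
    max_length=47,
    no_repeat_ngram_size=3,
    early_stopping=True,  # stop once all beams have finished
)
```

The resulting object can then be passed as `model.generate(pixel_values, generation_config=gen_config)`.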

## Usage

### Inference Example

To use this model for OCR tasks, you can follow the steps below:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and processor
model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")

# Load an image
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess the image and generate predictions
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, num_beams=8, max_length=47, early_stopping=True)
decoded_text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Recognized Text:", decoded_text)
```

### Hugging Face Hub

The model and its processor are available in a single repository on the Hugging Face Hub:

- **Model and Processor**: [rockerritesh/trOCR_ne](https://huggingface.co/rockerritesh/trOCR_ne)

### Features

- **OCR for Nepali**: Recognizes Nepali text in both handwritten and printed formats.
- **Nepali Tokenizer**: Uses the Nepali BERT tokenizer for subword tokenization of Devanagari text.
- **Configurable Decoding**: Supports beam search and length penalties to improve generation quality.

## Fine-Tuning Details

### Hyperparameters

| Hyperparameter        | Value |
|-----------------------|-------|
| Batch Size            | 16    |
| Learning Rate         | 5e-5  |
| Epochs                | 5     |
| Optimizer             | AdamW |
| Beam Search Beams     | 8     |
| Max Length            | 47    |
| Length Penalty        | 2.0   |
| No Repeat N-Gram Size | 3     |
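
As a sanity check on the schedule, the number of optimizer steps follows directly from the table; a small sketch (the dataset size of 10,000 images is a made-up placeholder, since this card does not report one):

```python
import math

num_samples = 10_000  # hypothetical dataset size -- not reported in this card
batch_size = 16       # from the table above
epochs = 5            # from the table above

steps_per_epoch = math.ceil(num_samples / batch_size)  # 625
total_steps = steps_per_epoch * epochs                 # 3125
```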

### Model Configuration

The model was configured as follows:

#### Decoder
- Activation Function: ReLU
- Attention Heads: 8
- Layers: 6
- Hidden Size: 256
- FFN Size: 1024

#### Encoder
- Hidden Size: 384
- Layers: 12
- Attention Heads: 6
- Image Size: 384
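
Expressed with `transformers` config classes, the settings above might look like the following sketch (the choice of `BertConfig` and `DeiTConfig` is an assumption for illustration; the checkpoint's own `config.json` is authoritative):

```python
from transformers import BertConfig, DeiTConfig

# Decoder settings from the list above
decoder_config = BertConfig(
    hidden_size=256,
    num_hidden_layers=6,
    num_attention_heads=8,
    intermediate_size=1024,
    hidden_act="relu",
)

# Encoder settings from the list above
encoder_config = DeiTConfig(
    hidden_size=384,
    num_hidden_layers=12,
    num_attention_heads=6,
    image_size=384,
)
```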

### Dataset Details

The dataset used for fine-tuning consists of diverse handwritten and printed Nepali text from publicly available and custom datasets.

## Limitations and Bias

- The model's performance depends on the quality and diversity of the fine-tuning dataset.
- It may not generalize well to unseen handwriting styles or printed text with unconventional fonts.
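
This card lists WER and CER as its metrics; when evaluating on your own data, character error rate (CER) can be computed from a plain Levenshtein edit distance. A minimal pure-Python sketch (the sample strings are illustrative, not from this model's evaluation set):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert / delete / substitute)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edit distance normalized by reference length
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

print(cer("नमस्ते", "नमस्ते"))  # 0.0 for a perfect match
```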

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{rockerritesh-trocr-nepali,
  title={Fine-Tuned TrOCR Model for Nepali Language},
  author={Sumit Yadav},
  year={2024},
  url={https://huggingface.co/rockerritesh/trOCR_ne}
}
```

## License

This model is released under the Apache 2.0 license.