metadata

library_name: transformers
license: gpl-3.0
language:
  - ar
  - en
pipeline_tag: image-to-text
pretty_name: Arabic Small Nougat
datasets:
  - Fakhraddin/khatt

Arabic Small Nougat

Small, Simple End-to-End Structured OCR for Arabic books.

How to Get Started with the Model

Use the code below to get started with the model.

from PIL import Image
import torch
from transformers import (
    NougatProcessor,
    VisionEncoderDecoderModel,
)


processor = NougatProcessor.from_pretrained("MohamedRashad/arabic-small-nougat")
model = VisionEncoderDecoderModel.from_pretrained("MohamedRashad/arabic-small-nougat")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)


def predict(image_path):
    # load the page image (e.g. a page rendered from a PDF) and prepare it for the model
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # generate the transcription (at most 2048 new tokens)
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=2048,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    page_sequence = processor.post_process_generation(page_sequence, fix_markdown=False)
    return page_sequence

print(predict("path/to/page_image.jpg"))
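The function above transcribes a single page image. For a whole book, one possible approach is to run it over a folder of page images in page order and join the results. The sketch below assumes the pages are stored as image files whose names contain the page number (for example page_1.jpg, page_2.jpg); `transcribe_book` and `natural_key` are hypothetical helper names used only for illustration and are not part of transformers or of this model's repository.

```python
import re
from pathlib import Path


def natural_key(path):
    # Split the file name into digit and non-digit chunks so that
    # "page_10.jpg" sorts after "page_2.jpg" instead of before it.
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", Path(path).name)
    ]


def transcribe_book(page_dir, predict_fn, extensions=(".jpg", ".jpeg", ".png"), separator="\n\n"):
    # Collect the page images in natural (page-number) order.
    pages = sorted(
        (p for p in Path(page_dir).iterdir() if p.suffix.lower() in extensions),
        key=natural_key,
    )
    # Run the single-page function on each image and join the page texts.
    return separator.join(predict_fn(str(p)) for p in pages)
```

With the model loaded as shown earlier, a call such as `transcribe_book("path/to/pages", predict)` would return the text of all pages as one string. Passing the prediction function as an argument keeps the helper independent of the model, so it can be tried out with a stand-in function first.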

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Model Details

Developed by: Mohamed Rashad
Model type: VisionEncoderDecoderModel
Language(s) (NLP): Arabic & English
License: GPL 3.0
Finetuned from model: nougat-small