Spaces: MohamedRashad/Arabic-Nougat

MohamedRashad committed on Feb 17, 2024

Commit 71311e8

1 Parent(s): e076d40

Add model description to app.py

Files changed (2)

app.py CHANGED

@@ -55,8 +55,18 @@ def extract_text_from_pdf(pdf_path, progress=gr.Progress()):
     return "\n".join(texts)
+model_description = """
+This is a demo for the Arabic Small Nougat model. It is an end-to-end OCR model that can extract text from images and PDFs.
+- The model is trained on the [Khatt dataset](https://huggingface.co/datasets/Fakhraddin/khatt) and a custom-made dataset.
+- The model is a fine-tune of the [facebook/nougat-small](https://huggingface.co/facebook/nougat-small) model.
+**Note**: The model is a prototype in my book and may not work well on all types of images and PDFs. **Check the output carefully before using it for any serious work.**
+"""
 with gr.Blocks(title="Arabic Small Nougat") as demo:
     gr.HTML("<h1 style='text-align: center'>Arabic End-to-End Structured OCR for textbooks</h1>")
+    gr.Markdown(model_description)
     with gr.Tab("Extract Text from Image"):
         with gr.Row():

requirements.txt CHANGED

@@ -1,3 +1,4 @@
 pdf2image
+torch
 transformers
 gradio
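
The commit does not say why torch was added to requirements.txt. As a rough sanity check after a change like this, the sketch below reports which of the listed packages cannot be found in the current environment. It uses only the standard library; the `REQUIRED` list and the `missing_packages` helper are hypothetical names introduced here, and the sketch assumes each requirement's import name matches its package name.

```python
from importlib.util import find_spec

# Assumption: import names match the entries in requirements.txt after this commit.
REQUIRED = ["pdf2image", "torch", "transformers", "gradio"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be located on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only confirms that each package can be located. It does not import the packages or verify their versions.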