---
library_name: transformers
license: gpl-3.0
language:
- ar
- en
pipeline_tag: image-to-text
pretty_name: Arabic Small Nougat
datasets:
- Fakhraddin/khatt
---
# Arabic Small Nougat
Small, Simple End-to-End Structured OCR for Arabic books.
## How to Get Started with the Model
The code below loads the processor and model from the Hugging Face Hub and transcribes a single page image.
```python
from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel

# Load the processor (image preprocessing + tokenizer) and the model
processor = NougatProcessor.from_pretrained("MohamedRashad/arabic-small-nougat")
model = VisionEncoderDecoderModel.from_pretrained("MohamedRashad/arabic-small-nougat")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)


def predict(image_path):
    # Prepare the page image for the model (convert to 3-channel RGB in case
    # the scan is grayscale or has an alpha channel)
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Generate the transcription (up to 2048 new tokens; the unknown token is banned)
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=2048,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    # Decode the generated token IDs and apply Nougat's post-processing
    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    page_sequence = processor.post_process_generation(page_sequence, fix_markdown=False)
    return page_sequence


print(predict("path/to/page_image.jpg"))
```
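To transcribe a whole book you can call `predict` in a loop, but pages can also be sent to `generate` in batches. The following is a minimal sketch under stated assumptions, not part of the original card: `batched` and `predict_pages` are hypothetical helper names, `batch_size=4` is an arbitrary default that you may need to lower to fit GPU memory, and the code assumes that `NougatProcessor` accepts a list of images and that `post_process_generation` accepts a list of strings, as in recent `transformers` releases. The heavy imports sit inside `predict_pages` so the batching helper can be reused on its own.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MODEL_ID = "MohamedRashad/arabic-small-nougat"


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of `items`, each with at most `batch_size` elements.

    Hypothetical helper (not part of the model card).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def predict_pages(image_paths: Sequence[str], batch_size: int = 4) -> List[str]:
    """Transcribe several page images, sending `batch_size` pages per generate() call.

    Hypothetical helper; batch_size=4 is an arbitrary default, lower it if you
    run out of GPU memory.
    """
    # Heavy imports are kept local so `batched` can be used without torch/transformers.
    import torch
    from PIL import Image
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    processor = NougatProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    results: List[str] = []
    for paths in batched(image_paths, batch_size):
        # Load each page as a 3-channel image and preprocess the whole batch at once
        images = [Image.open(path).convert("RGB") for path in paths]
        pixel_values = processor(images, return_tensors="pt").pixel_values

        # Same generation settings as the single-page example
        outputs = model.generate(
            pixel_values.to(device),
            min_length=1,
            max_new_tokens=2048,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
        )

        # One decoded string per page, in the same order as `paths`
        pages = processor.batch_decode(outputs, skip_special_tokens=True)
        results.extend(processor.post_process_generation(pages, fix_markdown=False))
    return results
```

For example, `predict_pages(["page_001.jpg", "page_002.jpg"], batch_size=2)` should return one post-processed string per page, in input order; the file names are placeholders. The sketch reloads the model on every call, which keeps it self-contained; for repeated use you would load the processor and model once and pass them in.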
## Bias, Risks, and Limitations
[More Information Needed]
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed before further recommendations can be made.
## Model Details
- **Developed by:** Mohamed Rashad
- **Model type:** VisionEncoderDecoderModel
- **Language(s) (NLP):** Arabic & English
- **License:** GPL 3.0
- **Finetuned from model:** [nougat-small](https://huggingface.co/facebook/nougat-small)