	---
	library_name: transformers
	license: gpl-3.0
	language:
	- ar
	- en
	pipeline_tag: image-to-text
	pretty_name: Arabic Small Nougat
	datasets:
	- Fakhraddin/khatt
	---

	# Arabic Small Nougat

	A small, simple end-to-end model for structured OCR of Arabic books.

	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	from PIL import Image
	import torch
	from transformers import (
	    NougatProcessor,
	    VisionEncoderDecoderModel,
	)


	processor = NougatProcessor.from_pretrained("MohamedRashad/arabic-small-nougat")
	model = VisionEncoderDecoderModel.from_pretrained("MohamedRashad/arabic-small-nougat")

	# run on the GPU when one is available
	device = "cuda" if torch.cuda.is_available() else "cpu"
	model.to(device)


	def predict(image):
	    # load the page image and convert it to model inputs
	    image = Image.open(image)
	    pixel_values = processor(image, return_tensors="pt").pixel_values

	    # generate the transcription (up to 2048 new tokens)
	    outputs = model.generate(
	        pixel_values.to(device),
	        min_length=1,
	        max_new_tokens=2048,
	        # never emit the unknown token
	        bad_words_ids=[[processor.tokenizer.unk_token_id]],
	    )

	    # decode the token ids and clean up the generated markup
	    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
	    page_sequence = processor.post_process_generation(page_sequence, fix_markdown=False)
	    return page_sequence


	print(predict("path/to/page_image.jpg"))
	```
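
For a whole book it is usually more convenient to process several pages per forward pass. The following is a minimal sketch, not part of the original card: `batched` and `transcribe_pages` are hypothetical helper names, the batch size of 4 is an arbitrary assumption, and `processor`, `model`, and `device` are expected to be the objects created in the snippet above. It reuses the same generation settings as `predict`.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def transcribe_pages(image_paths, processor, model, device="cpu",
                     batch_size=4, max_new_tokens=2048):
    """Transcribe page images in batches; returns one string per page, in input order."""
    from PIL import Image  # imported lazily: only needed when pages are actually read

    pages = []
    for batch in batched(image_paths, batch_size):
        # convert to RGB so grayscale or RGBA scans share one channel layout (assumption)
        images = [Image.open(path).convert("RGB") for path in batch]
        pixel_values = processor(images, return_tensors="pt").pixel_values

        # same generation settings as the single-page example
        outputs = model.generate(
            pixel_values.to(device),
            min_length=1,
            max_new_tokens=max_new_tokens,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
        )

        decoded = processor.batch_decode(outputs, skip_special_tokens=True)
        pages.extend(
            processor.post_process_generation(text, fix_markdown=False)
            for text in decoded
        )
    return pages
```

A call such as `transcribe_pages(sorted(paths), processor, model, device)` returns the pages in the order given. Plain `sorted` orders file names lexically, so zero-padded page numbers (`page_002.jpg` before `page_010.jpg`) keep the book in reading order. Larger batches need more memory, so lower `batch_size` if generation runs out of GPU memory.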


	## Bias, Risks, and Limitations

	[More Information Needed]

	### Recommendations

	Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.


	## Model Details

	- Developed by: Mohamed Rashad
	- Model type: VisionEncoderDecoderModel
	- Language(s) (NLP): Arabic & English
	- License: GPL 3.0
	- Finetuned from model: [nougat-small](https://huggingface.co/facebook/nougat-small)