Dhivehi Nougat Base (image-to-text)

This model is a fine-tuned version of facebook/nougat-base on a Dhivehi text-image dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0142

Model description

Fine-tuned for Dhivehi image-to-text (OCR) on a Dhivehi text-image dataset, config all.

Usage

from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel

# Load the model and processor
processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base")
model = VisionEncoderDecoderModel.from_pretrained(
    "alakxender/dhivehi-nougat-base",  
    torch_dtype=torch.bfloat16,                 # Optional: Load the model with BF16 data type for faster inference and lower memory usage
    attn_implementation={                       # Optional: Specify the attention kernel implementations for different parts of the model
        "decoder": "flash_attention_2",         # Use FlashAttention-2 for the decoder for improved performance
        "encoder": "eager"                      # Use the default ("eager") attention implementation for the encoder
    }
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

context_length = 128  # Maximum number of new tokens to generate per image

def predict(img_path):
    # Ensure image is in RGB format
    image = Image.open(img_path).convert("RGB")  
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(torch.bfloat16)  # Match the dtype the model was loaded with

    # Generate the prediction
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=context_length,
        repetition_penalty=1.5,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        eos_token_id=processor.tokenizer.eos_token_id,
    )

    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return page_sequence

print(predict("DV01-04_31.jpg"))
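
To run the model over a folder of page images, the predict function can simply be mapped across the files. A minimal sketch; the pages directory and the *.jpg pattern are assumptions about your data layout:

from pathlib import Path

# Hypothetical directory of page scans; adjust the path and glob to your data
for img_path in sorted(Path("pages").glob("*.jpg")):
    print(img_path.name, predict(str(img_path)))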

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the training-arguments sketch after this list):

  • learning_rate: 0.0001
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 18
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 100
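
For reference, the list above maps onto Hugging Face Seq2SeqTrainingArguments roughly as sketched below. This is a minimal reconstruction, not the original training script; output_dir and the bf16 flag are assumptions:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="dhivehi-nougat-base",   # Hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    seed=42,
    gradient_accumulation_steps=6,      # 3 per device x 6 accumulation steps = total train batch size 18
    lr_scheduler_type="linear",
    num_train_epochs=100,
    optim="adamw_torch",                # betas=(0.9, 0.999) and epsilon=1e-08 are the defaults
    bf16=True,                          # Assumption, inferred from the BF16 tensors in the checkpoint
)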

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 6.4404 | 0.0057 | 100 | 1.0417 |
| 5.7761 | 0.0114 | 200 | 0.9055 |
| 5.1723 | 0.0171 | 300 | 0.8193 |
| 4.8315 | 0.0228 | 400 | 0.7661 |
| 4.4217 | 0.0285 | 500 | 0.7232 |
| 3.9861 | 0.0342 | 600 | 0.6724 |
| 3.7268 | 0.0400 | 700 | 0.5966 |
| 3.5393 | 0.0457 | 800 | 0.5337 |
| 2.8666 | 0.0514 | 900 | 0.4108 |
| 2.0269 | 0.0571 | 1000 | 0.2803 |
| 1.4121 | 0.0628 | 1100 | 0.1904 |
| 1.0161 | 0.0685 | 1200 | 0.1351 |
| 0.867 | 0.0742 | 1300 | 0.1130 |
| 0.7506 | 0.0799 | 1400 | 0.0950 |
| 0.5764 | 0.0856 | 1500 | 0.0801 |
| 0.5123 | 0.0913 | 1600 | 0.0716 |
| 0.558 | 0.0970 | 1700 | 0.0650 |
| 0.5242 | 0.1027 | 1800 | 0.0616 |
| 0.4229 | 0.1084 | 1900 | 0.0556 |
| 0.3721 | 0.1142 | 2000 | 0.0545 |
| 0.3388 | 0.1199 | 2100 | 0.0519 |
| 0.4042 | 0.1256 | 2200 | 0.0499 |
| 0.3593 | 0.1313 | 2300 | 0.0449 |
| 0.3837 | 0.1370 | 2400 | 0.0421 |
| 0.3291 | 0.1427 | 2500 | 0.0407 |
| 0.3092 | 0.1484 | 2600 | 0.0388 |
| 0.2762 | 0.1541 | 2700 | 0.0380 |
| 0.3073 | 0.1598 | 2800 | 0.0422 |
| 0.2577 | 0.1655 | 2900 | 0.0340 |
| 0.2596 | 0.1712 | 3000 | 0.0331 |
| 0.3397 | 0.1769 | 3100 | 0.0328 |
| 0.3019 | 0.1826 | 3200 | 0.0307 |
| 0.2522 | 0.1884 | 3300 | 0.0314 |
| 0.2546 | 0.1941 | 3400 | 0.0289 |
| 0.1972 | 0.1998 | 3500 | 0.0282 |
| 0.2231 | 0.2055 | 3600 | 0.0300 |
| 0.2342 | 0.2112 | 3700 | 0.0278 |
| 0.2152 | 0.2169 | 3800 | 0.0276 |
| 0.2059 | 0.2226 | 3900 | 0.0260 |
| 0.2165 | 0.2283 | 4000 | 0.0257 |
| 0.1919 | 0.2340 | 4100 | 0.0253 |
| 0.1608 | 0.2397 | 4200 | 0.0244 |
| 0.1673 | 0.2454 | 4300 | 0.0242 |
| 0.2004 | 0.2511 | 4400 | 0.0248 |
| 0.2277 | 0.2568 | 4500 | 0.0230 |
| 0.1831 | 0.2625 | 4600 | 0.0228 |
| 0.1905 | 0.2683 | 4700 | 0.0221 |
| 0.0996 | 0.2740 | 4800 | 0.0215 |
| 0.1596 | 0.2797 | 4900 | 0.0213 |
| 0.168 | 0.2854 | 5000 | 0.0208 |
| 0.2119 | 0.2911 | 5100 | 0.0215 |
| 0.1436 | 0.2968 | 5200 | 0.0202 |
| 0.1656 | 0.3025 | 5300 | 0.0202 |
| 0.1183 | 0.3082 | 5400 | 0.0194 |
| 0.1397 | 0.3139 | 5500 | 0.0202 |
| 0.1248 | 0.3196 | 5600 | 0.0191 |
| 0.1202 | 0.3253 | 5700 | 0.0191 |
| 0.1175 | 0.3310 | 5800 | 0.0207 |
| 0.1427 | 0.3367 | 5900 | 0.0183 |
| 0.1487 | 0.3425 | 6000 | 0.0178 |
| 0.1597 | 0.3482 | 6100 | 0.0174 |
| 0.1363 | 0.3539 | 6200 | 0.0172 |
| 0.1266 | 0.3596 | 6300 | 0.0171 |
| 0.1288 | 0.3653 | 6400 | 0.0170 |
| 0.1202 | 0.3710 | 6500 | 0.0170 |
| 0.1174 | 0.3767 | 6600 | 0.0164 |
| 0.1334 | 0.3824 | 6700 | 0.0168 |
| 0.1627 | 0.3881 | 6800 | 0.0164 |
| 0.0982 | 0.3938 | 6900 | 0.0161 |
| 0.1038 | 0.3995 | 7000 | 0.0160 |
| 0.1523 | 0.4052 | 7100 | 0.0160 |
| 0.1337 | 0.4109 | 7200 | 0.0157 |
| 0.2063 | 0.4167 | 7300 | 0.0153 |
| 0.1476 | 0.4224 | 7400 | 0.0156 |
| 0.0838 | 0.4281 | 7500 | 0.0150 |
| 0.082 | 0.4338 | 7600 | 0.0158 |
| 0.1269 | 0.4395 | 7700 | 0.0159 |
| 0.1168 | 0.4452 | 7800 | 0.0147 |
| 0.1024 | 0.4509 | 7900 | 0.0147 |
| 0.1138 | 0.4566 | 8000 | 0.0145 |
| 0.1188 | 0.4623 | 8100 | 0.0146 |
| 0.0881 | 0.4680 | 8200 | 0.0142 |
| 0.0752 | 0.4737 | 8300 | 0.0138 |
| 0.1165 | 0.4794 | 8400 | 0.0141 |
| 0.1017 | 0.4851 | 8500 | 0.0137 |
| 0.0971 | 0.4909 | 8600 | 0.0135 |
| 0.135 | 0.4966 | 8700 | 0.0136 |
| 0.0732 | 0.5023 | 8800 | 0.0137 |
| 0.1217 | 0.5080 | 8900 | 0.0142 |

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
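
To reproduce this environment, pinning those versions should suffice (standard PyPI package names; the +cu124 PyTorch build may require the matching CUDA wheel index):

pip install transformers==4.47.0 torch==2.6.0 datasets==3.2.0 tokenizers==0.21.0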