This repository is publicly accessible, but you have to accept the conditions to access its files and content.

google/paligemma2-3b-pt-896 model fine-tuned for US IRS Form 1040 (2023) data parsing and extraction

This repository provides only the PEFT LoRA adapter weights, not a full model. The LoRA layers were fine-tuned to parse and extract data from the first page of the US IRS Form 1040 (tax year 2023); other pages are not supported. Given an image of that page and a single zero-shot prompt, the model performs OCR and returns the extracted fields in JSON format.
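As background, a LoRA adapter in the standard formulation (as implemented in PEFT) keeps each adapted weight matrix $W_0$ of the base model frozen and learns only a low-rank update. The rank $r$, the scaling $\alpha$ (`lora_alpha`) and the target modules used for this particular adapter are not stated in this card; PEFT records them in the repository's `adapter_config.json`.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Because only $A$ and $B$ are stored here, the base model has to be loaded first and the adapter applied on top of it.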



from PIL import Image
import torch
import json

from transformers import PaliGemmaForConditionalGeneration, AutoProcessor
from peft import PeftModel


model_id = 'google/paligemma-3b-pt-896'
peft_model_id = 'hsarfraz/google-paligemma-irs-form-1040-2023-parser-pg1'

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# load base model and processor (device_map follows `device`, so CPU-only machines also work)
processor = AutoProcessor.from_pretrained(model_id, padding_side="right", add_eos_token=True)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map={"": device}, torch_dtype=torch.bfloat16)

# load fine-tuned PEFT (LoRA) weights on top of the base model
fine_tuned_model = PeftModel.from_pretrained(model, peft_model_id)
fine_tuned_model.to(device)

# prompt for OCR
prompt = "<image>extract data in JSON format"

# path to local image file (converted to RGB in case it has an alpha channel)
image_file = '<replace with path to input image>'
image = Image.open(image_file).convert("RGB")

# get tokens; cast pixel values to the model's bfloat16 dtype, then move to the device
inputs = processor(images=image, text=prompt, return_tensors="pt").to(torch.bfloat16).to(device)
prefix_length = inputs["input_ids"].shape[-1]

# switch to inference mode (no gradient tracking)
with torch.inference_mode():
    generation = fine_tuned_model.generate(**inputs, max_new_tokens=1152)
    # keep only the newly generated tokens, dropping the prompt prefix
    generation = generation[0][prefix_length:]
    decoded = processor.decode(generation, skip_special_tokens=True)

    # parse output as JSON; fall back to the raw text if it is not valid JSON
    try:
        output_json = json.dumps(json.loads(decoded), indent=4)
    except json.JSONDecodeError as error:
        print('Error: %s' % error)
        output_json = decoded

    # display parsed JSON
    print(output_json)
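The script above falls back to printing the raw text whenever `json.loads` fails. A slightly more defensive parser is sketched below. `parse_model_json` is a hypothetical helper, not part of this repository, and it assumes the answer is a single flat JSON object that may be surrounded by stray text. It parses the span from the first `{` to the last `}` and raises `ValueError` (which `json.JSONDecodeError` subclasses) when no complete object is present, for example if generation was cut off at `max_new_tokens`.

```python
import json
from typing import Any, Dict


def parse_model_json(decoded: str) -> Dict[str, Any]:
    """Parse the {...} span of the model's decoded answer as a JSON object.

    Assumption (hypothetical helper): the answer is one flat JSON object,
    possibly with stray text around it.
    """
    start = decoded.find("{")
    end = decoded.rfind("}")
    if start == -1 or end < start:
        # No complete object, e.g. output truncated before the closing brace.
        raise ValueError("no complete JSON object found in model output")
    parsed = json.loads(decoded[start : end + 1])  # may raise JSONDecodeError
    if not isinstance(parsed, dict):
        raise ValueError("model output is not a JSON object")
    return parsed


if __name__ == "__main__":
    decoded = 'stray text {"lbl_0_12": "IA", "lbl_0_13": "16560"}'
    print(parse_model_json(decoded)["lbl_0_12"])
```

Raising instead of returning the raw string keeps the caller's types simple: downstream code always receives a `dict` or handles one exception type.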


Example on synthetic (fake) data: IRS Form 1040 (2023), page 1

[Image: synthetic Form 1040 page 1 used as input]

Parsed JSON output:

{
    "lbl_0_03": "Andrew Huffman",
    "lbl_0_04": "Phillips",
    "lbl_0_05": "247-27-3525",
    "lbl_0_06": "Martin",
    "lbl_0_08": "797-83-3491",
    "lbl_0_09": "PSC 8861, Box 7908 APO AE 15945",
    "lbl_0_11": "Andrewhaven",
    "lbl_0_12": "IA",
    "lbl_0_13": "16560",
    "lbl_0_55": "504583.65",
    "lbl_0_66": "473782.31",
    "lbl_0_67": "626674.66",
    "lbl_0_79": "559436.54"
}
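The keys in the output (`lbl_0_03`, `lbl_0_55`, and so on) are opaque field identifiers, and this card does not document which Form 1040 line each one maps to. Values can still be sanity-checked by shape before further use. In the sketch below, the patterns are assumptions inferred only from the synthetic sample above (SSNs written as `NNN-NN-NNNN`, amounts with exactly two decimal places), and `classify_value` / `summarize_fields` are hypothetical helpers.

```python
import re
from decimal import Decimal
from typing import Dict, List

# Assumed value shapes, inferred only from the synthetic sample output;
# they are not a documented schema for this model.
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")
AMOUNT_RE = re.compile(r"\d+\.\d{2}")


def classify_value(value: str) -> str:
    """Label one extracted value as 'ssn', 'amount', or 'text'."""
    if SSN_RE.fullmatch(value):
        return "ssn"
    if AMOUNT_RE.fullmatch(value):
        return "amount"
    return "text"  # names, states, ZIP codes (no decimal point), etc.


def summarize_fields(fields: Dict[str, str]) -> Dict[str, object]:
    """Group keys by value shape; amounts become exact Decimal values."""
    ssn_keys: List[str] = [k for k, v in fields.items() if classify_value(v) == "ssn"]
    amounts = {k: Decimal(v) for k, v in fields.items() if classify_value(v) == "amount"}
    return {"ssn_keys": ssn_keys, "amounts": amounts}


if __name__ == "__main__":
    sample = {
        "lbl_0_05": "247-27-3525",
        "lbl_0_12": "IA",
        "lbl_0_13": "16560",
        "lbl_0_55": "504583.65",
    }
    print(summarize_fields(sample))
```

`Decimal` is used instead of `float` so that currency values such as `504583.65` are kept exactly as extracted.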