tags:
- donut
- kyc
---

# Model description

Donut is an end-to-end (i.e., self-contained) visual document understanding (VDU) model for the general understanding of document images. Its architecture is simple: a Transformer-based visual encoder paired with a textual decoder.
Donut does not rely on any OCR-related modules; instead, its visual encoder extracts features directly from a given document image.
The textual decoder then maps the extracted features to a sequence of subword tokens, constructing the desired structured output (e.g., JSON). Since every model component is Transformer-based, the model is trained easily in an end-to-end manner.

![image.png](https://cdn-uploads.huggingface.co/production/uploads/637eccd46df7e8f7df76a3ae/OSQp25332524epV2PimZb.png)

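Concretely, the checkpoint loads as a standard `VisionEncoderDecoderModel` in transformers, so the two components described above can be inspected directly. A minimal sketch (the printed class names are what Donut checkpoints typically resolve to and may differ):

```python
from transformers import VisionEncoderDecoderModel

# Load the checkpoint; Donut models are plain VisionEncoderDecoderModel instances.
model = VisionEncoderDecoderModel.from_pretrained("sourinkarmakar/kyc_v1-donut-demo")

# The visual encoder ingests the image; the textual decoder generates tokens.
print(type(model.encoder).__name__)  # typically a Swin-based encoder (e.g. DonutSwinModel)
print(type(model.decoder).__name__)  # typically an MBart-style causal decoder
```
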
# Intended uses and limitations

This model is trained for reading the contents of Indian KYC documents. It can classify and read the contents of Aadhaar, PAN, and Voter ID cards. It can also detect the document's orientation and whether it is coloured or black and white. The input document can be oriented in any direction.
The model should be provided with a fair-quality image, so that the contents are readable.
It has been trained on limited data, so performance may be limited. Future versions will be trained on more images, and more types of KYC documents may be added.
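
For illustration, the structured output might look like the sketch below. These field names are assumptions made for this example; the actual schema depends on the training annotations, which are not documented in this card.

```python
# Hypothetical output shape (illustrative only; real keys depend on the
# training annotations and are not documented here):
example_output = {
    "doc_type": "pan",      # assumed label set: aadhaar / pan / voter
    "orientation": "90",    # assumed orientation tag
    "color": "bw",          # coloured vs. black-and-white (assumed)
    "name": "...",          # extracted text fields (elided)
    "id_number": "...",
}
```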

# Training data

For v1, a custom dataset of around 283 images was used: 199 for training, 42 for validation, and 42 for testing.
Of the 199 training images, 57 were Aadhaar samples, 57 were PAN samples, and 85 were Voter ID samples.

# Performance

The current performance is as follows:

- Overall accuracy: 74%
- Aadhaar: 49% (the cause of this lower accuracy still needs to be investigated)
- PAN: 94%
- Voter: 76%

# Inference

```python
import glob
import os
import re

import cv2
import torch
from tqdm.auto import tqdm

from transformers import DonutProcessor, VisionEncoderDecoderModel

# Needs donut-python installed:
# !pip install -q donut-python
from donut import JSONParseEvaluator

processor = DonutProcessor.from_pretrained("sourinkarmakar/kyc_v1-donut-demo")
model = VisionEncoderDecoderModel.from_pretrained("sourinkarmakar/kyc_v1-donut-demo")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Images stored inside a folder 'unseen_samples' under `basepath`
basepath = "."  # adjust to the directory that contains 'unseen_samples'
dataset = glob.glob(os.path.join(basepath, "unseen_samples/*"))

output_list = []

for idx, sample in tqdm(enumerate(dataset), total=len(dataset)):
    # prepare encoder inputs: read the image and convert BGR -> RGB
    img = cv2.imread(sample)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    pixel_values = processor(img, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)

    # prepare decoder inputs: the task prompt used during training
    task_prompt = "<s_cord-v2>"
    decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids
    decoder_input_ids = decoder_input_ids.to(device)

    # autoregressively generate the token sequence
    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        early_stopping=True,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=1,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    # turn the generated sequence into JSON
    seq = processor.batch_decode(outputs.sequences)[0]
    seq = seq.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
    seq = re.sub(r"<.*?>", "", seq, count=1).strip()  # remove first task start token
    seq = processor.token2json(seq)

    output_list.append(seq)

print(output_list)
```
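
The snippet above imports `JSONParseEvaluator` from donut-python without using it. If ground-truth annotations are available, the predictions can be scored with it; a minimal sketch, assuming a `ground_truths` list of reference JSON dicts aligned by index with `output_list`:

```python
from donut import JSONParseEvaluator

# `ground_truths` is an assumed list of reference JSON dicts, one per image,
# aligned by index with `output_list` from the snippet above.
evaluator = JSONParseEvaluator()
scores = [evaluator.cal_acc(pred, gt) for pred, gt in zip(output_list, ground_truths)]
print(f"Mean tree-edit-distance accuracy: {sum(scores) / len(scores):.3f}")
```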