Fixed a minor bug with region OCR always returning the first label with an erroneous </s>
#25 opened by Blackroot
The raw output of Florence here is:
tensor([[ 2, 0, 8108, 500, 50528, 50486, 50736, 50479, 50739, 50592,
50532, 50600, 2]], device='cuda:0')
Florence always seems to begin its outputs with the token sequence 2, 0, meaning the first token is always </s> followed by <s>. My changes account for the </s>; previously only the <s> token was accounted for in region OCR.
To be clear, token 2 is </s>. Without this change, Florence does not remove it in the OCR-with-regions case, so the first label always carries an extra </s>.
For example:
'labels': ['</s>SSR']}}
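On the decoded-text side, the fix amounts to removing both special tokens before splitting the text into labels and boxes. The sketch below is illustrative only, not Florence's actual post-processing code: `parse_ocr_with_regions` and `REGION_PATTERN` are hypothetical names, and it assumes each region is a text label followed by eight `<loc_N>` tokens (a quad box), as the eight location ids after the text tokens in the tensor above suggest. The loc values in the usage example are made up.

```python
import re

# Hypothetical pattern: a label followed by eight <loc_N> tokens (a quad box).
REGION_PATTERN = re.compile(r"(.+?)" + r"<loc_(\d+)>" * 8)


def parse_ocr_with_regions(text: str) -> dict:
    # Remove both special tokens. Removing only "<s>" leaves "</s>" glued
    # to the first label, producing labels like '</s>SSR'.
    text = text.replace("</s>", "").replace("<s>", "")
    labels, quad_boxes = [], []
    for match in REGION_PATTERN.finditer(text):
        labels.append(match.group(1))
        quad_boxes.append([int(v) for v in match.groups()[1:]])
    return {"labels": labels, "quad_boxes": quad_boxes}


# Made-up loc values, for illustration only.
decoded = "</s><s>SSR" + "".join(f"<loc_{i}>" for i in range(1, 9)) + "</s>"
print(parse_ocr_with_regions(decoded))
# {'labels': ['SSR'], 'quad_boxes': [[1, 2, 3, 4, 5, 6, 7, 8]]}
```

Dropping the `.replace("</s>", "")` call reproduces the reported symptom: the first label comes back as `'</s>SSR'`.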