google/pix2struct-infographics-vqa-large • Visual Question Answering • Updated May 19, 2023 • 51 • 8
naver-clova-ix/donut-base-finetuned-docvqa • Document Question Answering • Updated Mar 9 • 14.7k • 208
Nougat: Neural Optical Understanding for Academic Documents • Paper • 2308.13418 • Published Aug 25, 2023 • 35
LayoutLM: Pre-training of Text and Layout for Document Image Understanding • Paper • 1912.13318 • Published Dec 31, 2019 • 2
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking • Paper • 2204.08387 • Published Apr 18, 2022 • 2
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding • Paper • 2012.14740 • Published Dec 29, 2020 • 1