ABSTRACT
knowledge distillation (KD) model compression in visually rich document layout
analysis (DLA) and classification.
Through empirical studies and methodological work, this dissertation makes
the following contributions and findings:
First, in a benchmarking study of established methods on real-world text
classification, we find that our novel hybrid method, 'Concrete Dropout
Ensemble', performs best: it improves in-domain calibration and novel class
detection, even with a smaller ensemble. Detailed ablation experiments
reveal how the choice of prior, neural architecture, and hyperparameters
affects estimation quality.
Second, on a prototypical DU task, we identify challenges hindering progress
in DU, propose a formalization of multipage document classification scenarios,
construct novel datasets, and conduct an experimental analysis showing
the promise of multipage representation learning and inference.
Third, we introduce DUDE, a benchmark that incorporates multifaceted challenges
and principles for a comprehensive evaluation of generic DU. Alongside our own
benchmarking, we organize a competition, which reveals that while newer document
foundation models show promise, they struggle with questions involving visual
evidence or complex reasoning. Moreover, we find severe problems in the ability
of Large Language Models (LLMs) to reason about documents in their entirety,
highlighting issues with hallucination, long-context reasoning, and control.
Fourth, we propose the first methodology for enriching documents with semantic
layout structure using distilled DLA models. We apply KD to visual document
tasks, disentangling the influence of various task and architecture components.
Finally, the dissertation concludes with a discussion of the findings and their
implications for future research, emphasizing the need for advances in
multipage document representation learning and the importance of realistic
datasets and experimental methodologies for making measurable progress toward
reliable and robust IA-DU technology.