ABSTRACT

This dissertation investigates intelligent automation for document understanding (IA-DU), spanning predictive uncertainty estimation in text classification and knowledge distillation (KD) for model compression in visually-rich document layout analysis (DLA) and classification. Through empirical studies and methodological contributions, this dissertation offers the following contributions and findings:

First, in a benchmarking study of established uncertainty estimation methods on real-world text classification, we find that our novel hybrid method, the Concrete Dropout Ensemble, performs best, enhancing in-domain calibration and novel-class detection even at a smaller ensemble size. Detailed ablation experiments reveal the impact of prior, neural architecture, and hyperparameter choices on estimation quality.

Second, on a prototypical document understanding (DU) task, we identify challenges hindering DU progress; we propose a formalization of multipage document classification scenarios, construct novel datasets, and conduct an experimental analysis showing the promise of multipage representation learning and inference.

Third, we introduce the DUDE benchmark, which incorporates multifaceted challenges and principles for a comprehensive evaluation of generic DU. Alongside our own benchmarking, we organize a competition revealing that while newer document foundation models show promise, they struggle with questions involving visual evidence or complex reasoning. Moreover, we find severe problems in the ability of Large Language Models (LLMs) to reason about documents in their entirety, highlighting issues with hallucination, long-context reasoning, and control.

Fourth, we propose the first methodology for enriching documents with semantic layout structure using distilled DLA models. We apply KD to visual document tasks, unraveling the influence of various task and architecture components.

Finally, the dissertation concludes with a discussion of the findings and their implications for future research, emphasizing the need for advances in multipage document representation learning and the importance of realistic datasets and experimental methodologies to measurably advance toward reliable and robust IA-DU technology.