Definition 1 (Probabilistic Predictor). A probabilistic predictor is a function $f : X \to \Delta_Y$ that outputs a conditional probability distribution $P(y' \mid x)$ over outputs $y' \in Y$ for an i.i.d. drawn sample $(x, y)$.
Definition 2 (Probability Simplex). Let $\Delta_Y := \{v \in \mathbb{R}_{\geq 0}^{|Y|} : \|v\|_1 = 1\}$ be the probability simplex of dimension $|Y| - 1$, a geometric representation of a probability space where each vertex represents a mutually exclusive label and each point has an associated probability vector $v$ [368].
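As a minimal sketch of these definitions (assuming NumPy; the logits vector is hypothetical), the softmax function maps a classifier's raw scores to a point on the simplex $\Delta_Y$:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits for a ternary (K = 3) classifier.
logits = np.array([2.0, 0.5, -1.0])
p = softmax(logits)

# Membership in the simplex: non-negative entries with unit L1 norm.
assert np.all(p >= 0) and np.isclose(p.sum(), 1.0)
print(p)  # approx. [0.786 0.175 0.039], a conditional distribution P(y'|x)
```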
Figure 2.1 illustrates a multi-class classifier, where $Y = [K]$ for $K = 3$ classes.
Figure 2.1. Scatter plot of a ternary problem ($K = 3$, $N = 100$) in the probability simplex space. Example of an overconfident misprediction (the image above is a Shiba Inu dog) and a correct sharp prediction (a clear image of a Beagle).
In practice, loss functions are proper scoring rules [330], $S : \Delta_Y \times Y \to \mathbb{R}$, that measure the quality of a probabilistic prediction $P(\hat{y} \mid x)$ given the true label $y$. The cross-entropy (CE) loss is a popular loss function for classification, while the mean-squared error (MSE) loss is used for regression. In Section 2.2, we will discuss the evaluation of probabilistic predictors in more detail, including the calibration of confidence estimates and the detection of out-of-distribution samples.
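To illustrate the CE loss in this scoring-rule sense, the following sketch (assuming NumPy; the two predicted distributions are hypothetical) evaluates the negative log-likelihood of the true label under a sharp correct prediction and an overconfident misprediction, echoing the two cases in Figure 2.1:

```python
import numpy as np

def cross_entropy(p: np.ndarray, y: int) -> float:
    # Negative log-likelihood of the true label y under predicted distribution p.
    return float(-np.log(p[y]))

# Hypothetical K = 3 predictions; the true label is y = 0.
sharp_correct = np.array([0.90, 0.05, 0.05])
overconfident_wrong = np.array([0.05, 0.90, 0.05])

print(cross_entropy(sharp_correct, y=0))        # ~0.105: low loss
print(cross_entropy(overconfident_wrong, y=0))  # ~2.996: heavily penalized
```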
2.1.3 Architectures
Throughout the chapters of the thesis, we have primarily used the following NN architectures: Convolutional Neural Networks (CNNs) and Transformer networks. We will briefly introduce the building blocks of these architectures, with a focus on how they are used in the context of document understanding.