Definition 1 (Probabilistic Predictor). A probabilistic predictor f : X → Δ_Y outputs a conditional probability distribution P(y′ | x) over outputs y′ ∈ Y for an i.i.d. drawn sample (x, y).
Definition 2 (Probability Simplex). Let Δ_Y := {v ∈ R_{≥0}^{|Y|} : ‖v‖₁ = 1} be the probability simplex of dimension |Y| − 1, a geometric representation of a probability space, where each vertex represents a mutually exclusive label and each point has an associated probability vector v [368].
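
As a minimal sketch of these two definitions (assuming NumPy; the logit values are illustrative, not from the thesis), the softmax function maps an arbitrary score vector in R^K onto the simplex Δ_Y, one common way for a neural predictor to produce the distribution of Definition 1:

    import numpy as np

    def softmax(z: np.ndarray) -> np.ndarray:
        """Map a logit vector z in R^K to a point on the probability simplex."""
        z = z - z.max()            # shift logits for numerical stability
        e = np.exp(z)
        return e / e.sum()

    # Hypothetical logits for a ternary problem (K = 3).
    p = softmax(np.array([2.0, -1.0, 0.5]))
    assert np.all(p >= 0) and np.isclose(p.sum(), 1.0)  # p lies in Δ_Y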
Figure 2.1 illustrates a multi-class classifier, where Y = [K] for K = 3 classes.
Figure 2.1. Scatter plot of a ternary problem (K = 3, N = 100) in the probability simplex space. Example of an overconfident misprediction (the image above shows a Shiba Inu dog) and a correct, sharp prediction (a clear image of a Beagle).
In practice, loss functions are proper scoring rules [330], S : Δ_Y × Y → R, that measure the quality of a probabilistic prediction P(ŷ | x) given the true label y. The cross-entropy (CE) loss is a popular loss function for classification, while the mean-squared error (MSE) loss is used for regression. In Section 2.2, we will discuss the evaluation of probabilistic predictors in more detail, including the calibration of confidence estimates and the detection of out-of-distribution samples.
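
To make the log score concrete, the following sketch (NumPy; the probability vectors are illustrative) computes the CE loss −log P(y | x) for an overconfident misprediction and for a correct sharp prediction, mirroring the two cases in Figure 2.1:

    import numpy as np

    def cross_entropy(p: np.ndarray, y: int, eps: float = 1e-12) -> float:
        """Log score: negative log-probability assigned to the true label y."""
        return -float(np.log(max(p[y], eps)))

    p_wrong = np.array([0.95, 0.03, 0.02])   # sharp but wrong (true label y = 1)
    p_right = np.array([0.02, 0.95, 0.03])   # sharp and correct
    print(cross_entropy(p_wrong, y=1))       # ~3.51: heavy penalty for overconfidence
    print(cross_entropy(p_right, y=1))       # ~0.05: small penalty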
2.1.3 Architectures
Throughout the chapters of this thesis, we have primarily used the following NN architectures: Convolutional Neural Networks (CNNs) and Transformer Networks. We will briefly introduce the building blocks of these architectures, with a focus on how they are used in the context of document understanding.
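
As a rough illustration (assuming PyTorch; the layer sizes are placeholders, not the configurations used in later chapters), the core building block of each architecture family can be instantiated in a few lines:

    import torch.nn as nn

    # CNN building block: local feature extraction, e.g., on document page images.
    conv_block = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

    # Transformer building block: self-attention over a token sequence.
    encoder_layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)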