Definition 1 (Probabilistic Predictor). A probabilistic predictor is a function $f : X \to \Delta_Y$ that outputs a conditional probability distribution $P(y' \mid x)$ over outputs $y' \in Y$ for an i.i.d. drawn sample $(x, y)$.
Definition 2 (Probability Simplex). Let $\Delta_Y := \{v \in \mathbb{R}_{\geq 0}^{|Y|} : \|v\|_1 = 1\}$ be the probability simplex of dimension $|Y| - 1$, a geometric representation of a probability space where each vertex represents a mutually exclusive label and each point has an associated probability vector $v$ [368].
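As a minimal sketch of these definitions (assuming NumPy; the logits vector is hypothetical), the softmax function maps a classifier's raw scores to a point on the simplex $\Delta_Y$:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits for a ternary (K = 3) classifier.
logits = np.array([2.0, 0.5, -1.0])
p = softmax(logits)

# Membership in the simplex: non-negative entries with unit L1 norm.
assert np.all(p >= 0) and np.isclose(p.sum(), 1.0)
print(p)  # approx. [0.786 0.175 0.039], a conditional distribution P(y'|x)
```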
Figure 2.1 illustrates a multi-class classifier, where $Y = [K]$ for $K = 3$ classes.
Figure 2.1. Scatter plot of a ternary problem ($K = 3$, $N = 100$) in the probability simplex space. Example of an overconfident misprediction (the image above is a Shiba Inu dog) and a correct sharp prediction (a clear image of a Beagle).
In practice, loss functions are proper scoring rules [330], $S : \Delta_Y \times Y \to \mathbb{R}$, that measure the quality of a probabilistic prediction $P(\hat{y} \mid x)$ given the true label $y$. The cross-entropy (CE) loss is a popular loss function for classification, while the mean-squared error (MSE) loss is used for regression. In Section 2.2, we will discuss the evaluation of probabilistic predictors in more detail, including the calibration of confidence estimates and the detection of out-of-distribution samples.
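To illustrate the CE loss in this scoring-rule sense, the following sketch (assuming NumPy; the two predicted distributions are hypothetical) evaluates the negative log-likelihood of the true label under a sharp correct prediction and an overconfident misprediction, echoing the two cases in Figure 2.1:

```python
import numpy as np

def cross_entropy(p: np.ndarray, y: int) -> float:
    # Negative log-likelihood of the true label y under predicted distribution p.
    return float(-np.log(p[y]))

# Hypothetical K = 3 predictions; the true label is y = 0.
sharp_correct = np.array([0.90, 0.05, 0.05])
overconfident_wrong = np.array([0.05, 0.90, 0.05])

print(cross_entropy(sharp_correct, y=0))        # ~0.105: low loss
print(cross_entropy(overconfident_wrong, y=0))  # ~2.996: heavily penalized
```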
2.1.3 Architectures
Throughout the chapters of the thesis, we have primarily used the following NN architectures: Convolutional Neural Networks (CNNs) and Transformer networks. We will briefly introduce the building blocks of these architectures, with a focus on how they are used in the context of document understanding.