arxiv:2301.13304

Understanding Self-Distillation in the Presence of Label Noise

Published on Jan 30, 2023

Authors:

Abstract

Self-distillation (SD) is the process of first training a teacher model and then using its predictions to train a student model with the same architecture. Specifically, the student's objective function is big(xi*ell(teacher's predictions, student's predictions) + (1-xi)*ell(given labels, student's predictions)big), where ell is some loss function and xi is some parameter in [0,1]. Empirically, SD has been observed to provide performance gains in several settings. In this paper, we theoretically characterize the effect of SD in two supervised learning problems with noisy labels. We first analyze SD for regularized linear regression and show that in the high label noise regime, the optimal value of xi that minimizes the expected error in estimating the ground truth parameter is surprisingly greater than 1. Empirically, we show that xi > 1 works better than xi leq 1 even with the cross-entropy loss for several classification datasets when 50\% or 30\% of the labels are corrupted. Further, we quantify when optimal SD is better than optimal regularization. Next, we analyze SD in the case of logistic regression for binary classification with random label corruption and quantify the range of label corruption in which the student outperforms the teacher in terms of accuracy. To our knowledge, this is the first result of its kind for the cross-entropy loss.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2301.13304 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2301.13304 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2301.13304 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.