
Knowledge Distillation Techniques for Machine Learning

by Kenneth Erbs Borup
PhD Dissertation, December 2023

Knowledge distillation is a powerful and flexible machine learning technique for training smaller, more efficient models (called students) to mimic larger, already-trained models (called teachers). Such students can often achieve better predictive performance than comparable models trained in a classical supervised manner. However, despite its empirical success, knowledge distillation still largely lacks a rigorous theoretical foundation.
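For concreteness, the following is a minimal sketch of the standard soft-target distillation loss (the formulation popularized by Hinton et al., 2015), in which the student is trained to match the teacher's temperature-softened output distribution alongside the hard labels. The function names, temperature, and loss weighting are illustrative assumptions, not details taken from the dissertation.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T gives softer probabilities.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Weighted sum of (i) cross-entropy on the hard labels and (ii) KL
    # divergence between temperature-softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student); the T**2 factor keeps its gradient scale
    # comparable to the hard-label term.
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    # Ordinary cross-entropy against the hard labels (temperature 1).
    p_hard = softmax(student_logits, 1.0)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * ce + (1.0 - alpha) * (T ** 2) * kl)

# Toy usage with random logits for a 3-class problem.
rng = np.random.default_rng(0)
student = rng.standard_normal((4, 3))
teacher = rng.standard_normal((4, 3))
labels = np.array([0, 2, 1, 0])
print(distillation_loss(student, teacher, labels))
```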

This dissertation investigates both the theoretical and empirical foundations of knowledge distillation. In the first two papers, we develop theoretical frameworks for understanding knowledge distillation in the simplified settings of self-distillation with kernel ridge regression and with Gaussian process models, respectively. Within these frameworks, we investigate the properties of iterative self-distillation and characterize particular regularizing behaviors imposed by self-distillation.
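As a rough illustration of the first of these settings, the sketch below runs iterative self-distillation with kernel ridge regression: each round refits the same model on the previous round's fitted values in place of the original labels, so later rounds see progressively smoothed targets. The RBF kernel, ridge parameter, and number of rounds are illustrative assumptions, not choices from the dissertation.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Y.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def krr_fit(X, y, lam=0.1, gamma=1.0):
    # Kernel ridge regression fitted values on the training inputs:
    # alpha = (K + lam*I)^{-1} y, fitted values = K @ alpha.
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return K @ alpha

# Noisy 1-D regression data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(50)

# Iterative self-distillation: refit on the previous round's predictions.
targets = y
for step in range(5):
    targets = krr_fit(X, targets, lam=0.1, gamma=1.0)
    # Each round contracts the fit toward smoother functions, i.e.
    # self-distillation acts as progressively stronger regularization here.
```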

In a third paper, we perform a rigorous empirical study of knowledge distillation with neural networks to support our theoretical findings. We investigate the effectiveness of knowledge distillation under various controlled settings to determine under which conditions perfect teacher-student agreement can be obtained. In a fourth paper, we illustrate the real-world applicability of knowledge distillation by showing how to apply it to personalized automatic sleep scoring based on ear-EEG measurements.

Finally, in a fifth paper, we address the challenge of exploiting diverse publicly available neural network models to improve predictive performance on a given task under computational constraints. In particular, we propose a method for constructing efficient models by identifying and distilling suitable pre-trained models while requiring only minimal access to them.

Overall, this dissertation contributes to the theoretical and empirical foundations of knowledge distillation and proposes novel methods for adapting publicly available neural network models to specific tasks under constraints.

Format available: PDF (8 MB)
Dissertation supervisors: Lars Nørvang Andersen (Dept. of Mathematics, AU, main supervisor) and Henrik Karstoft (Dept. of Electrical and Computer Engineering, AU, co-supervisor)