Deep learning models exhibit a curious phenomenon: they optimize over hugely complex model classes and are often trained to memorize (interpolate) the training data. This seemingly contradicts classical statistical wisdom, which suggests avoiding interpolation in favor of reducing the complexity of the prediction rules. A large body of recent work partially resolves this contradiction, suggesting that interpolation does not necessarily harm statistical generalization and may even be necessary for optimal generalization in some settings. In the first part of the talk, I will introduce this phenomenon, referred to as “double descent”.
In the second part, I will talk about recent joint work with Leena Vankadara, Luca Rendsburg and Ulrike von Luxburg (arXiv:2202.09054; NeurIPS 2022). In modern ML, we care about more than building good statistical models: we want to learn models that are reliable and have good causal implications. Under a simple linear model in high dimensions, I will discuss the role of interpolation and its counterpart, regularization, in learning better causal models.
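For readers who want a concrete picture of the two estimators contrasted above, here is a minimal sketch in a high-dimensional linear model (more features than samples). It compares the minimum-norm interpolator with ridge regression on purely statistical grounds. The data-generating setup, the dimensions, the noise level and the value of `lam` are all illustrative assumptions. This is not the causal model or the method of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the talk): n samples, d > n features.
n, d, noise = 40, 200, 0.5
w_true = rng.normal(size=d) / np.sqrt(d)

X_train = rng.normal(size=(n, d))
y_train = X_train @ w_true + noise * rng.normal(size=n)
X_test = rng.normal(size=(2000, d))
y_test = X_test @ w_true + noise * rng.normal(size=2000)


def min_norm_interpolator(X, y):
    """Minimum l2-norm solution of X w = y.

    Fits the training data exactly when d > n and X has full row rank.
    """
    return np.linalg.pinv(X) @ y


def ridge(X, y, lam):
    """Ridge regression in dual form: w = X^T (X X^T + lam I)^{-1} y."""
    k = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(k), y)


def mse(X, y, w):
    """Mean squared prediction error of weights w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))


w_interp = min_norm_interpolator(X_train, y_train)
w_ridge = ridge(X_train, y_train, lam=50.0)  # lam is an arbitrary choice

train_mse_interp = mse(X_train, y_train, w_interp)
train_mse_ridge = mse(X_train, y_train, w_ridge)

print("train MSE  interpolator:", train_mse_interp)
print("train MSE  ridge       :", train_mse_ridge)
print("test MSE   interpolator:", mse(X_test, y_test, w_interp))
print("test MSE   ridge       :", mse(X_test, y_test, w_ridge))
```

The interpolator drives the training error to (numerically) zero, while ridge trades some training fit for a smaller weight norm. Which of the two predicts better on the test set depends on the noise level, the dimensions and `lam`, so the sketch prints both instead of asserting a winner.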