In this talk, I discuss some benefits of minimizing an entropic optimal transport (OT) loss instead of maximizing the log-likelihood for model-based clustering. The main drawback of maximizing the log-likelihood (e.g., via the EM algorithm) is the pervasiveness of bad local optima. By comparing the landscapes and stationary points of the two losses, I show that the log-likelihood exhibits ubiquitous ‘many-fit-one’ behaviour at local optima, where several model components are placed on the same true component; this in turn forces the remaining model components into degenerate positions around averages of the true components. I show that, under some structural assumptions, these bad optima are avoided by the entropic OT loss.
Beyond these theoretical results, I present extensive simulations and applications to neuroscience and genomics. Altogether, these results suggest that minimizing the entropic OT loss is a sensible alternative to maximizing the log-likelihood, at least when the mixture weights are known beforehand.
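To make the idea concrete, here is a minimal NumPy sketch (my own illustration, not the exact procedure from the talk) of fitting mixture-component locations with known weights by minimizing an entropic OT loss: a Sinkhorn step computes the entropic transport plan between the data and the current component locations, and a barycentric step re-estimates each location. All function names, the toy data, and parameter choices (`eps`, iteration counts) are assumptions for illustration.

```python
import numpy as np

def sinkhorn_plan(X, mus, weights, eps=0.5, n_iter=200):
    """Entropic OT plan between the empirical data measure and a
    discrete measure placing `weights` on the component locations `mus`."""
    # Squared-Euclidean cost matrix, shape (n_points, n_components).
    C = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                  # Gibbs kernel
    a = np.full(len(X), 1.0 / len(X))     # uniform weights on data points
    b = weights                           # known mixture weights
    u = np.ones_like(a)
    for _ in range(n_iter):               # Sinkhorn scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # plan P with marginals (a, b)

def fit_means_entropic_ot(X, weights, mus0, eps=0.5, n_steps=30):
    """Alternate Sinkhorn with the closed-form minimizer of
    sum_ij P_ij ||x_i - mu_j||^2 over the locations (a barycentric update)."""
    mus = mus0.copy()
    for _ in range(n_steps):
        P = sinkhorn_plan(X, mus, weights, eps)
        # Each column of P sums to (approximately) weights[j].
        mus = (P.T @ X) / P.sum(0)[:, None]
    return mus
```

Because each component must receive its prescribed mass, a ‘many-fit-one’ configuration (both locations initialized on the same cluster) is not stable: the transport constraint pulls one location toward the unexplained cluster, which is the behaviour the abstract contrasts with EM.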