Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published May 2020 | public
Book Section - Chapter

A Study of Generalization of Stochastic Mirror Descent Algorithms on Overparameterized Nonlinear Models

Abstract

We study the convergence, the implicit regularization and the generalization of stochastic mirror descent (SMD) algorithms in overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. Due to overpa-rameterization, the training loss has infinitely many global minima where they define a manifold of interpolating solutions. To have an understanding of the generalization performance of SMD algorithms, it is important to characterize which global minima the SMD algorithms converge to. In this work, we first theoretically show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima, which is usually the case in the high overparameterization setting, using sufficiently small step size, SMD converges to a global minimum. We further prove that this global minimum is approximately the closest one to the initialization in Bregman divergence, demonstrating the approximate implicit regularization of SMD. We then empirically confirm that these theoretical results are observed in practice. Finally, we provide an extensive study of the generalization of SMD algorithms. In our experiments, we show that on the CIFAR-10 dataset, SMD with ℓ₁₀ norm potential (as a surrogate for ℓ∞ ) consistently general-izes better than SGD (corresponding to an ℓ₂ norm potential), which in turn consistently outperforms SMD with ℓ₁ norm potential.

Additional Information

© 2020 IEEE.

Additional details

Created:
August 19, 2023
Modified:
October 20, 2023