Published January 14, 2023 | Version v1
Conference paper | Open Access

Large deviations rates for stochastic gradient descent with strongly convex functions

  • 1. Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
  • 2. Faculty of Sciences, University of Novi Sad, Novi Sad, Serbia
  • 3. Carnegie Mellon University, Pittsburgh, PA, USA

Description

Recent works have shown that high-probability metrics for stochastic gradient descent (SGD) are informative and, in some cases, advantageous compared with the commonly adopted mean-square-error-based ones. In this work we provide a formal framework for the study of general high-probability bounds with SGD, based on the theory of large deviations. The framework allows for generic (not necessarily bounded) gradient noise satisfying mild technical assumptions, and permits the noise distribution to depend on the current iterate. Under these assumptions, we establish an upper large deviations bound for SGD with strongly convex functions. The corresponding rate function captures analytical dependence on the noise distribution and other problem parameters. This is in contrast with conventional mean-square-error analysis, which captures the noise dependence only through the variance and reflects neither the effect of higher-order moments nor the interplay between the noise geometry and the shape of the cost function. We also derive exact large deviations rates for the case when the objective function is quadratic, and show that the obtained rate function matches the one from the general upper bound, thereby establishing the tightness of the general bound. Numerical examples illustrate and corroborate the theoretical findings.

Files

Bajovic_etal_LargeDeviationRates.pdf (804.0 kB)
md5:33886ed98846f3fb12aa46e56cdb7bd2

Additional details

Funding

MARVEL – Multimodal Extreme Scale Data Analytics for Smart Cities Environments (grant No. 957337)
European Commission