TR2024-131

From Convexity to Strong Convexity and Beyond: Bridging The Gap In Convergence Rates


    •  Romero, O., Benosman, M., Pappas, G., "From Convexity to Strong Convexity and Beyond: Bridging The Gap In Convergence Rates", IEEE Conference on Decision and Control (CDC), September 2024.
      BibTeX TR2024-131 PDF
      • @inproceedings{Romero2024sep,
      • author = {Romero, Orlando and Benosman, Mouhacine and Pappas, George}},
      • title = {From Convexity to Strong Convexity and Beyond: Bridging The Gap In Convergence Rates},
      • booktitle = {IEEE Conference on Decision and Control (CDC)},
      • year = 2024,
      • month = sep,
      • url = {https://www.merl.com/publications/TR2024-131}
      • }
  • Research Areas:

    Control, Dynamical Systems, Optimization, Signal Processing

Abstract:

In this paper, we re-examine the role of convexity and smoothness on gradient-based unconstrained optimization. While existing literature establishes the fundamental limits for gradient-based optimization algorithms for the class FL of L-smooth convex functions and the subclass Fm,L of L-smooth and m-strongly convex functions, there is a notable gap in the stark transition from their respective sublinear and linear/exponential convergence rates that persists even as m - 0. This gap is notable since the classical rate of 0(1/k) for gradient descent in FL is often overly conservative compared to what is observed in practice for convex functions that are not strongly convex. In this work, we partially close the aforementioned gap by leveraging the notion of uniform smoothness and convexity, and their respective moduli, to quantify and more comprehensively characterize the smoothness and convexity of a given function. We show how, through a simple rescaling of gradient descent informed by the modulus of smoothness, we can recover the classic rates as edge cases and establish novel rates for a wide variety of functions. Further, we examine how uniform convexity can be replaced with the Kurdyka- Lojasiewicz inequality, with the so-called “desingularizing function” replacing the role of the modulus of convexity in the novel rates. This characterization yields novel geometric insights on the relationship between the optimization landscape and the attainable convergence rates.