TY - JOUR
T1 - Stochastic Modified Flows for Riemannian Stochastic Gradient Descent
AU - Gess, Benjamin
AU - Kassing, Sebastian
AU - Rana, Nimit
N1 - This is an author-produced version of the published paper. Uploaded in accordance with the University’s Research Publications and Open Access policy.
PY - 2024/12/13
Y1 - 2024/12/13
AB - We give quantitative estimates for the rate of convergence of Riemannian stochastic gradient descent (RSGD) to Riemannian gradient flow and to a diffusion process, the so-called Riemannian stochastic modified flow (RSMF). Using tools from stochastic differential geometry, we show that, in the small learning rate regime, RSGD can be approximated by the solution to the RSMF driven by an infinite-dimensional Wiener process. The RSMF accounts for the random fluctuations of RSGD and thereby increases the order of approximation compared to the deterministic Riemannian gradient flow. RSGD is built using the concept of a retraction map, that is, a cost-efficient approximation of the exponential map. We prove quantitative bounds for the weak error of the diffusion approximation under assumptions on the retraction map, the geometry of the manifold, and the random estimators of the gradient.
DO - 10.48550/arXiv.2402.03467
M3 - Article
SN - 0363-0129
VL - 62
SP - 3288
EP - 3314
JO - SIAM Journal on Control and Optimization
JF - SIAM Journal on Control and Optimization
IS - 6
ER -