Publication
NeurIPS 2016
Conference paper
Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back
Abstract
In stochastic convex optimization the goal is to minimize a convex function F(x) = E_{f∼D}[f(x)] over a convex set K ⊂ ℝ^d, where D is some unknown distribution and each f(·) in the support of D is convex over K. The optimization is commonly based on i.i.d. samples f_1, f_2, …, f_n from D. A standard approach to such problems is empirical risk minimization (ERM), which optimizes F_S(x) = (1/n) Σ_{i≤n} f_i(x). Here we consider the question of how many samples are necessary for ERM to succeed, and the closely related question of uniform convergence of F_S to F over K. We demonstrate that in the standard ℓ_p/ℓ_q setting of Lipschitz-bounded functions over a K of bounded radius, ERM requires a sample size that scales linearly with the dimension d. This nearly matches standard upper bounds and improves on the Ω(log d) dependence proved for the ℓ_2/ℓ_2 setting in [18]. In stark contrast, these problems can be solved using a dimension-independent number of samples in the ℓ_2/ℓ_2 setting and with log d dependence in the ℓ_1/ℓ_∞ setting using other approaches. We further show that our lower bound applies even if the functions in the support of D are smooth and efficiently computable, and even if an ℓ_1 regularization term is added. Finally, we demonstrate that for a more general class of bounded-range (but not Lipschitz-bounded) stochastic convex programs, an infinite gap appears already in dimension 2.
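To make the ERM setup concrete, here is a minimal sketch, not from the paper itself: it draws i.i.d. samples defining convex losses f_i(x) = ||x − z_i||², forms the empirical objective F_S, and minimizes it over a Euclidean ball K via projected gradient descent. The specific loss, ball radius, and step size are illustrative assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200
# i.i.d. samples z_i ~ D; each f_i(x) = ||x - z_i||^2 is convex and smooth.
Z = rng.normal(loc=0.3, scale=1.0, size=(n, d))

def project_ball(x, radius=1.0):
    """Euclidean projection onto K = {x : ||x|| <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def erm(Z, radius=1.0, steps=1000, lr=0.05):
    """Projected gradient descent on F_S(x) = (1/n) sum_i ||x - z_i||^2."""
    x = np.zeros(Z.shape[1])
    mean = Z.mean(axis=0)
    for _ in range(steps):
        grad = 2.0 * (x - mean)          # gradient of the empirical risk F_S
        x = project_ball(x - lr * grad, radius)
    return x

x_hat = erm(Z)
```

For this particular quadratic loss the unconstrained ERM solution is just the sample mean, so the projection only matters when that mean falls outside K; the paper's lower bounds concern how far such empirical minimizers can be from minimizing the population objective F.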