In this paper we develop accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient (LLCG), a class that goes beyond the well-studied class of convex optimization with Lipschitz continuous gradient. In particular, we first consider unconstrained convex optimization with LLCG and propose accelerated proximal gradient (APG) methods for solving it. The proposed APG methods are equipped with a verifiable termination criterion and enjoy an operation complexity of ${\cal O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$ and ${\cal O}(\log \varepsilon^{-1})$ for finding an $\varepsilon$-residual solution of an unconstrained convex and strongly convex optimization problem, respectively. We then consider constrained convex optimization with LLCG and propose a first-order proximal augmented Lagrangian method for solving it, which applies one of our proposed APG methods to approximately solve a sequence of proximal augmented Lagrangian subproblems. The resulting method is equipped with a verifiable termination criterion and enjoys an operation complexity of ${\cal O}(\varepsilon^{-1}\log \varepsilon^{-1})$ and ${\cal O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$ for finding an $\varepsilon$-KKT solution of a constrained convex and strongly convex optimization problem, respectively. All the methods proposed in this paper are parameter-free or almost parameter-free, except that knowledge of the convexity parameter is required. To the best of our knowledge, no prior studies have investigated accelerated first-order methods with complexity guarantees for convex optimization with LLCG. All the complexity results obtained in this paper are entirely new.
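To make the ingredients concrete (a proximal-gradient step, backtracking on a local Lipschitz estimate, Nesterov extrapolation, and a verifiable residual-based stopping test), here is a minimal FISTA-style sketch. It is not the paper's APG method: the doubling backtracking rule, the initial estimate `L0`, the unit-step residual test, and the test problem (`b`, `lam`) are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def apg_backtracking(f, grad_f, prox_h, x0, L0=1.0, eps=1e-8, max_iter=10_000):
    """FISTA-type accelerated proximal gradient for min f(x) + h(x).

    L is found by backtracking, so no global Lipschitz constant of grad_f is
    needed. Here L only grows, as in Beck-Teboulle's FISTA (an assumption,
    not necessarily the paper's rule).
    """
    x, y, t, L = x0.copy(), x0.copy(), 1.0, L0
    res = np.inf
    for _ in range(max_iter):
        g, fy = grad_f(y), f(y)
        while True:  # backtrack until the local quadratic upper bound holds at z
            z = prox_h(y - g / L, 1.0 / L)
            d = z - y
            if f(z) <= fy + g @ d + 0.5 * L * (d @ d):
                break
            L *= 2.0
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = z + ((t - 1.0) / t_next) * (z - x)  # Nesterov extrapolation
        x, t = z, t_next
        # verifiable stopping test: proximal-gradient residual with unit step
        res = np.linalg.norm(x - prox_h(x - grad_f(x), 1.0))
        if res <= eps:
            break
    return x, res


# Hypothetical test problem: f(x) = sum(x**4)/4 + ||x - b||^2/2 has a locally
# (but not globally) Lipschitz gradient; h(x) = lam * ||x||_1.
b, lam = np.array([2.0, 0.05, -1.5]), 0.1
f = lambda x: 0.25 * np.sum(x**4) + 0.5 * np.sum((x - b) ** 2)
grad_f = lambda x: x**3 + (x - b)
prox_h = lambda v, step: soft_threshold(v, lam * step)
x_hat, res = apg_backtracking(f, grad_f, prox_h, np.zeros(3))
print(x_hat, res)
```

Because the quartic term makes the gradient only locally Lipschitz, no global constant exists; backtracking finds a sufficient local estimate along the iterates.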
In this work we are interested in general linear inverse problems where the corresponding forward problem is solved iteratively using fixed-point methods. One-shot methods, which iterate simultaneously on the forward problem solution and on the inverse problem unknown, can then be applied. We analyze two variants of the so-called multi-step one-shot methods and establish sufficient conditions on the descent step for their convergence by studying the eigenvalues of the block matrix of the coupled iterations. Several numerical experiments illustrate the convergence of these methods in comparison with the classical (usual and shifted) gradient descent methods. In particular, we observe that very few inner iterations on the forward problem are enough to guarantee good convergence of the inversion algorithm.
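A minimal sketch of a multi-step one-shot iteration under assumed notation: the forward problem is the fixed point u = Bu + F sigma, the data are f = Cu, and the misfit is J(sigma) = ||C u(sigma) - f||^2 / 2, whose gradient is F^T p with p = B^T p + C^T (C u - f). It shows one possible variant (the adjoint sweeps use the freshly updated u); which of the paper's two variants it corresponds to, the matrices, and the step tau are assumptions.

```python
import numpy as np


def multistep_one_shot(B, F, C, f, sigma0, tau, k, n_outer):
    """Multi-step one-shot iteration for min_sigma 0.5 * ||C u(sigma) - f||^2,
    where u(sigma) is the fixed point of u = B u + F sigma.

    Each outer step performs k forward sweeps, k adjoint sweeps, and one
    descent step on sigma using the inexact gradient F^T p.
    """
    u = np.zeros(B.shape[0])
    p = np.zeros(B.shape[0])
    sigma = sigma0.astype(float).copy()
    for _ in range(n_outer):
        for _ in range(k):  # inexact forward solve, warm-started from the previous u
            u = B @ u + F @ sigma
        r = C.T @ (C @ u - f)
        for _ in range(k):  # inexact adjoint solve of p = B^T p + C^T (C u - f)
            p = B.T @ p + r
        sigma = sigma - tau * (F.T @ p)  # descent step on the inverse-problem unknown
    return sigma, u


# Hypothetical well-posed test problem with ||B||_2 = 0.5, so the fixed-point map contracts.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = 0.5 * Q
F, _ = np.linalg.qr(rng.standard_normal((4, 2)))
C = np.eye(4)
sigma_true = np.array([1.0, -2.0])
f = C @ np.linalg.solve(np.eye(4) - B, F @ sigma_true)
sigma_hat, _ = multistep_one_shot(B, F, C, f, np.zeros(2), tau=0.02, k=2, n_outer=8000)
print(sigma_hat)
```

With only k = 2 inner sweeps per outer step the iteration still recovers sigma here, in line with the observation that few inner iterations can suffice when tau is small enough.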
Heterogeneity is a dominant factor in the behaviour of many biological processes. Despite this, it is common for mathematical and statistical analyses to ignore biological heterogeneity as a source of variability in experimental data. As a result, methods for exploring the identifiability of models that explicitly incorporate heterogeneity through variability in model parameters are relatively underdeveloped. We develop a new likelihood-based framework, based on moment matching, for inference and identifiability analysis of differential equation models that capture biological heterogeneity through parameters that vary according to probability distributions. As our novel method is based on an approximate likelihood function, it is highly flexible; we demonstrate identifiability analysis using both a frequentist approach based on profile likelihood and a Bayesian approach based on Markov chain Monte Carlo. Through three case studies, we demonstrate our method by providing a didactic guide to inference and identifiability analysis of hyperparameters that relate to the statistical moments of model parameters from independent observed data. Our approach has a computational cost comparable to that of analysing models that neglect heterogeneity, a significant improvement over many existing alternatives. We demonstrate how analysis of random parameter models can aid understanding of the sources of heterogeneity in biological data.
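As a hedged illustration of moment matching, consider a hypothetical model y(t) = y0 exp(-k t) whose rate k varies across individuals as Normal(mu, s^2). The first two moments of y(t) follow in closed form from the Gaussian moment generating function, and a Gaussian with those moments (plus measurement-noise variance) gives an approximate likelihood for the hyperparameters (mu, s). The model, the Gaussian closure, and all numbers below are assumptions, not the paper's case studies.

```python
import math
import random


def moments_exp_decay(t, y0, mu, s):
    """Mean and variance of y(t) = y0 * exp(-k t) when k ~ Normal(mu, s^2).

    Uses the Gaussian moment generating function, so both moments are exact.
    """
    m1 = y0 * math.exp(-mu * t + 0.5 * (s * t) ** 2)
    m2 = y0**2 * math.exp(-2.0 * mu * t + 2.0 * (s * t) ** 2)
    return m1, m2 - m1**2


def approx_loglik(data, y0, mu, s, noise_sd):
    """Moment-matched (Gaussian) approximate log-likelihood of independent observations."""
    ll = 0.0
    for t, y in data:
        mean, var = moments_exp_decay(t, y0, mu, s)
        var += noise_sd**2  # add measurement-noise variance
        ll += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mean) ** 2 / var
    return ll


# Synthetic data: each observation comes from a different individual with its own rate k.
rng = random.Random(1)
mu_true, s_true, noise_sd = 0.5, 0.1, 0.02
data = []
for t in (0.5, 1.0, 1.5, 2.0):
    for _ in range(200):
        k = rng.gauss(mu_true, s_true)
        data.append((t, math.exp(-k * t) + rng.gauss(0.0, noise_sd)))

ll_true = approx_loglik(data, 1.0, mu_true, s_true, noise_sd)
# Likelihood slice in mu with s fixed at its true value (not a full profile).
mu_grid = [0.30 + 0.01 * i for i in range(41)]
mu_best = max(mu_grid, key=lambda m: approx_loglik(data, 1.0, m, s_true, noise_sd))
print(ll_true, mu_best)
```

A full profile likelihood would maximize over s for each fixed mu; the one-dimensional slice is kept only to keep the sketch short.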
The present paper addresses the convergence of a first-order (in time) incremental projection scheme for the time-dependent incompressible Navier-Stokes equations to a weak solution, without any assumption on the existence or regularity of the exact solution. We prove the convergence of the approximate solutions obtained by the semi-discrete scheme and by a fully discrete scheme that uses a staggered finite volume discretization on non-uniform rectangular meshes. First, a priori estimates on the approximate solutions yield their existence. Compactness arguments relying on these estimates, together with estimates on the translates of the discrete time derivatives, are then developed to obtain convergence (up to the extraction of a subsequence) when the time step tends to zero in the semi-discrete scheme and when the space and time steps tend to zero in the fully discrete scheme. The approximate solutions are thus shown to converge to a limit function, which is then shown to be a weak solution of the continuous problem by passing to the limit in these schemes.
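A sketch of the projection (correction) step of an incremental scheme, under simplifying assumptions that differ from the paper: a periodic, uniform MAC (staggered) grid, rather than non-uniform rectangular meshes with boundary conditions, so that the pressure-increment Poisson problem can be solved exactly by FFT. Given a predicted velocity from a momentum step that used the old pressure gradient, it enforces the discrete divergence-free constraint and updates the pressure by the increment phi. The grid layout and function names are illustrative.

```python
import numpy as np


def discrete_div(u, v, h):
    # cell-centred divergence of face-centred velocities on a periodic MAC grid
    return (np.roll(u, -1, axis=0) - u) / h + (np.roll(v, -1, axis=1) - v) / h


def incremental_projection_step(u_tilde, v_tilde, p_old, dt, h):
    """Projection step of an incremental pressure-correction scheme.

    Layout (assumed): u[i, j] on the x-face (i - 1/2, j), v[i, j] on the
    y-face (i, j - 1/2), p[i, j] at the cell centre. Solves
    dt * Lap_h(phi) = div_h(u_tilde), then sets u = u_tilde - dt * grad_h(phi)
    and p_new = p_old + phi.
    """
    n = p_old.shape[0]
    div = discrete_div(u_tilde, v_tilde, h)
    # eigenvalues of the periodic 5-point Laplacian, diagonalised by the FFT
    lam1 = (2.0 * np.cos(2.0 * np.pi * np.arange(n) / n) - 2.0) / h**2
    lam = lam1[:, None] + lam1[None, :]
    lam[0, 0] = 1.0  # mean mode: phi is only defined up to a constant
    phi_hat = np.fft.fft2(div / dt) / lam
    phi_hat[0, 0] = 0.0
    phi = np.real(np.fft.ifft2(phi_hat))
    # backward-difference gradient on faces, so that div_h(grad_h(phi)) = Lap_h(phi)
    u = u_tilde - dt * (phi - np.roll(phi, 1, axis=0)) / h
    v = v_tilde - dt * (phi - np.roll(phi, 1, axis=1)) / h
    return u, v, p_old + phi, phi


rng = np.random.default_rng(0)
n, h, dt = 32, 1.0 / 32, 1e-2
u_tilde, v_tilde = rng.standard_normal((n, n)), rng.standard_normal((n, n))
p_old = np.zeros((n, n))
u, v, p_new, phi = incremental_projection_step(u_tilde, v_tilde, p_old, dt, h)
print(np.max(np.abs(discrete_div(u, v, h))))
```

The pressure is updated incrementally (p_new = p_old + phi) rather than recomputed, which is what distinguishes the incremental variant from the non-incremental projection method.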
We present a mathematical and numerical investigation of the shrinking-dimer saddle dynamics for finding saddle points of any index in the solution landscape. Because of the dimer approximation of the Hessian in the saddle dynamics, the local Lipschitz assumptions, and the strong nonlinearity of the saddle dynamics, delicate analysis remains challenging, for instance regarding the boundedness of the solutions and of the dimer error. We address these issues by bounding the solutions under proper relaxation parameters, based on which we prove error estimates for the numerical discretization of the shrinking-dimer saddle dynamics by matching the dimer length and the time step size. Furthermore, Richardson extrapolation is employed to obtain a high-order approximation. The inherent reason for requiring the matching of the dimer length and the time step size is that the former serves as a mesh size different from the latter. The proposed numerical method is thus close to a fully discrete numerical scheme for a space-time PDE model, with the Hessian in the saddle dynamics and its dimer approximation serving as a "spatial operator" and its discretization, respectively, which in turn indicates the PDE nature of the saddle dynamics.
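The dimer idea replaces the Hessian-vector product by a central difference of gradients, H(z)v ≈ (grad E(z + l v) - grad E(z - l v)) / (2 l). Below is a minimal index-1 explicit-Euler sketch on a hypothetical double-well energy. It fixes the dimer length equal to the time step as a crude stand-in for the matching condition, whereas the shrinking-dimer dynamics analysed in the paper also evolves the dimer length; relaxation parameters and Richardson extrapolation are omitted.

```python
import numpy as np


def grad_E(z):
    # gradient of the hypothetical energy E(x, y) = (x^2 - 1)^2 + y^2,
    # which has minima at (+-1, 0) and an index-1 saddle at (0, 0)
    x, y = z
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])


def dimer_hess_vec(grad, z, v, l):
    # dimer (central-difference) approximation of H(z) v with dimer length l
    return (grad(z + l * v) - grad(z - l * v)) / (2.0 * l)


def index1_dimer_saddle_dynamics(z0, v0, dt=1e-2, steps=2000):
    """Explicit Euler for index-1 saddle dynamics with a dimer Hessian:
    dz/dt = -(I - 2 v v^T) grad E(z),  dv/dt = -(I - v v^T) H(z) v.
    The dimer length is tied to the time step (l = dt) instead of being evolved.
    """
    z = np.asarray(z0, dtype=float)
    v = np.asarray(v0, dtype=float)
    v = v / np.linalg.norm(v)
    l = dt
    for _ in range(steps):
        g = grad_E(z)
        hv = dimer_hess_vec(grad_E, z, v, l)
        z_next = z - dt * (g - 2.0 * np.dot(v, g) * v)  # ascend along v, descend elsewhere
        v = v - dt * (hv - np.dot(v, hv) * v)  # rotate v toward the lowest eigendirection
        v = v / np.linalg.norm(v)  # keep the unit-norm constraint
        z = z_next
    return z, v


z_star, v_star = index1_dimer_saddle_dynamics([0.3, 0.2], [1.0, 0.3])
print(z_star, v_star)
```

For this quartic energy the dimer error in H(z)v is O(l^2), which is why the dimer length and the time step must be refined together to keep the overall discretization consistent.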
Much recent research effort has been directed to the development of efficient algorithms with theoretical convergence guarantees for solving minimax problems, owing to the relevance of these problems to several emerging applications. In this paper, we propose a unified single-loop alternating gradient projection (AGP) algorithm for solving smooth nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. AGP employs simple gradient projection steps for updating the primal and dual variables alternately at each iteration. We show that it can find an $\varepsilon$-stationary point of the objective function in $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp. $\mathcal{O}\left( \varepsilon ^{-4} \right)$) iterations in the nonconvex-strongly concave (resp. nonconvex-concave) setting. Moreover, its gradient complexity to obtain an $\varepsilon$-stationary point of the objective function is bounded by $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp. $\mathcal{O}\left( \varepsilon ^{-4} \right)$) in the strongly convex-nonconcave (resp. convex-nonconcave) setting. To the best of our knowledge, this is the first time that a simple and unified single-loop algorithm has been developed for solving both nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. Moreover, complexity results for solving the latter (strongly) convex-nonconcave minimax problems have never been obtained before in the literature. Numerical results show the efficiency of the proposed AGP algorithm. Furthermore, we extend the AGP algorithm by presenting a block alternating proximal gradient (BAPG) algorithm for solving more general multi-block nonsmooth nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems, and we similarly establish its gradient complexity in these four settings.
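For intuition, here is a plain alternating gradient projection loop on a hypothetical one-dimensional nonconvex-strongly concave problem: a projected descent step in x, then a projected ascent step in y evaluated at the new x. The paper's AGP additionally uses its own regularization terms and step-size rules, which are not reproduced; the test function, box constraints, and step sizes here are assumptions.

```python
def clip(v, lo, hi):
    # Euclidean projection onto the interval [lo, hi]
    return max(lo, min(hi, v))


def grad_x(x, y):
    # partial derivative in x of f(x, y) = (x^2 - 1)^2 / 4 + x*y - y^2
    return x * (x * x - 1.0) + y


def grad_y(x, y):
    # partial derivative in y of the same f
    return x - 2.0 * y


def alternating_gradient_projection(x, y, alpha=0.02, beta=0.25, iters=5000):
    """Plain alternating gradient projection for min_{x in [-2,2]} max_{y in [-1,1]} f(x, y);
    f is nonconvex in x and 2-strongly concave in y."""
    for _ in range(iters):
        x = clip(x - alpha * grad_x(x, y), -2.0, 2.0)  # primal descent step
        y = clip(y + beta * grad_y(x, y), -1.0, 1.0)  # dual ascent step at the new x
    return x, y


x_hat, y_hat = alternating_gradient_projection(1.5, 0.0)
print(x_hat, y_hat)
```

Here max_y f(x, y) is attained at y = x / 2, so the stationary points satisfy x^2 = 1/2; starting from x = 1.5 the loop settles at x = sqrt(1/2), y = x / 2.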
For $S \subseteq \{0,1\}^n$, a Boolean function $f \colon S \to \{-1,1\}$ is a polynomial threshold function (PTF) of degree $d$ and weight $W$ if there is an integer polynomial $p$ of degree $d$ and with sum of absolute coefficients $W$ such that $f(x) = \text{sign } p(x)$ for all $x \in S$. We study the representation of decision lists as PTFs over the Boolean cube $\{0,1\}^n$ and over the Hamming ball $\{0,1\}^{n}_{\leq k}$. As our first result, we show that for all $d = O\left( \left( \frac{n}{\log n}\right)^{1/3}\right)$, any decision list over $\{0,1\}^n$ can be represented by a PTF of degree $d$ and weight $2^{O(n/d^2)}$. This improves the result of Klivans and Servedio by a $\log^2 d$ factor in the exponent of the weight. Our bound is tight for all $d = O\left( \left( \frac{n}{\log n}\right)^{1/3}\right)$ due to the matching lower bound of Beigel. For decision lists over a Hamming ball $\{0,1\}^n_{\leq k}$, we show that the upper bound on the weight above can be drastically improved to $n^{O(\sqrt{k})}$ for $d = \Theta(\sqrt{k})$. We also show that a similar improvement is not possible for smaller degrees by proving the lower bound $W = 2^{\Omega(n/d^2)}$ for all $d = O(\sqrt{k})$.
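For contrast with the improved bounds, the classical degree-1 representation of a decision list gives each rule a power-of-two weight that dominates all later rules combined, which yields weight 2^{O(n)}. The sketch below builds that linear PTF and checks it exhaustively on a small hypothetical decision list; it is the textbook baseline, not the paper's degree-d construction, and the rule encoding is an assumption.

```python
from itertools import product


def eval_decision_list(rules, default, x):
    """rules: list of ((i, b), out); a rule fires when x[i] == b. Outputs are in {-1, 1}."""
    for (i, b), out in rules:
        if x[i] == b:
            return out
    return default


def decision_list_to_linear_ptf(rules, default, n):
    """Classic degree-1 PTF: p(x) = sum_k out_k * 2^(m-k) * lit_k(x) + default.

    Rule k (0-based) outweighs all later rules plus the default combined,
    since 2^(m-k) > 2^(m-k-1) + ... + 2 + 1. The weight is 2^O(m).
    """
    m = len(rules)
    coeffs, const = [0] * n, default
    for k, ((i, b), out) in enumerate(rules):
        w = out * 2 ** (m - k)
        if b == 1:
            coeffs[i] += w  # literal x_i
        else:
            coeffs[i] -= w  # literal 1 - x_i contributes -w * x_i + w
            const += w
    return coeffs, const


rules = [((2, 0), 1), ((0, 1), -1), ((1, 1), 1), ((2, 1), -1)]
default, n = -1, 3
coeffs, const = decision_list_to_linear_ptf(rules, default, n)
weight = sum(abs(c) for c in coeffs) + abs(const)
ok = all(
    (1 if sum(c * xi for c, xi in zip(coeffs, x)) + const > 0 else -1)
    == eval_decision_list(rules, default, x)
    for x in product((0, 1), repeat=n)
)
print(coeffs, const, weight, ok)
```

By construction p(x) is never zero, so its sign is well defined on every input.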
This work considers the optimization of compositions of functions in a nested form over Riemannian manifolds, where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning or model customization in meta-learning. The standard Riemannian stochastic gradient methods for non-compositional optimization cannot be directly applied, as stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $O(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and the stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithm to problems with multi-level nested compositional structures, with the same complexity of $O(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated on a policy evaluation problem in reinforcement learning.
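A minimal sketch of the two-level stochastic compositional update on the unit sphere: an auxiliary variable y tracks the inner expectation with a moving average, which is what avoids plugging a single noisy inner sample into the outer gradient, and the outer step uses the Riemannian gradient (tangent projection) followed by a normalization retraction. The toy objective (a noisy leading-eigenvector problem), the constant step sizes alpha and beta, and the use of independent samples for the value and the Jacobian are assumptions, not the paper's setup.

```python
import numpy as np


def tangent_proj(x, v):
    # Riemannian gradient on the unit sphere: drop the radial component
    return v - np.dot(x, v) * x


def retract(x, v):
    # projection retraction: move in the ambient space, then renormalise
    w = x + v
    return w / np.linalg.norm(w)


def scgd_sphere(A, noise, x0, alpha=0.05, beta=0.1, iters=3000, seed=0):
    """Two-level stochastic compositional gradient on the unit sphere for
    min_x f(E[g(x; xi)]) with g(x; xi) = (A + noise * Z_xi) x and f(y) = -||y||^2 / 2."""
    rng = np.random.default_rng(seed)
    sample = lambda: A + noise * rng.standard_normal(A.shape)
    x = x0 / np.linalg.norm(x0)
    y = sample() @ x  # running estimate of the inner expectation E[g(x; xi)]
    for _ in range(iters):
        y = (1.0 - beta) * y + beta * (sample() @ x)  # inner-function tracking
        egrad = sample().T @ (-y)  # sampled Jacobian^T times grad f(y)
        x = retract(x, -alpha * tangent_proj(x, egrad))
    return x


A = np.diag([3.0, 1.0, 0.5])
x_hat = scgd_sphere(A, noise=0.1, x0=np.ones(3))
print(x_hat)
```

Minimizing -||Ax||^2 / 2 over the sphere selects the leading right singular vector of A, here the first coordinate axis (up to sign).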
Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and output, provided that the loss of any fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shannon's mutual information as the measure of dependence between the input and the output. Our main result shows that it is indeed possible to replace the mutual information by any strongly convex function of the joint input-output distribution, with the subgaussianity condition on the losses replaced by a bound on an appropriately chosen norm capturing the geometry of the dependence measure. This allows us to derive a range of generalization bounds that are either entirely new or strengthen previously known ones. Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance, which are applicable to heavy-tailed loss distributions and highly smooth loss functions, respectively. Our analysis is based entirely on elementary tools from convex analysis, tracking the growth of a potential function associated with the dependence measure and the loss function.
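For reference, the baseline bound being generalized states that if the loss is sigma-subgaussian for every fixed hypothesis, the expected generalization gap is at most sqrt(2 sigma^2 I(S; W) / n), where S is the training sample of size n and W the output hypothesis (Xu and Raginsky, 2017). The helpers below evaluate this bound and the mutual information of a discrete joint pmf; the joint tables and numbers are hypothetical. Losses bounded in [0, 1] are 1/2-subgaussian by Hoeffding's lemma, which motivates sigma = 0.5.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint pmf given as a 2-D list."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # 0 * log 0 is taken as 0
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


def xu_raginsky_bound(mi, sigma, n):
    """sqrt(2 sigma^2 I(S; W) / n): bound on |expected generalization gap| for sigma-subgaussian losses."""
    return math.sqrt(2.0 * sigma**2 * mi / n)


mi_indep = mutual_information([[0.25, 0.25], [0.25, 0.25]])
mi_copy = mutual_information([[0.5, 0.0], [0.0, 0.5]])
bound = xu_raginsky_bound(mi=2.0, sigma=0.5, n=1000)
print(mi_indep, mi_copy, bound)
```

An independent input and output give zero mutual information and hence a zero bound, while a deterministic copy of one bit gives I = ln 2.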
This paper proposes two convergent adaptive mesh-refining algorithms for the hybrid high-order (HHO) method applied to convex minimization problems with two-sided p-growth. Examples include the p-Laplacian, an optimal design problem in topology optimization, and the convexified double-well problem. The HHO method utilizes a gradient reconstruction either in the space of piecewise Raviart-Thomas finite element functions without stabilization on triangulations into simplices, or in the space of piecewise polynomials with stabilization on polytopal meshes. The main results imply the convergence of the energy and, under further convexity properties, of the approximations of the primal or the dual variable. Numerical experiments illustrate an efficient approximation of singular minimizers and improved convergence rates for higher polynomial degrees. Computer simulations provide striking numerical evidence that the adopted adaptive HHO algorithm can overcome the Lavrentiev gap phenomenon, even with empirically higher convergence rates.
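The abstract does not state the marking strategy; Dörfler (bulk) marking is a common choice in the SOLVE-ESTIMATE-MARK-REFINE loop of adaptive schemes, so the sketch below assumes it. Given squared local error indicators, it selects a minimal set of elements carrying at least a fraction theta of the total estimated error.

```python
def dorfler_mark(eta_sq, theta=0.5):
    """Return sorted indices of a minimal set M with
    sum_{T in M} eta_sq[T] >= theta * sum_T eta_sq[T] (Dörfler/bulk marking)."""
    total = sum(eta_sq)
    # greedily take the elements with the largest indicators first
    order = sorted(range(len(eta_sq)), key=lambda i: eta_sq[i], reverse=True)
    marked, acc = [], 0.0
    for i in order:
        if acc >= theta * total:
            break
        marked.append(i)
        acc += eta_sq[i]
    return sorted(marked)


eta_sq = [0.1, 0.4, 0.05, 0.3, 0.15]  # hypothetical squared element indicators
marked = dorfler_mark(eta_sq, theta=0.5)
print(marked)
```

Sorting makes the marked set minimal in cardinality; the marked elements would then be refined before the next solve.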
Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though optimizing non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue: optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically used objectives are approximately global minima. We rigorously formalize this hypothesis for concrete instances of machine learning problems.
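A standard concrete instance of this hypothesis is rank-one symmetric matrix factorization, f(x) = ||x x^T - M||_F^2 / 4 with M = u u^T: its only local minimizers are x = ±u (both global), and the other critical point, x = 0, is a strict saddle. The sketch runs plain gradient descent from a small random start; the target u, step size, and iteration count are arbitrary choices for illustration, not taken from the paper.

```python
import numpy as np


def loss(x, M):
    # f(x) = ||x x^T - M||_F^2 / 4
    return 0.25 * np.sum((np.outer(x, x) - M) ** 2)


def grad(x, M):
    # gradient of f: (x x^T - M) x
    return (np.outer(x, x) - M) @ x


u = np.array([1.0, 2.0, -1.0])
M = np.outer(u, u)  # rank-one PSD target
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(3)  # small random start near the strict saddle x = 0
for _ in range(2000):
    x = x - 0.05 * grad(x, M)
final_loss = loss(x, M)
alignment = abs(x @ u) / (np.linalg.norm(x) * np.linalg.norm(u))
print(final_loss, alignment)
```

Despite the non-convexity, gradient descent escapes the saddle along u and reaches a global minimizer, which is the behaviour the hypothesis predicts for this objective.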