In this paper we analyze the Schwarz alternating method for unconstrained elliptic optimal control problems. We discuss the convergence properties of the method in the continuous case first and then apply the arguments to the finite difference discretization case. In both cases, we prove that the Schwarz alternating method is convergent if its counterpart for an elliptic equation is convergent. Meanwhile, the convergence rate of the method for the elliptic equation under the maximum norm also gives a uniform upper bound (with respect to the regularization parameter $\alpha$) of the convergence rate of the method for the optimal control problem under the maximum norm of proper error merit functions in the continuous case or vectors in the discrete case. Our numerical results corroborate our theoretical results and show that with $\alpha$ decreasing to zero, the method will converge faster. We also give some exposition of this phenomenon.
We introduce and analyze various Regularized Combined Field Integral Equations (CFIER) formulations of time-harmonic Navier equations in media with piece-wise constant material properties. These formulations can be derived systematically starting from suitable coercive approximations of Dirichlet-to-Neumann operators (DtN), and we present a periodic pseudodifferential calculus framework within which the well posedness of CIER formulations can be established. We also use the DtN approximations to derive and analyze Optimized Schwarz (OS) methods for the solution of elastodynamics transmission problems. The pseudodifferential calculus we develop in this paper relies on careful singularity splittings of the kernels of Navier boundary integral operators which is also the basis of high-order Nystr\"om quadratures for their discretizations. Based on these high-order discretizations we investigate the rate of convergence of iterative solvers applied to CFIER and OS formulations of scattering and transmission problems. We present a variety of numerical results that illustrate that the CFIER methodology leads to important computational savings over the classical CFIE one, whenever iterative solvers are used for the solution of the ensuing discretized boundary integral equations. Finally, we show that the OS methods are competitive in the high-frequency high-contrast regime.
The monotone variational inequality is a central problem in mathematical programming that unifies and generalizes many important settings such as smooth convex optimization, two-player zero-sum games, convex-concave saddle point problems, etc. The extragradient method by Korpelevich [1976] is one of the most popular methods for solving monotone variational inequalities. Despite its long history and intensive attention from the optimization and machine learning community, the following major problem remains open. What is the last-iterate convergence rate of the extragradient method for monotone and Lipschitz variational inequalities with constraints? We resolve this open problem by showing a tight $O\left(\frac{1}{\sqrt{T}}\right)$ last-iterate convergence rate for arbitrary convex feasible sets, which matches the lower bound by Golowich et al. [2020]. Our rate is measured in terms of the standard gap function. The technical core of our result is the monotonicity of a new performance measure -- the tangent residual, which can be viewed as an adaptation of the norm of the operator that takes the local constraints into account. To establish the monotonicity, we develop a new approach that combines the power of the sum-of-squares programming with the low dimensionality of the update rule of the extragradient method. We believe our approach has many additional applications in the analysis of iterative methods.
We employ kernel-based approaches that use samples from a probability distribution to approximate a Kolmogorov operator on a manifold. The self-tuning variable-bandwidth kernel method [Berry & Harlim, Appl. Comput. Harmon. Anal., 40(1):68--96, 2016] computes a large, sparse matrix that approximates the differential operator. Here, we use the eigendecomposition of the discretization to (i) invert the operator, solving a differential equation, and (ii) represent gradient vector fields on the manifold. These methods only require samples from the underlying distribution and, therefore, can be applied in high dimensions or on geometrically complex manifolds when spatial discretizations are not available. We also employ an efficient $k$-$d$ tree algorithm to compute the sparse kernel matrix, which is a computational bottleneck.
In this paper we get error bounds for fully discrete approximations of infinite horizon problems via the dynamic programming approach. It is well known that considering a time discretization with a positive step size $h$ an error bound of size $h$ can be proved for the difference between the value function (viscosity solution of the Hamilton-Jacobi-Bellman equation corresponding to the infinite horizon) and the value function of the discrete time problem. However, including also a spatial discretization based on elements of size $k$ an error bound of size $O(k/h)$ can be found in the literature for the error between the value functions of the continuous problem and the fully discrete problem. In this paper we revise the error bound of the fully discrete method and prove, under similar assumptions to those of the time discrete case, that the error of the fully discrete case is in fact $O(h+k)$ which gives first order in time and space for the method. This error bound matches the numerical experiments of many papers in the literature in which the behaviour $1/h$ from the bound $O(k/h)$ have not been observed.
We extend the Deep Galerkin Method (DGM) introduced in Sirignano and Spiliopoulos (2018)} to solve a number of partial differential equations (PDEs) that arise in the context of optimal stochastic control and mean field games. First, we consider PDEs where the function is constrained to be positive and integrate to unity, as is the case with Fokker-Planck equations. Our approach involves reparameterizing the solution as the exponential of a neural network appropriately normalized to ensure both requirements are satisfied. This then gives rise to nonlinear a partial integro-differential equation (PIDE) where the integral appearing in the equation is handled by a novel application of importance sampling. Secondly, we tackle a number of Hamilton-Jacobi-Bellman (HJB) equations that appear in stochastic optimal control problems. The key contribution is that these equations are approached in their unsimplified primal form which includes an optimization problem as part of the equation. We extend the DGM algorithm to solve for the value function and the optimal control \simultaneously by characterizing both as deep neural networks. Training the networks is performed by taking alternating stochastic gradient descent steps for the two functions, a technique inspired by the policy improvement algorithms (PIA).
Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do not hold for monotone loss functions used for classification, such as the logistic loss. We prove our result on a fixed dataset, both for sampling with or without replacement. Furthermore, for logistic loss (and similar exponentially-tailed losses), we prove that with SGD the weight vector converges in direction to the $L_2$ max margin vector as $O(1/\log(t))$ for almost all separable datasets, and the loss converges as $O(1/t)$ - similarly to gradient descent. Lastly, we examine the case of a fixed learning rate proportional to the minibatch size. We prove that in this case, the asymptotic convergence rate of SGD (with replacement) does not depend on the minibatch size in terms of epochs, if the support vectors span the data. These results may suggest an explanation to similar behaviors observed in deep networks, when trained with SGD.
In this paper, we consider the constrained energy minimizing generalized multiscale finite element method (CEM-GMsFEM) with discontinuous Galerkin (DG) coupling for the linear elasticity equations in highly heterogeneous and high contrast media. We will introduce the construction of a DG version of the CEM-GMsFEM, such as auxiliary basis functions and offline basis functions. The DG version of the method offers some advantages such as flexibility in coarse grid construction and sparsity of resulting discrete systems. Moreover, to our best knowledge, this is the first time where the proof of the convergence of the CEM-GMsFEM in the DG form is given. Some numerical examples will be presented to illustrate the performance of the method.
We study a class of enriched unfitted finite element or generalized finite element methods (GFEM) to solve a larger class of interface problems, that is, 1D elliptic interface problems with discontinuous solutions, including those having implicit or Robin-type interface jump conditions. The major challenge of GFEM development is to construct enrichment functions that capture the imposed discontinuity of the solution while keeping the condition number from fast growth. The linear stable generalized finite element method (SGFEM) was recently developed using one enrichment function. We generalized it to an arbitrary degree using two simple discontinuous one-sided enrichment functions. Optimal order convergence in the $L^2$ and broken $H^1$-norms are established. So is the optimal order convergence at all nodes. To prove the efficiency of the SGFEM, the enriched linear, quadratic, and cubic elements are applied to a multi-layer wall model for drug-eluting stents in which zero-flux jump conditions and implicit concentration interface conditions are both present.
The numerical solution of singular eigenvalue problems is complicated by the fact that small perturbations of the coefficients may have an arbitrarily bad effect on eigenvalue accuracy. However, it has been known for a long time that such perturbations are exceptional and standard eigenvalue solvers, such as the QZ algorithm, tend to yield good accuracy despite the inevitable presence of roundoff error. Recently, Lotz and Noferini quantified this phenomenon by introducing the concept of $\delta$-weak eigenvalue condition numbers. In this work, we consider singular quadratic eigenvalue problems and two popular linearizations. Our results show that a correctly chosen linearization increases $\delta$-weak eigenvalue condition numbers only marginally, justifying the use of these linearizations in numerical solvers also in the singular case. We propose a very simple but often effective algorithm for computing well-conditioned eigenvalues of a singular quadratic eigenvalue problems by adding small random perturbations to the coefficients. We prove that the eigenvalue condition number is, with high probability, a reliable criterion for detecting and excluding spurious eigenvalues created from the singular part.
We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admits a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.