The joint bidiagonalization (JBD) method has been used to compute some extreme generalized singular values and vectors of a large regular matrix pair $\{A,L\}$, where we propose three approaches to compute approximate generalized singular values and vectors. We make a numerical analysis of the underlying JBD process and establish relationships between it and two mathematically equivalent Lanczos bidiagonalizations in finite precision. Based on the results of numerical analysis, we investigate the convergence of the approximate generalized singular values and vectors of $\{A,L\}$. The results show that, under some mild conditions, the semiorthogonality of Lanczos type vectors suffices to deliver approximate generalized singular values with the same accuracy as the full orthogonality does, meaning that it is only necessary to seek for efficient semiorthogonalization strategies for the JBD process. We also establish a sharp bound for the residual norm of an approximate generalized singular value and corresponding approximate right generalized singular vectors, which can reliably estimate the residual norm without explicitly computing the approximate right generalized singular vectors before the convergence occurs.
We consider the Cauchy problem for the Helmholtz equation with a domain in R^d, d>2 with N cylindrical outlets to infinity with bounded inclusions in R^{d-1}. Cauchy data are prescribed on the boundary of the bounded domains and the aim is to find solution on the unbounded part of the boundary. In 1989, Kozlov and Maz'ya proposed an alternating iterative method for solving Cauchy problems associated with elliptic,self-adjoint and positive-definite operators in bounded domains. Different variants of this method for solving Cauchy problems associated with Helmholtz-type operators exists. We consider the variant proposed by Mpinganzima et al. for bounded domains and derive the necessary conditions for the convergence of the procedure in unbounded domains. For the numerical implementation, a finite difference method is used to solve the problem in a simple rectangular domain in R^2 that represent a truncated infinite strip. The numerical results shows that by appropriate truncation of the domain and with appropriate choice of the Robin parameters, the Robin-Dirichlet alternating iterative procedure is convergent.
The monotone variational inequality is a central problem in mathematical programming that unifies and generalizes many important settings such as smooth convex optimization, two-player zero-sum games, convex-concave saddle point problems, etc. The extragradient method by Korpelevich [1976] is one of the most popular methods for solving monotone variational inequalities. Despite its long history and intensive attention from the optimization and machine learning community, the following major problem remains open. What is the last-iterate convergence rate of the extragradient method for monotone and Lipschitz variational inequalities with constraints? We resolve this open problem by showing a tight $O\left(\frac{1}{\sqrt{T}}\right)$ last-iterate convergence rate for arbitrary convex feasible sets, which matches the lower bound by Golowich et al. [2020]. Our rate is measured in terms of the standard gap function. The technical core of our result is the monotonicity of a new performance measure -- the tangent residual, which can be viewed as an adaptation of the norm of the operator that takes the local constraints into account. To establish the monotonicity, we develop a new approach that combines the power of the sum-of-squares programming with the low dimensionality of the update rule of the extragradient method. We believe our approach has many additional applications in the analysis of iterative methods.
In this paper we get error bounds for fully discrete approximations of infinite horizon problems via the dynamic programming approach. It is well known that considering a time discretization with a positive step size $h$ an error bound of size $h$ can be proved for the difference between the value function (viscosity solution of the Hamilton-Jacobi-Bellman equation corresponding to the infinite horizon) and the value function of the discrete time problem. However, including also a spatial discretization based on elements of size $k$ an error bound of size $O(k/h)$ can be found in the literature for the error between the value functions of the continuous problem and the fully discrete problem. In this paper we revise the error bound of the fully discrete method and prove, under similar assumptions to those of the time discrete case, that the error of the fully discrete case is in fact $O(h+k)$ which gives first order in time and space for the method. This error bound matches the numerical experiments of many papers in the literature in which the behaviour $1/h$ from the bound $O(k/h)$ have not been observed.
We extend the Deep Galerkin Method (DGM) introduced in Sirignano and Spiliopoulos (2018)} to solve a number of partial differential equations (PDEs) that arise in the context of optimal stochastic control and mean field games. First, we consider PDEs where the function is constrained to be positive and integrate to unity, as is the case with Fokker-Planck equations. Our approach involves reparameterizing the solution as the exponential of a neural network appropriately normalized to ensure both requirements are satisfied. This then gives rise to nonlinear a partial integro-differential equation (PIDE) where the integral appearing in the equation is handled by a novel application of importance sampling. Secondly, we tackle a number of Hamilton-Jacobi-Bellman (HJB) equations that appear in stochastic optimal control problems. The key contribution is that these equations are approached in their unsimplified primal form which includes an optimization problem as part of the equation. We extend the DGM algorithm to solve for the value function and the optimal control \simultaneously by characterizing both as deep neural networks. Training the networks is performed by taking alternating stochastic gradient descent steps for the two functions, a technique inspired by the policy improvement algorithms (PIA).
In this paper we propose a methodology to accelerate the resolution of the so-called "Sorted L-One Penalized Estimation" (SLOPE) problem. Our method leverages the concept of "safe screening", well-studied in the literature for \textit{group-separable} sparsity-inducing norms, and aims at identifying the zeros in the solution of SLOPE. More specifically, we derive a set of \(\tfrac{n(n+1)}{2}\) inequalities for each element of the \(n\)-dimensional primal vector and prove that the latter can be safely screened if some subsets of these inequalities are verified. We propose moreover an efficient algorithm to jointly apply the proposed procedure to all the primal variables. Our procedure has a complexity \(\mathcal{O}(n\log n + LT)\) where \(T\leq n\) is a problem-dependent constant and \(L\) is the number of zeros identified by the tests. Numerical experiments confirm that, for a prescribed computational budget, the proposed methodology leads to significant improvements of the solving precision.
SVD (singular value decomposition) is one of the basic tools of machine learning, allowing to optimize basis for a given matrix. However, sometimes we have a set of matrices $\{A_k\}_k$ instead, and would like to optimize a single common basis for them: find orthogonal matrices $U$, $V$, such that $\{U^T A_k V\}$ set of matrices is somehow simpler. For example DCT-II is orthonormal basis of functions commonly used in image/video compression - as discussed here, this kind of basis can be quickly automatically optimized for a given dataset. While also discussed gradient descent optimization might be computationally costly, there is proposed CSVD (common SVD): fast general approach based on SVD. Specifically, we choose $U$ as built of eigenvectors of $\sum_i (w_k)^q (A_k A_k^T)^p$ and $V$ of $\sum_k (w_k)^q (A_k^T A_k)^p$, where $w_k$ are their weights, $p,q>0$ are some chosen powers e.g. 1/2, optionally with normalization e.g. $A \to A - rc^T$ where $r_i=\sum_j A_{ij}, c_j =\sum_i A_{ij}$.
There has been an arising trend of adopting deep learning methods to study partial differential equations (PDEs). This article is to propose a Deep Learning Galerkin Method (DGM) for the closed-loop geothermal system, which is a new coupled multi-physics PDEs and mainly consists of a framework of underground heat exchange pipelines to extract the geothermal heat from the geothermal reservoir. This method is a natural combination of Galerkin Method and machine learning with the solution approximated by a neural network instead of a linear combination of basis functions. We train the neural network by randomly sampling the spatiotemporal points and minimize loss function to satisfy the differential operators, initial condition, boundary and interface conditions. Moreover, the approximate ability of the neural network is proved by the convergence of the loss function and the convergence of the neural network to the exact solution in L^2 norm under certain conditions. Finally, some numerical examples are carried out to demonstrate the approximation ability of the neural networks intuitively.
Multigrid is a powerful solver for large-scale linear systems arising from discretized partial differential equations. The convergence theory of multigrid methods for symmetric positive definite problems has been well developed over the past decades, while, for nonsymmetric problems, such theory is still not mature. As a foundation for multigrid analysis, two-grid convergence theory plays an important role in motivating multigrid algorithms. Regarding two-grid methods for nonsymmetric problems, most previous works focus on the spectral radius of iteration matrix or rely on convergence measures that are typically difficult to compute in practice. Moreover, the existing results are confined to two-grid methods with exact solution of the coarse-grid system. In this paper, we analyze the convergence of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from the discretizations of convection-diffusion equations). In the case of exact coarse solver, we establish an elegant identity for characterizing two-grid convergence factor, which is measured by a smoother-induced norm. The identity can be conveniently used to derive a class of optimal restriction operators and analyze how the convergence factor is influenced by restriction. More generally, we present some convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered.
In the storied Colonel Blotto game, two colonels allocate $a$ and $b$ troops, respectively, to $k$ distinct battlefields. A colonel wins a battle if they assign more troops to that particular battle, and each colonel seeks to maximize their total number of victories. Despite the problem's formulation in 1921, the first polynomial-time algorithm to compute Nash equilibrium (NE) strategies for this game was discovered only quite recently. In 2016, \citep{ahmadinejad_dehghani_hajiaghayi_lucier_mahini_seddighin_2019} formulated a breakthrough algorithm to compute NE strategies for the Colonel Blotto game\footnote{To the best of our knowledge, the algorithm from \citep{ahmadinejad_dehghani_hajiaghayi_lucier_mahini_seddighin_2019} has computational complexity $O(k^{14}\max\{a,b\}^{13})$}, receiving substantial media coverage (e.g. \citep{Insider}, \citep{NSF}, \citep{ScienceDaily}). In this work, we present the first known $\epsilon$-approximation algorithm to compute NE strategies in the two-player Colonel Blotto game in runtime $\widetilde{O}(\epsilon^{-4} k^8 \max\{a,b\}^2)$ for arbitrary settings of these parameters. Moreover, this algorithm computes approximate coarse correlated equilibrium strategies in the multiplayer (continuous and discrete) Colonel Blotto game (when there are $\ell > 2$ colonels) with runtime $\widetilde{O}(\ell \epsilon^{-4} k^8 n^2 + \ell^2 \epsilon^{-2} k^3 n (n+k))$, where $n$ is the maximum troop count. Before this work, no polynomial-time algorithm was known to compute exact or approximate equilibrium (in any sense) strategies for multiplayer Colonel Blotto with arbitrary parameters. Our algorithm computes these approximate equilibria by a novel (to the author's knowledge) sampling technique with which we implicitly perform multiplicative weights update over the exponentially many strategies available to each player.
We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admits a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.