For optimal control problems constrained by an initial-value parabolic PDE, one has to solve a large-scale saddle-point algebraic system that couples all the discrete space and time points. A popular strategy for handling such a system is a Krylov subspace method, for which an efficient preconditioner plays a crucial role. The matching-Schur-complement preconditioner has been extensively studied in the literature, and its implementation requires solving the underlying PDEs twice, sequentially in time. In this paper, we propose a new preconditioner for the Schur complement, which can be applied parallel-in-time (PinT) via the so-called diagonalization technique. We show that the eigenvalues of the preconditioned matrix are lower- and upper-bounded by positive constants independent of the matrix size and the regularization parameter. The uniform boundedness of the eigenvalues leads to an optimal linear convergence rate of the conjugate gradient solver for the preconditioned Schur complement system. To the best of our knowledge, this is the first optimal convergence analysis of a PinT preconditioning technique for the optimal control problem. Numerical results show that the performance of the proposed preconditioner is robust with respect to the discretization step sizes and the regularization parameter.
This paper introduces a new robust interior point method analysis for semidefinite programming (SDP). The new analysis can be combined with either the logarithmic barrier or the hybrid barrier. Under this framework, we improve the running time for solving SDPs with variable size $n \times n$ and $m$ constraints up to $\epsilon$ accuracy. We show that in the case $m = \Omega(n^2)$, SDPs can be solved in $m^{\omega}$ time. This suggests that solving an SDP is nearly as fast as solving a linear system with an equal number of variables and constraints. This is the first result showing that tall dense SDPs can be solved in nearly optimal running time, and it improves on the state-of-the-art SDP solver [Jiang, Kathuria, Lee, Padmanabhan and Song, FOCS 2020]. In addition to the new IPM analysis, we propose several techniques that may be of independent interest, such as maintaining the inverse of a Kronecker product using lazy updates, and a general amortization scheme for positive semidefinite matrices.
Motivated by applications to topological data analysis, we give an efficient algorithm for computing a (minimal) presentation of a bigraded $K[x,y]$-module $M$, where $K$ is a field. The algorithm takes as input a short chain complex of free modules $X\xrightarrow{f} Y \xrightarrow{g} Z$ such that $M\cong \ker{g}/\mathrm{im}{f}$. It runs in time $O(|X|^3+|Y|^3+|Z|^3)$ and requires $O(|X|^2+|Y|^2+|Z|^2)$ memory, where $|\cdot |$ denotes the rank. Given the presentation computed by our algorithm, the bigraded Betti numbers of $M$ are readily computed. Our approach is based on a simple matrix reduction algorithm, slight variants of which compute kernels of morphisms between free modules, minimal generating sets, and Gr\"obner bases. Our algorithm for computing minimal presentations has been implemented in RIVET, a software tool for the visualization and analysis of two-parameter persistent homology. In experiments on topological data analysis problems, our implementation outperforms the standard computational commutative algebra packages Singular and Macaulay2 by a wide margin.
In this paper, we propose a simple sparse approximate inverse for triangular matrices (SAIT). Using the Jacobi iteration, we obtain an expression for the exact inverse of a triangular matrix as a finite series; the SAIT is constructed by truncating this series. We apply SAIT matrices to iterative methods with ILU preconditioners: the two triangular solves in the ILU preconditioning procedure are replaced by two matrix-vector multiplications, which can be parallelized in a fine-grained manner. We test the method by solving several linear systems and eigenvalue problems with preconditioned iterative methods.
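The underlying identity can be sketched as follows: writing a lower-triangular $L = D(I - N)$ with $D$ its diagonal and $N$ strictly lower triangular (hence nilpotent), the exact inverse is the finite Neumann series $L^{-1} = (I + N + \dots + N^{n-1})D^{-1}$, and truncating the series after a few terms yields an approximate inverse. A minimal dense sketch of this idea, assuming the paper's sparse construction (the function name and dense storage here are illustrative only):

```python
import numpy as np

def sait(L, m):
    """Approximate inverse of a lower-triangular L via a truncated
    Neumann series. Write L = D(I - N) with D = diag(L) and N strictly
    lower triangular (so N^n = 0); then
        L^{-1} = (I + N + ... + N^{n-1}) D^{-1},
    and keeping only the first m+1 terms gives the approximation."""
    n = L.shape[0]
    Dinv = np.diag(1.0 / np.diag(L))
    N = np.eye(n) - Dinv @ L        # strictly lower triangular, nilpotent
    S = np.eye(n)                   # partial sum I + N + ... + N^m
    P = np.eye(n)
    for _ in range(m):
        P = P @ N
        S = S + P
    return S @ Dinv
```

With $m \ge n-1$ the series is exact; smaller $m$ trades accuracy for sparsity, which is what makes the triangular solve replaceable by matrix-vector products.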
The sharpest known high probability generalization bounds for uniformly stable algorithms (Feldman, Vondr\'{a}k, 2018, 2019), (Bousquet, Klochkov, Zhivotovskiy, 2020) contain a generally inevitable sampling error term of order $\Theta(1/\sqrt{n})$. When applied to excess risk bounds, this leads to suboptimal results in several standard stochastic convex optimization problems. We show that if the so-called Bernstein condition is satisfied, the term $\Theta(1/\sqrt{n})$ can be avoided, and high probability excess risk bounds of order up to $O(1/n)$ are possible via uniform stability. Using this result, we show a high probability excess risk bound with the rate $O(\log n/n)$ for strongly convex and Lipschitz losses valid for \emph{any} empirical risk minimization method. This resolves a question of Shalev-Shwartz, Shamir, Srebro, and Sridharan (2009). We discuss how $O(\log n/n)$ high probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses without the usual smoothness assumption.
An adapted deflation preconditioner is employed to accelerate the solution of linear systems resulting from the discretization of fracture mechanics problems with well-conditioned extended/generalized finite elements. The deflation space typically used for linear elasticity problems is enriched with additional vectors, accounting for the enrichment functions used, thus effectively removing low frequency components of the error. To further improve performance, deflation is combined, in a multiplicative way, with a block-Jacobi preconditioner, which removes high frequency components of the error as well as linear dependencies introduced by enrichment. The resulting scheme is tested on a series of non-planar crack propagation problems and compared to alternative linear solvers in terms of performance.
In the framework of inverse linear problems on infinite-dimensional Hilbert space, we prove the convergence of the conjugate gradient iterates to an exact solution to the inverse problem in the most general case where the self-adjoint, non-negative operator is unbounded and with minimal, technically unavoidable assumptions on the initial guess of the iterative algorithm. The convergence is proved to always hold in the Hilbert space norm (error convergence), as well as at other levels of regularity (energy norm, residual, etc.) depending on the regularity of the iterates. We also discuss, both analytically and through a selection of numerical tests, the main features and differences of our convergence result as compared to the case, already available in the literature, where the operator is bounded.
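For reference, the finite-dimensional conjugate gradient iteration that the operator-theoretic analysis above generalizes can be sketched as follows; this is the standard textbook CG for a symmetric positive definite matrix, not the unbounded-operator setting of the paper:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, maxiter=1000):
    """Textbook conjugate gradient iteration for A x = b with A
    symmetric positive definite; a finite-dimensional stand-in for
    the Hilbert-space setting discussed above."""
    x = x0.copy()
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In exact arithmetic the iterates converge in at most $n$ steps; the paper's contribution concerns what survives of this convergence when $A$ is an unbounded self-adjoint non-negative operator.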
Sparse eigenproblems are important for various applications in computer graphics. The spectrum and eigenfunctions of the Laplace--Beltrami operator, for example, are fundamental for methods in shape analysis and mesh processing. The Subspace Iteration Method is a robust solver for these problems. In practice, however, Lanczos schemes are often faster. In this paper, we introduce the Hierarchical Subspace Iteration Method (HSIM), a novel solver for sparse eigenproblems that operates on a hierarchy of nested vector spaces. The hierarchy is constructed such that on the coarsest space all eigenpairs can be computed with a dense eigensolver. HSIM uses these eigenpairs as initialization and iterates from coarse to fine over the hierarchy. On each level, subspace iterations, initialized with the solution from the previous level, are used to approximate the eigenpairs. This approach substantially reduces the number of iterations needed on the finest grid compared to the non-hierarchical Subspace Iteration Method. Our experiments show that HSIM can solve Laplace--Beltrami eigenproblems on meshes faster than state-of-the-art methods based on Lanczos iterations, preconditioned conjugate gradients and subspace iterations.
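As a point of reference, one level of the scheme, a subspace iteration with Rayleigh--Ritz projection initialized from a coarser level, can be sketched as follows. This is a dense illustrative sketch, not the paper's implementation: it uses a direct solve where a practical solver would use sparse factorizations or preconditioned iterations.

```python
import numpy as np

def subspace_iteration(A, Q0, iters=50):
    """Subspace iteration with Rayleigh-Ritz projection for the k
    smallest eigenpairs of a symmetric positive definite A.
    Q0 is an (n x k) orthonormal starting basis -- in a hierarchical
    scheme, the prolonged eigenvectors from the previous level."""
    Q = Q0
    for _ in range(iters):
        Z = np.linalg.solve(A, Q)     # inverse iteration on the block
        Q, _ = np.linalg.qr(Z)        # re-orthonormalize the basis
        H = Q.T @ A @ Q               # Rayleigh-Ritz projection (k x k)
        w, V = np.linalg.eigh(H)      # Ritz values and coordinates
        Q = Q @ V                     # rotate basis to Ritz vectors
    return w, Q
```

The hierarchical idea is that a good starting basis `Q0` from a coarser space drastically cuts the number of iterations needed on the fine level.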
We study the problem of estimating the size of a maximum matching and a minimum vertex cover in sublinear time. Denoting the number of vertices by $n$ and the average degree in the graph by $\bar{d}$, we obtain the following results for both problems:
* A multiplicative $(2+\epsilon)$-approximation in $\tilde{O}(n/\epsilon^2)$ time using adjacency list queries.
* A multiplicative-additive $(2, \epsilon n)$-approximation in $\tilde{O}((\bar{d} + 1)/\epsilon^2)$ time using adjacency list queries.
* A multiplicative-additive $(2, \epsilon n)$-approximation in $\tilde{O}(n/\epsilon^{3})$ time using adjacency matrix queries.
All three results are provably time-optimal up to polylogarithmic factors, concluding a long line of work on these problems. Our main contribution, and the key ingredient behind the bounds above, is a new and near-tight analysis of the average query complexity of the randomized greedy maximal matching algorithm, which improves upon a seminal result of Yoshida, Yamamoto, and Ito [STOC'09].
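The randomized greedy maximal matching algorithm analyzed above is simple to state: scan the edges in a uniformly random order and keep every edge whose endpoints are both still unmatched. A minimal sketch (function name illustrative; the paper's results concern the query complexity of simulating this process locally, not running it in full):

```python
import random

def randomized_greedy_matching(edges, seed=None):
    """Greedy maximal matching over a uniformly random edge order:
    an edge is added iff both its endpoints are still unmatched."""
    rng = random.Random(seed)
    order = edges[:]
    rng.shuffle(order)              # uniformly random permutation
    matched = set()
    matching = []
    for u, v in order:
        if u not in matched and v not in matched:
            matched.add(u)
            matched.add(v)
            matching.append((u, v))
    return matching
```

Any maximal matching has size at least half the maximum, which is the source of the factor-2 guarantees in the bounds above.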
Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in the estimated stochastic gradients. The high variance can be especially pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method decomposes into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that both types of variance must be mitigated to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and achieves better generalization than existing methods.
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that, in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization on a number of classification problems.