In this paper, a robust and effective preconditioner is proposed for the fast Method of Moments (MoM) based Hierarchical Electric Field Integral Equation (EFIE) solver, using a symmetric near-field Schur complement method. In this preconditioner, the near-field blocks are scaled to a block diagonal matrix, and the near-field blocks are then replaced with this scaled block diagonal matrix, which reduces the near-field storage memory and the overall matrix-vector product time. The scaled block diagonal matrix is further used as a preconditioner, and because of its block diagonal form, no additional fill-ins are introduced in its inverse. The symmetric property of the near-field blocks is exploited to reduce the preconditioner setup time. Near-linear complexity of the preconditioner setup and solve times is achieved by ordering the near-field blocks with graph bandwidth reduction algorithms and by compressing the fill-in blocks that arise during preconditioner computation. The preconditioner setup time is halved by using the symmetric property together with near-field block ordering. A complexity analysis shows that the computational and memory costs of preconditioner construction are linear. Numerical experiments demonstrate an average speed-up of 1.5-2.3x in the iterative solution time over Null-Field-based preconditioners.
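As a rough illustration of the block-diagonal idea only (not the paper's Schur-complement construction), the sketch below inverts the diagonal blocks of a stand-in near-field matrix once and applies the result as a preconditioner inside GMRES; the block size and test matrix are invented for illustration.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(0)
n, block = 120, 12                                     # system size and (uniform) block size
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))     # stand-in system matrix

# Invert each diagonal block once; this is the entire preconditioner set-up.
inv_blocks = [np.linalg.inv(A[i:i + block, i:i + block]) for i in range(0, n, block)]

def apply_prec(r):
    """Apply the block-diagonal preconditioner: solve each block independently."""
    z = np.empty_like(r)
    for k, inv_b in enumerate(inv_blocks):
        s = slice(k * block, (k + 1) * block)
        z[s] = inv_b @ r[s]
    return z

M = LinearOperator((n, n), matvec=apply_prec)
b = rng.standard_normal(n)
x, info = gmres(A, b, M=M)
print("converged:", info == 0, "residual:", np.linalg.norm(A @ x - b))
```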
Bilevel optimization, the problem of minimizing a value function that involves the arg-minimum of another function, appears in many areas of machine learning. In a large-scale setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to make progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. The directions along which these variables evolve are written as sums, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate and achieves linear convergence under a Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
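The single-loop principle described above can be sketched on a toy quadratic bilevel problem: the inner variable, the linear-system variable, and the outer variable are advanced simultaneously along directions that could each be replaced by unbiased stochastic estimates. The quadratic objectives, step sizes, and names below are illustrative choices and do not include the SAGA-style variance reduction of SABA.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 6, 3
B = np.linalg.qr(rng.standard_normal((d_in, d_out)))[0]   # well-conditioned inner map
target = rng.standard_normal(d_in)

# inner problem  g(x, z) = 0.5 ||z - B x||^2      (so z*(x) = B x, and grad^2_zz g = I)
# outer problem  f(x, z) = 0.5 ||z - target||^2   (value function h(x) = f(x, z*(x)))
x, z, v = np.zeros(d_out), np.zeros(d_in), np.zeros(d_in)
rho, eta = 0.5, 0.1                                        # inner / outer step sizes

for _ in range(1000):
    D_z = z - B @ x              # grad_z g(x, z)
    D_v = v - (z - target)       # grad^2_zz g(x, z) v - grad_z f(x, z)
    D_x = B.T @ v                # grad_x f - grad^2_xz g v   (here grad_x f = 0, grad^2_xz g = -B^T)
    z, v, x = z - rho * D_z, v - rho * D_v, x - eta * D_x

exact = np.linalg.lstsq(B, target, rcond=None)[0]          # minimiser of h
print("distance to exact minimiser:", np.linalg.norm(x - exact))
```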
We consider symmetric positive definite preconditioners for multiple saddle-point systems of block tridiagonal form, which can be applied within the MINRES algorithm. We describe such a preconditioner for which the preconditioned matrix has only two distinct eigenvalues, 1 and -1, when the preconditioner is applied exactly. We discuss the relative merits of such an approach compared to a more widely studied block diagonal preconditioner, specify the computational work associated with applying the new preconditioner inexactly, and survey a number of theoretical results for the block diagonal case. Numerical results validate our theoretical findings.
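For context on the comparison mentioned above, the snippet below numerically reproduces the classical Murphy-Golub-Wathen result for the widely studied block diagonal preconditioner on a standard two-by-two block saddle-point matrix, whose preconditioned eigenvalues are $1$ and $(1\pm\sqrt{5})/2$; it does not construct the new two-eigenvalue preconditioner, and the test matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3
G = rng.standard_normal((n, n)); A = G @ G.T + n * np.eye(n)   # SPD (1,1) block
B = rng.standard_normal((m, n))                                # full row rank

K = np.block([[A, B.T], [B, np.zeros((m, m))]])
S = B @ np.linalg.solve(A, B.T)                                # Schur complement
P = np.block([[A, np.zeros((n, m))], [np.zeros((m, n)), S]])   # block diagonal preconditioner

eigs = np.sort(np.linalg.eigvals(np.linalg.solve(P, K)).real)
print(np.unique(np.round(eigs, 6)))    # ~ [(1 - sqrt(5))/2, 1, (1 + sqrt(5))/2]
```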
Second-order optimization methods are among the most widely used approaches for convex optimization problems, and have recently been applied to non-convex problems such as deep learning models. Widely used second-order methods such as quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation becomes less effective at approximating the Newton step because it relies only on first-order derivative information. In this study, we propose an approximate Newton sketch-based stochastic optimization algorithm for large-scale empirical risk minimization. Specifically, we compute a partial column Hessian of size $d\times m$ with $m\ll d$ randomly selected variables, and then use the \emph{Nystr\"om method} to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step $\Delta\boldsymbol{w}$ without computing or storing the full Hessian or its inverse. We then integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradient methods. The results of numerical experiments on both convex and non-convex functions show that the proposed approach obtains a better approximation of Newton's method, exhibiting performance competitive with that of state-of-the-art first-order and stochastic quasi-Newton methods. Furthermore, we provide a theoretical convergence analysis for convex functions.
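The column-sampling and Nyström step can be sketched as follows on a small ridge-regularised logistic-regression Hessian. The exact Hessian is formed here only as a reference for measuring the approximation quality, the sampling rule and damping are illustrative, and the paper's algorithm additionally avoids forming the $d\times d$ approximation explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lam = 200, 30, 8, 1e-1
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, n).astype(float)
w = 0.1 * rng.standard_normal(d)

p = 1.0 / (1.0 + np.exp(-X @ w))
grad = X.T @ (p - y) / n + lam * w
H = X.T @ (X * (p * (1 - p))[:, None]) / n + lam * np.eye(d)   # exact Hessian (reference only)

idx = rng.choice(d, size=m, replace=False)        # m randomly selected variables
C = H[:, idx]                                     # partial column Hessian, d x m
W = H[np.ix_(idx, idx)]                           # m x m core block
H_nys = C @ np.linalg.pinv(W) @ C.T               # Nystrom approximation of the full Hessian

step = np.linalg.solve(H_nys + 1e-3 * np.eye(d), grad)   # damped approximate Newton step
newton = np.linalg.solve(H, grad)                        # exact Newton step
cos = step @ newton / (np.linalg.norm(step) * np.linalg.norm(newton))
print(f"relative Hessian error: {np.linalg.norm(H - H_nys) / np.linalg.norm(H):.3f}")
print(f"cosine similarity with exact Newton step: {cos:.3f}")
```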
The use of a large excess of service antennas brings a variety of performance benefits to distributed MIMO C-RAN, but the corresponding high fronthaul data loads can be problematic in practical systems with limited fronthaul capacity. In this work we propose the use of lossy dimension reduction, applied locally at each remote radio head (RRH), to reduce this fronthaul traffic. We first consider the uplink, and the case where each RRH applies a linear dimension reduction filter to its multi-antenna received signal vector. It is shown that under a joint mutual information criterion, the optimal dimension reduction filters are given by a variant of the conditional Karhunen-Loève transform, with a stationary point found using block coordinate ascent. These filters are then modified so that each RRH can calculate its own dimension reduction filter in a decentralised manner, using knowledge only of its own instantaneous channel and the network's slow-fading coefficients. We then show that in TDD systems these dimension reduction filters can be re-used as part of a two-stage reduced dimension downlink precoding scheme. Analysis and numerical results demonstrate that the proposed approach can significantly reduce both uplink and downlink fronthaul traffic whilst incurring very little loss in MIMO performance.
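A simplified, hedged illustration of local dimension reduction (not the exact conditional Karhunen-Loève filter of the paper): a single RRH projects its received vector onto the dominant eigenvectors of its local signal covariance, and the resulting uplink mutual information is compared with the full-dimension value. The channel model, powers, and dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, p, sigma2 = 16, 4, 6, 0.1            # antennas, users, reduced dimension, noise power
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

def mutual_info(G, sigma2):
    """log2 det(I + G G^H / sigma2) for a received signal G x + white noise."""
    M = np.eye(G.shape[0]) + G @ G.conj().T / sigma2
    return np.linalg.slogdet(M)[1] / np.log(2)

# Local eigen-filter: p dominant eigenvectors of the local signal covariance H H^H.
eigval, eigvec = np.linalg.eigh(H @ H.conj().T)
W = eigvec[:, -p:]                          # N x p filter with orthonormal columns

full = mutual_info(H, sigma2)
reduced = mutual_info(W.conj().T @ H, sigma2)   # filter output: W^H y = (W^H H) x + W^H n
# Here p exceeds the number of user streams, so the projection loses essentially nothing.
print(f"MI full = {full:.2f} bits, MI reduced ({p}/{N} dims) = {reduced:.2f} bits")
```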
Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce the variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, the log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There, our approach accelerates training by up to an order of magnitude.
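A minimal sketch of the kind of computation involved: the derivative of the log-determinant, $\operatorname{tr}(K^{-1}\,\partial K/\partial \ell)$, is estimated with Hutchinson probes and preconditioned CG solves. The Jacobi preconditioner below is only a placeholder for the kernel-specific preconditioners analysed in the paper, and all kernel settings are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, ell, noise = 300, 0.5, 1e-1
x = np.sort(rng.uniform(0.0, 5.0, n))
D2 = (x[:, None] - x[None, :]) ** 2
Kr = np.exp(-0.5 * D2 / ell**2)            # RBF kernel
K = Kr + noise * np.eye(n)                 # kernel matrix plus noise
dK = Kr * D2 / ell**3                      # derivative of K w.r.t. the lengthscale ell

# Jacobi preconditioner (placeholder for a kernel-specific preconditioner).
M = LinearOperator((n, n), matvec=lambda r: r / np.diag(K))

n_probes, est = 30, 0.0
for _ in range(n_probes):
    z = rng.choice([-1.0, 1.0], size=n)    # Rademacher probe
    u, _ = cg(K, z, M=M)                   # preconditioned CG solve of K u = z
    est += u @ (dK @ z)                    # one sample of z^T K^{-1} dK z
est /= n_probes

exact = np.trace(np.linalg.solve(K, dK))   # tr(K^{-1} dK) = d/d(ell) log det K
print(f"stochastic estimate: {est:.1f}, exact trace: {exact:.1f}")
```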
We call a multigraph $(k,d)$-edge colourable if its edge set can be partitioned into $k$ subgraphs of maximum degree at most $d$, and denote by $\chi'_{d}(G)$ the minimum $k$ such that $G$ is $(k,d)$-edge colourable. We prove that for every integer $d$, every multigraph $G$ with maximum degree $\Delta$ is $(\lceil \frac{\Delta}{d} \rceil, d)$-edge colourable if $d$ is even and $(\lceil \frac{3\Delta - 1}{3d - 1} \rceil, d)$-edge colourable if $d$ is odd, and that these bounds are tight. We also prove that for every simple graph $G$, $\chi'_{d}(G) \in \{ \lceil \frac{\Delta}{d} \rceil, \lceil \frac{\Delta+1}{d} \rceil \}$, and we characterize the values of $d$ and $\Delta$ for which it is NP-complete to compute $\chi'_d(G)$. These results generalize several classic results on the chromatic index of a graph by Shannon, Vizing, Holyer, Leven and Galil.
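For convenience, a tiny helper evaluating the two multigraph upper bounds stated above for a given maximum degree $\Delta$ and target degree $d$ (purely illustrative):

```python
from math import ceil

def edge_colour_bound(Delta: int, d: int) -> int:
    """Upper bound on chi'_d for a multigraph of maximum degree Delta."""
    if d % 2 == 0:
        return ceil(Delta / d)                     # even d: ceil(Delta / d)
    return ceil((3 * Delta - 1) / (3 * d - 1))     # odd d:  ceil((3*Delta - 1) / (3*d - 1))

print(edge_colour_bound(7, 2), edge_colour_bound(7, 3))   # -> 4, 3
```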
Kernel matrices, which arise from discretizing a kernel function $k(x,x')$, have a variety of applications in mathematics and engineering. Classically, the celebrated fast multipole method was designed to perform matrix multiplication on kernel matrices of dimension $N$ in time almost linear in $N$ by using techniques later generalized into the linear algebraic framework of hierarchical matrices. In light of this success, we propose a quantum algorithm for efficiently performing matrix operations on hierarchical matrices by implementing a quantum block-encoding of the hierarchical matrix structure. When applied to many kernel matrices, our quantum algorithm can solve quantum linear systems of dimension $N$ in time $O(\kappa \operatorname{polylog}(\frac{N}{\varepsilon}))$, where $\kappa$ and $\varepsilon$ are the condition number and the error bound of the matrix operation, respectively. This runtime is exponentially faster than that of any existing quantum algorithm for implementing dense kernel matrices. Finally, we discuss possible applications of our methodology in solving integral equations and accelerating computations in $N$-body problems.
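As a purely classical illustration of the hierarchical-matrix structure that both the fast multipole method and the proposed block-encoding exploit, the snippet below shows that the block of a smooth kernel matrix coupling two well-separated point clusters is numerically low-rank; the kernel, geometry, and tolerance are invented, and no quantum step is modelled.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 400
src = rng.uniform(0.0, 1.0, (m, 2))            # source cluster
trg = rng.uniform(0.0, 1.0, (m, 2)) + 4.0      # well-separated target cluster

diff = trg[:, None, :] - src[None, :, :]
block = 1.0 / np.linalg.norm(diff, axis=-1)    # off-diagonal block of the 1/|x - x'| kernel

s = np.linalg.svd(block, compute_uv=False)
rank = int(np.sum(s > 1e-10 * s[0]))
print(f"{m} x {m} off-diagonal block, numerical rank at tolerance 1e-10: {rank}")
```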
This paper deals with solving a class of three-by-three block saddle point problems. The systems are solved by preconditioning techniques: based on an iterative method, we construct a block upper triangular preconditioner. The convergence of the presented method is studied in detail. Finally, numerical experiments are given to demonstrate the superiority of the proposed preconditioner over some existing ones.
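A hedged sketch of the mechanism only, not the paper's specific preconditioner: for one common three-by-three block saddle-point form, the exact block upper triangular factor of a block LU factorisation is applied by block back substitution inside GMRES, which then converges in at most three steps in exact arithmetic; in practice the dense solves below would be replaced by cheap approximations.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(0)
n, m, p = 40, 20, 10
G = rng.standard_normal((n, n)); A = G @ G.T + n * np.eye(n)   # SPD leading block
B = rng.standard_normal((m, n)); C = rng.standard_normal((p, m))

K = np.block([[A, B.T, np.zeros((n, p))],
              [B, np.zeros((m, m)), C.T],
              [np.zeros((p, n)), C, np.zeros((p, p))]])

S = -B @ np.linalg.solve(A, B.T)          # first-level Schur complement
X = -C @ np.linalg.solve(S, C.T)          # second-level Schur complement

def apply_prec(r):
    """Back substitution with the block upper triangular factor [[A,B^T,0],[0,S,C^T],[0,0,X]]."""
    r1, r2, r3 = r[:n], r[n:n + m], r[n + m:]
    y3 = np.linalg.solve(X, r3)
    y2 = np.linalg.solve(S, r2 - C.T @ y3)
    y1 = np.linalg.solve(A, r1 - B.T @ y2)
    return np.concatenate([y1, y2, y3])

its = []
b = rng.standard_normal(n + m + p)
x, info = gmres(K, b, M=LinearOperator(K.shape, matvec=apply_prec),
                callback=lambda rk: its.append(1), callback_type='pr_norm')
print("GMRES iterations:", len(its), "residual:", np.linalg.norm(K @ x - b))
```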
Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remedy this issue, we propose the Geometric Graph Convolutional Network (geo-GCN), which uses spatial features to efficiently learn from graphs that can be naturally located in space. Our contribution is threefold: we propose a GCN-inspired architecture which (i) leverages node positions, (ii) is a proper generalisation of both GCNs and Convolutional Neural Networks (CNNs), and (iii) benefits from augmentation, which further improves performance and ensures invariance with respect to the desired properties. Empirically, geo-GCN outperforms state-of-the-art graph-based methods on image classification and chemical tasks.
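One plausible, deliberately simplified instantiation of a position-aware graph convolution is sketched below in NumPy: the aggregation weight of each neighbour is a learned function of the relative position $p_j - p_i$, so the spatial arrangement of neighbours influences the output, unlike in a plain GCN. The gating form, shapes, and parameters are invented and are not necessarily the paper's exact geo-GCN layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim, pos_dim = 6, 4, 8, 2
adj = (rng.random((n_nodes, n_nodes)) < 0.5).astype(float)
adj = np.maximum(adj, adj.T); np.fill_diagonal(adj, 1.0)     # undirected graph + self-loops
feat = rng.standard_normal((n_nodes, in_dim))                # node features
pos = rng.standard_normal((n_nodes, pos_dim))                # spatial coordinates

W = 0.1 * rng.standard_normal((in_dim, out_dim))             # feature transform
U = 0.1 * rng.standard_normal((pos_dim, out_dim))            # position-dependent gate
b = np.zeros(out_dim)

def geo_conv(feat, pos, adj):
    out = np.zeros((n_nodes, out_dim))
    for i in range(n_nodes):
        for j in np.flatnonzero(adj[i]):
            gate = np.maximum(U.T @ (pos[j] - pos[i]) + b, 0.0)   # ReLU of relative position
            out[i] += gate * (feat[j] @ W)                        # position-modulated message
        out[i] /= adj[i].sum()
    return np.maximum(out, 0.0)                                   # ReLU activation

print(geo_conv(feat, pos, adj).shape)     # (6, 8)
```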
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
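The randomized-smoothing ingredient behind DRS can be sketched as follows: the non-smooth objective $f$ is replaced by its Gaussian smoothing $f_\gamma(x) = \mathbb{E}[f(x+\gamma Z)]$, whose gradient admits the unbiased zeroth-order estimate used below. The objective, step size, and probe count are illustrative, and the distributed/communication aspect of DRS is not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)
d, gamma, step, n_probes = 20, 0.1, 0.05, 10
f = lambda x: np.abs(x).sum()            # non-smooth convex objective

def smoothed_grad(x):
    """Unbiased estimate of grad f_gamma(x) = (1/gamma) E[(f(x + gamma Z) - f(x)) Z]."""
    Z = rng.standard_normal((n_probes, d))
    vals = np.array([f(x + gamma * z) for z in Z]) - f(x)
    return (vals[:, None] * Z).mean(axis=0) / gamma

x0 = rng.standard_normal(d)
x = x0.copy()
for _ in range(3000):
    x -= step * smoothed_grad(x)
print(f"f(x0) = {f(x0):.2f}  ->  f(x) after smoothed-gradient descent = {f(x):.2f}")
```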