In this paper, we consider two fundamental symmetric kernels in linear algebra: the Cholesky factorization and the symmetric rank-$k$ update (SYRK), with the classical three nested loops algorithms for these kernels. In addition, we consider a machine model with a fast memory of size $S$ and an unbounded slow memory. In this model, all computations must be performed on operands in fast memory, and the goal is to minimize the amount of communication between slow and fast memories. As the set of computations is fixed by the choice of the algorithm, only the ordering of the computations (the schedule) directly influences the volume of communications.We prove lower bounds of $\frac{1}{3\sqrt{2}}\frac{N^3}{\sqrt{S}}$ for the communication volume of the Cholesky factorization of an $N\times N$ symmetric positive definite matrix, and of $\frac{1}{\sqrt{2}}\frac{N^2M}{\sqrt{S}}$ for the SYRK computation of $\mat{A}\cdot\transpose{\mat{A}}$, where $\mathbf{A}$ is an $N\times M$ matrix. Both bounds improve the best known lower bounds from the literature by a factor $\sqrt{2}$.In addition, we present two out-of-core, sequential algorithms with matching communication volume: \TBS for SYRK, with a volume of $\frac{1}{\sqrt{2}}\frac{N^2M}{\sqrt{S}} + \bigo{NM\log N}$, and \LBC for Cholesky, with a volume of $\frac{1}{3\sqrt{2}}\frac{N^3}{\sqrt{S}} + \bigo{N^{5/2}}$. Both algorithms improve over the best known algorithms from the literature by a factor $\sqrt{2}$, and prove that the leading terms in our lower bounds cannot be improved further. This work shows that the operational intensity of symmetric kernels like SYRK or Cholesky is intrinsically higher (by a factor $\sqrt{2}$) than that of corresponding non-symmetric kernels (GEMM and LU factorization).
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be parameterized by a linear function over the current state, both players' actions and the next state. In particular, we assume that we can control both players and aim to find the Nash Equilibrium by minimizing the duality gap. We propose an algorithm Nash-UCRL based on the principle "Optimism-in-Face-of-Uncertainty". Our algorithm only needs to find a Coarse Correlated Equilibrium (CCE), which is computationally efficient. Specifically, we show that Nash-UCRL can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension, $H$ is the length of the game and $T$ is the total number of steps in the game. To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret. Our upper bound matches the lower bound up to logarithmic factors, which suggests the optimality of our algorithm.
A determinantal point process (DPP) on a collection of $M$ items is a model, parameterized by a symmetric kernel matrix, that assigns a probability to every subset of those items. Recent work shows that removing the kernel symmetry constraint, yielding nonsymmetric DPPs (NDPPs), can lead to significant predictive performance gains for machine learning applications. However, existing work leaves open the question of scalable NDPP sampling. There is only one known DPP sampling algorithm, based on Cholesky decomposition, that can directly apply to NDPPs as well. Unfortunately, its runtime is cubic in $M$, and thus does not scale to large item collections. In this work, we first note that this algorithm can be transformed into a linear-time one for kernels with low-rank structure. Furthermore, we develop a scalable sublinear-time rejection sampling algorithm by constructing a novel proposal distribution. Additionally, we show that imposing certain structural constraints on the NDPP kernel enables us to bound the rejection rate in a way that depends only on the kernel rank. In our experiments we compare the speed of all of these samplers for a variety of real-world tasks.
We describe a polynomial-time algorithm which, given a graph $G$ with treewidth $t$, approximates the pathwidth of $G$ to within a ratio of $O(t\sqrt{\log t})$. This is the first algorithm to achieve an $f(t)$-approximation for some function $f$. Our approach builds on the following key insight: every graph with large pathwidth has large treewidth or contains a subdivision of a large complete binary tree. Specifically, we show that every graph with pathwidth at least $th+2$ has treewidth at least $t$ or contains a subdivision of a complete binary tree of height $h+1$. The bound $th+2$ is best possible up to a multiplicative constant. This result was motivated by, and implies (with $c=2$), the following conjecture of Kawarabayashi and Rossman (SODA'18): there exists a universal constant $c$ such that every graph with pathwidth $\Omega(k^c)$ has treewidth at least $k$ or contains a subdivision of a complete binary tree of height $k$. Our main technical algorithm takes a graph $G$ and some (not necessarily optimal) tree decomposition of $G$ of width $t'$ in the input, and it computes in polynomial time an integer $h$, a certificate that $G$ has pathwidth at least $h$, and a path decomposition of $G$ of width at most $(t'+1)h+1$. The certificate is closely related to (and implies) the existence of a subdivision of a complete binary tree of height $h$. The approximation algorithm for pathwidth is then obtained by combining this algorithm with the approximation algorithm of Feige, Hajiaghayi, and Lee (STOC'05) for treewidth.
While algorithms for planar graphs have received a lot of attention, few papers have focused on the additional power that one gets from assuming an embedding of the graph is available. While in the classic sequential setting, this assumption gives no additional power (as a planar graph can be embedded in linear time), we show that this is far from being the case in other settings. We assume that the embedding is straight-line, but our methods also generalize to non-straight-line embeddings. Specifically, we focus on sublinear-time computation and massively parallel computation (MPC). Our main technical contribution is a sublinear-time algorithm for computing a relaxed version of an $r$-division. We then show how this can be used to estimate Lipschitz additive graph parameters. This includes, for example, the maximum matching, maximum independent set, or the minimum dominating set. We also show how this can be used to solve some property testing problems with respect to the vertex edit distance. In the second part of our paper, we show an MPC algorithm that computes an $r$-division of the input graph. We show how this can be used to solve various classical graph problems with space per machine of $O(n^{2/3+\epsilon})$ for some $\epsilon>0$, and while performing $O(1)$ rounds. This includes for example approximate shortest paths or the minimum spanning tree. Our results also imply an improved MPC algorithm for Euclidean minimum spanning tree.
We formulate the quadratic eigenvalue problem underlying the mathematical model of a linear vibrational system as an eigenvalue problem of a diagonal-plus-low-rank matrix $A$. The eigenvector matrix of $A$ has a Cauchy-like structure. Optimal viscosities are those for which $trace(X)$ is minimal, where $X$ is the solution of the Lyapunov equation $AX+XA^{*}=GG^{*}$. Here $G$ is a low-rank matrix which depends on the eigenfrequencies that need to be damped. After initial eigenvalue decomposition of linearized problem which requires $O(n^3)$ operations, our algorithm computes optimal viscosities for each choice of external dampers in $O(n^2)$ operations, provided that the number of dampers is small. Hence, the subsequent optimization is order of magnitude faster than in the standard approach which solves Lyapunov equation in each step, thus requiring $O(n^3)$ operations. Our algorithm is based on $O(n^2)$ eigensolver for complex symmetric diagonal-plus-rank-one matrices and fast $O(n^2)$ multiplication of linked Cauchy-like matrices.
Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d\geq\log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and L\"owner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our algorithm also gives similar guarantees in the \emph{online coreset} model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$ distortion for $p>2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic.
Computing a dense subgraph is a fundamental problem in graph mining, with a diverse set of applications ranging from electronic commerce to community detection in social networks. In many of these applications, the underlying context is better modelled as a weighted hypergraph that keeps evolving with time. This motivates the problem of maintaining the densest subhypergraph of a weighted hypergraph in a {\em dynamic setting}, where the input keeps changing via a sequence of updates (hyperedge insertions/deletions). Previously, the only known algorithm for this problem was due to Hu et al. [HWC17]. This algorithm worked only on unweighted hypergraphs, and had an approximation ratio of $(1+\epsilon)r^2$ and an update time of $O(\text{poly} (r, \log n))$, where $r$ denotes the maximum rank of the input across all the updates. We obtain a new algorithm for this problem, which works even when the input hypergraph is weighted. Our algorithm has a significantly improved (near-optimal) approximation ratio of $(1+\epsilon)$ that is independent of $r$, and a similar update time of $O(\text{poly} (r, \log n))$. It is the first $(1+\epsilon)$-approximation algorithm even for the special case of weighted simple graphs. To complement our theoretical analysis, we perform experiments with our dynamic algorithm on large-scale, real-world data-sets. Our algorithm significantly outperforms the state of the art [HWC17] both in terms of accuracy and efficiency.
We consider smooth optimization problems with a Hermitian positive semi-definite fixed-rank constraint, where a quotient geometry with three Riemannian metrics $g^i(\cdot, \cdot)$ $(i=1,2,3)$ is used to represent this constraint. By taking the nonlinear conjugate gradient method (CG) as an example, we show that CG on the quotient geometry with metric $g^1$ is equivalent to CG on the factor-based optimization framework, which is often called the Burer--Monteiro approach. We also show that CG on the quotient geometry with metric $g^3$ is equivalent to CG on the commonly-used embedded geometry. We call two CG methods equivalent if they produce an identical sequence of iterates $\{X_k\}$. In addition, we show that if the limit point of the sequence $\{X_k\}$ generated by an algorithm has lower rank, that is $X_k\in \mathbb C^{n\times n}, k = 1, 2, \ldots$ has rank $p$ and the limit point $X_*$ has rank $r < p$, then the condition number of the Riemannian Hessian with metric $g^1$ can be unbounded, but those of the other two metrics stay bounded. Numerical experiments show that the Burer--Monteiro CG method has slower local convergence rate if the limit point has a reduced rank, compared to CG on the quotient geometry under the other two metrics. This slower convergence rate can thus be attributed to the large condition number of the Hessian near a minimizer.
Given a matrix $A$ and vector $b$ with polynomial entries in $d$ real variables $\delta=(\delta_1,\ldots,\delta_d)$ we consider the following notion of feasibility: the pair $(A,b)$ is locally feasible if there exists an open neighborhood $U$ of $0$ such that for every $\delta\in U$ there exists $x$ satisfying $A(\delta)x\ge b(\delta)$ entry-wise. For $d=1$ we construct a polynomial time algorithm for deciding local feasibility. For $d \ge 2$ we show local feasibility is NP-hard. As an application (which was the primary motivation for this work) we give a computer-assisted proof of ergodicity of the following elementary 1D cellular automaton: given the current state $\eta_t \in \{0,1\}^{\mathbb{Z}}$ the next state $\eta_{t+1}(n)$ at each vertex $n\in \mathbb{Z}$ is obtained by $\eta_{t+1}(n)= \text{NAND}\big(\text{BSC}_\delta(\eta_t(n-1)), \text{BSC}_\delta(\eta_t(n))\big)$. Here the binary symmetric channel $\text{BSC}_\delta$ takes a bit as input and flips it with probability $\delta$ (and leaves it unchanged with probability $1-\delta$). We also consider the problem of broadcasting information on the 2D-grid of noisy binary-symmetric channels $\text{BSC}_\delta$, where each node may apply an arbitrary processing function to its input bits. We prove that there exists $\delta_0'>0$ such that for all noise levels $0<\delta<\delta_0'$ it is impossible to broadcast information for any processing function, as conjectured in Makur, Mossel, Polyanskiy (ISIT 2021).
We present a new sublinear time algorithm for approximating the spectral density (eigenvalue distribution) of an $n\times n$ normalized graph adjacency or Laplacian matrix. The algorithm recovers the spectrum up to $\epsilon$ accuracy in the Wasserstein-1 distance in $O(n\cdot \text{poly}(1/\epsilon))$ time given sample access to the graph. This result compliments recent work by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (2018), which obtains a solution with runtime independent of $n$, but exponential in $1/\epsilon$. We conjecture that the trade-off between dimension dependence and accuracy is inherent. Our method is simple and works well experimentally. It is based on a Chebyshev polynomial moment matching method that employees randomized estimators for the matrix trace. We prove that, for any Hermitian $A$, this moment matching method returns an $\epsilon$ approximation to the spectral density using just $O({1}/{\epsilon})$ matrix-vector products with $A$. By leveraging stability properties of the Chebyshev polynomial three-term recurrence, we then prove that the method is amenable to the use of coarse approximate matrix-vector products. Our sublinear time algorithm follows from combining this result with a novel sampling algorithm for approximating matrix-vector products with a normalized graph adjacency matrix. Of independent interest, we show a similar result for the widely used \emph{kernel polynomial method} (KPM), proving that this practical algorithm nearly matches the theoretical guarantees of our moment matching method. Our analysis uses tools from Jackson's seminal work on approximation with positive polynomial kernels.