We exhibit a randomized algorithm which, given a matrix $A\in \mathbb{C}^{n\times n}$ with $\|A\|\le 1$ and $\delta>0$, computes with high probability an invertible $V$ and diagonal $D$ such that $\|A-VDV^{-1}\|\le \delta$ using $O(T_{MM}(n)\log^2(n/\delta))$ arithmetic operations, in finite arithmetic with $O(\log^4(n/\delta)\log n)$ bits of precision. Here $T_{MM}(n)$ is the number of arithmetic operations required to multiply two $n\times n$ complex matrices numerically stably, known to satisfy $T_{MM}(n)=O(n^{\omega+\eta})$ for every $\eta>0$ where $\omega$ is the exponent of matrix multiplication (Demmel et al., Numer. Math., 2007). Our result significantly improves the previously best known provable running times of $O(n^{10}/\delta^2)$ arithmetic operations for diagonalization of general matrices (Armentano et al., J. Eur. Math. Soc., 2018), and (with regard to the dependence on $n$) $O(n^3)$ arithmetic operations for Hermitian matrices (Dekker and Traub, Lin. Alg. Appl., 1971). It is the first algorithm to achieve nearly matrix multiplication time for diagonalization in any model of computation (real arithmetic, rational arithmetic, or finite arithmetic), thereby matching the complexity of other dense linear algebra operations such as inversion and $QR$ factorization up to polylogarithmic factors. The proof rests on two new ingredients. (1) We show that adding a small complex Gaussian perturbation to any matrix splits its pseudospectrum into $n$ small well-separated components. In particular, this implies that the eigenvalues of the perturbed matrix have a large minimum gap, a property of independent interest in random matrix theory. (2) We give a rigorous analysis of Roberts' Newton iteration method (Roberts, Int. J. Control, 1980) for computing the sign function of a matrix in finite arithmetic, itself an open problem in numerical analysis since at least 1986.
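For concreteness, ingredient (2) concerns the classical iteration $S_{k+1} = \tfrac{1}{2}(S_k + S_k^{-1})$, which converges to the matrix sign function when $A$ has no eigenvalues on the imaginary axis. A minimal numpy sketch of the iteration itself is below; it works in ordinary floating point and does not reproduce the paper's finite-arithmetic analysis, which is the actual contribution.

```python
import numpy as np

def matrix_sign(A, tol=1e-12, max_iter=100):
    """Roberts' Newton iteration S_{k+1} = (S_k + S_k^{-1}) / 2.

    Converges (in exact arithmetic) to the matrix sign function of A,
    provided A has no eigenvalues on the imaginary axis.
    """
    S = A.astype(np.complex128)
    for _ in range(max_iter):
        S_next = 0.5 * (S + np.linalg.inv(S))
        # stop once the iterates have stabilized in relative Frobenius norm
        if np.linalg.norm(S_next - S, "fro") <= tol * np.linalg.norm(S, "fro"):
            return S_next
        S = S_next
    return S
```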
A support vector machine (SVM) is an algorithm that finds a hyperplane which optimally separates labeled data points in $\mathbb{R}^n$ into positive and negative classes. The data points on the margin of this separating hyperplane are called support vectors. We connect the possible configurations of support vectors to Radon's theorem, which provides guarantees for when a set of points can be divided into two classes (positive and negative) whose convex hulls intersect. If the convex hulls of the positive and negative support vectors are projected onto a separating hyperplane, then the projections intersect if and only if the hyperplane is optimal. Further, under a particular type of general position assumption, we show that (a) the projected convex hulls of the support vectors intersect in exactly one point, (b) the support vectors are stable under perturbation, (c) there are at most $n+1$ support vectors, and (d) every number of support vectors from 2 up to $n+1$ is possible. Finally, we perform computer simulations studying the expected number of support vectors, and their configurations, for randomly generated data. We observe that as the distance between classes of points increases for this type of randomly generated data, configurations with fewer support vectors become more likely.
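As a rough illustration of the kind of simulation described above (the exact sampling scheme is an assumption, not the paper's), the following sketch draws two Gaussian point clouds whose means are a prescribed distance apart, fits a nearly hard-margin linear SVM, and counts the support vectors; `SVC` with a very large `C` stands in for the exact hard-margin optimizer.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, dim, separation = 50, 3, 2.0

# Two standard Gaussian classes in R^dim, means `separation` apart along e_1.
mu = np.zeros(dim)
mu[0] = separation / 2
X = np.vstack([rng.normal(size=(n, dim)) + mu,
               rng.normal(size=(n, dim)) - mu])
y = np.array([1] * n + [-1] * n)

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("number of support vectors:", clf.n_support_.sum())
```

Repeating this over many draws and a grid of separations gives an empirical distribution of support-vector counts as a function of class distance.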
This paper considers the basic question of how strong a probabilistic guarantee a hash table, storing $n$ $(1 + \Theta(1)) \log n$-bit key/value pairs, can offer. Past work on this question has been bottlenecked by limitations of the known families of hash functions: The only hash tables to achieve failure probabilities less than $1 / 2^{\polylog n}$ require access to fully-random hash functions -- if the same hash tables are implemented using the known explicit families of hash functions, their failure probabilities become $1 / \poly(n)$. To get around these obstacles, we show how to construct a randomized data structure that has the same guarantees as a hash table, but that \emph{avoids the direct use of hash functions}. Building on this, we are able to construct a hash table using $O(n)$ random bits that achieves failure probability $1 / n^{n^{1 - \epsilon}}$ for an arbitrary positive constant $\epsilon$. In fact, we show that this guarantee can even be achieved by a \emph{succinct dictionary}, that is, by a dictionary that uses space within a $1 + o(1)$ factor of the information-theoretic optimum. Finally, we construct a succinct hash table whose probabilistic guarantees fall on a different extreme, offering a failure probability of $1 / \poly(n)$ while using only $\tilde{O}(\log n)$ random bits. This latter result matches (up to low-order terms) a guarantee previously achieved by Dietzfelbinger et al., but with increased space efficiency and with several surprising technical components.
We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization. The model-free setting is considered, with minimal assumption on the rank or singular values of the observed matrix, where the global optima provably overfit. We show that vanilla gradient descent with small random initialization sequentially recovers the principal components of the observed matrix. Consequently, when equipped with proper early stopping, gradient descent produces the best low-rank approximation of the observed matrix without explicit regularization. We provide a sharp characterization of the relationship between the approximation error, iteration complexity, initialization size and stepsize. Our complexity bound is almost dimension-free and depends logarithmically on the approximation error, with significantly more lenient requirements on the stepsize and initialization compared to prior work. Our theoretical results provide accurate predictions for the behavior of gradient descent, showing good agreement with numerical experiments.
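A minimal sketch of the setting, under assumed values for the initialization size $\alpha$, stepsize $\eta$, width $k$, and iteration count: vanilla gradient descent on $\tfrac{1}{2}\|XY^\top - A\|_F^2$ from a small random start, where stopping at intermediate iterations yields low-rank approximations of $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 100, 80, 20                     # factor width k allows overparametrization
A = rng.normal(size=(n, m)) / np.sqrt(m)  # model-free: no planted low-rank structure

alpha, eta, T = 1e-6, 0.1, 2000           # initialization size, stepsize, iterations
X = alpha * rng.normal(size=(n, k))
Y = alpha * rng.normal(size=(m, k))

for t in range(T):
    R = X @ Y.T - A                       # residual of the current factorization
    # simultaneous gradient step on 0.5 * ||X Y^T - A||_F^2
    X, Y = X - eta * R @ Y, Y - eta * R.T @ X

# Compare against the best rank-k approximation given by the truncated SVD.
U, s, Vt = np.linalg.svd(A)
print("fit error:", np.linalg.norm(X @ Y.T - A))
print("best rank-k error:", np.linalg.norm(A - (U[:, :k] * s[:k]) @ Vt[:k]))
```

Tracking the fit error along the trajectory (rather than only at the end) exhibits the sequential recovery of principal components that the paper analyzes.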
We introduce a two-person non-zero-sum random-turn game that is a variant of the stake-governed games introduced recently in [HP2022]. We call the new game the Trail of Lost Pennies. At time zero, a counter is placed at a given integer location: $X_0 = k \in \mathbb{Z}$, say. At the $i$-th turn (for $i \in \mathbb{N}_+$), Maxine and Mina place non-negative stakes, $a_i$ and $b_i$, for which each pays from her own savings. Maxine is declared to be the turn victor with probability $\tfrac{a_i}{a_i+b_i}$; otherwise, Mina is. If Maxine wins the turn, she will move the counter one place to the right, so that $X_i = X_{i-1} +1$; if Mina does so, the counter will move one place to the left, so that $X_i = X_{i-1} -1$. If $\liminf X_i = \infty$, then Maxine wins the game; if $\limsup X_i = -\infty$, then Mina does. (A special rule is needed to treat the remaining, indeterminate, case.) When Maxine wins, she receives a terminal payment of $m_\infty$, while Mina receives $n_\infty$. If Mina wins, these respective receipts are $m_{-\infty}$ and $n_{-\infty}$. The four terminal payment values are supposed to be real numbers that satisfy $m_\infty > m_{-\infty}$ and $n_\infty < n_{-\infty}$, where these bounds accord with the notion that Maxine wins when the counter ends far to the right, and that Mina does so when it reaches far to the left. Each player is motivated to offer stakes at each turn of the game, in order to secure the higher terminal payment that will arise from her victory; but since these stake amounts accumulate to act as a cost depleting the profit arising from victory, each player must also seek to control these expenses. In this article, we study the Trail of Lost Pennies, formulating strategies for the two players and defining and analysing Nash equilibria in the game.
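The rules above translate directly into a toy simulation. The sketch below plays the game with constant stakes, a strategy chosen purely for illustration (the article's analysis concerns Nash equilibria, not this strategy), and uses a finite horizon and an absorbing boundary to stand in for $X_i \to \pm\infty$.

```python
import random

def play_trail(k, a, b, m_inf, m_minus, n_inf, n_minus,
               horizon=10_000, absorb=50):
    """One play of the Trail of Lost Pennies with constant stakes a (Maxine)
    and b (Mina). Reaching +-absorb stands in for X_i -> +-infinity.
    Returns (Maxine's net payoff, Mina's net payoff): terminal payment
    minus total stakes spent. Returns None in the indeterminate case."""
    x = k
    spent_maxine = spent_mina = 0.0
    for _ in range(horizon):
        spent_maxine += a
        spent_mina += b
        # Maxine wins the turn with probability a / (a + b)
        x += 1 if random.random() < a / (a + b) else -1
        if x >= absorb:
            return m_inf - spent_maxine, n_inf - spent_mina
        if x <= -absorb:
            return m_minus - spent_maxine, n_minus - spent_mina
    return None  # indeterminate within the horizon

# Example: Maxine slightly out-stakes Mina from a start at the origin,
# with payments satisfying m_inf > m_minus and n_inf < n_minus.
print(play_trail(k=0, a=0.0011, b=0.0010,
                 m_inf=1.0, m_minus=0.0, n_inf=0.0, n_minus=1.0))
```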
The precision matrix that encodes conditional linear dependency relations among a set of variables forms an important object of interest in multivariate analysis. Sparse estimation procedures for precision matrices such as the graphical lasso (Glasso) gained popularity as they facilitate interpretability by separating pairs of variables that are conditionally dependent from those that are conditionally independent (given all other variables). Glasso lacks, however, robustness to outliers. To overcome this problem, one typically applies a robust plug-in procedure where the Glasso is computed from a robust covariance estimate instead of the sample covariance, thereby providing protection against outliers. In this paper, we study such estimators theoretically, by deriving and comparing their influence functions, sensitivity curves and asymptotic variances.
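A minimal sketch of the robust plug-in procedure using scikit-learn, with the minimum covariance determinant (MCD) as an example robust covariance estimator; the regularization level `alpha=0.1` and the contamination scheme are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.covariance import MinCovDet, empirical_covariance, graphical_lasso

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=np.zeros(5), cov=np.eye(5), size=200)
X[:10] += 10.0  # contaminate a few rows with gross outliers

# Classical Glasso: plug in the sample covariance.
_, prec_classical = graphical_lasso(empirical_covariance(X), alpha=0.1)

# Robust plug-in: replace the sample covariance by a robust estimate (MCD).
robust_cov = MinCovDet(random_state=0).fit(X).covariance_
_, prec_robust = graphical_lasso(robust_cov, alpha=0.1)
```

Comparing `prec_classical` and `prec_robust` on contaminated data illustrates the protection against outliers that the plug-in step provides.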
This paper focuses on efficient landmark management in radar-based simultaneous localization and mapping (SLAM). Landmark management is necessary in order to maintain a consistent map of the estimated landmarks relative to the estimate of the platform's pose. This task is particularly important when faced with multiple detections from the same landmark and/or dynamic environments where the location of a landmark can change. A further challenge with radar data is the presence of false detections. Accordingly, we propose a simple yet efficient rule-based solution for radar SLAM landmark management. Assuming a low-dynamic environment, there are several steps in our solution: new landmarks need to be detected and included, false landmarks need to be identified and removed, and the consistency of the landmarks registered in the map needs to be maintained. To illustrate our solution, we run an extended Kalman filter SLAM algorithm in an environment containing both stationary and temporarily stationary landmarks. Our simulation results demonstrate that the proposed solution is capable of reliably managing landmarks even when faced with false detections and multiple detections from the same landmark.
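The abstract does not spell out the individual rules, so the following is a hypothetical sketch of a rule-based manager of the general shape described: gated association handles multiple detections of one landmark, repeated hits confirm a tentative landmark, and repeated misses remove false or departed ones. All thresholds and the 2D geometry are assumed values, not the paper's.

```python
import math
from dataclasses import dataclass

@dataclass
class Landmark:
    x: float
    y: float
    hits: int = 1        # scans with at least one matching detection
    misses: int = 0      # consecutive scans without one
    confirmed: bool = False

def manage_landmarks(landmarks, detections,
                     gate=1.0, confirm_after=3, drop_after=5):
    """One management step (hypothetical rules, for illustration only)."""
    matched = set()
    for lm in landmarks:
        # all detections within the gate count as one hit; this also
        # absorbs multiple detections originating from the same landmark
        near = [i for i, (dx, dy) in enumerate(detections)
                if math.hypot(dx - lm.x, dy - lm.y) < gate]
        if near:
            matched.update(near)
            lm.hits += 1
            lm.misses = 0
            lm.confirmed = lm.confirmed or lm.hits >= confirm_after
        else:
            lm.misses += 1
    # rule: drop landmarks unseen for too long (false or departed landmarks)
    landmarks = [lm for lm in landmarks if lm.misses < drop_after]
    # rule: unmatched detections spawn new tentative landmarks
    landmarks += [Landmark(x, y) for i, (x, y) in enumerate(detections)
                  if i not in matched]
    return landmarks
```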
Spectral embedding finds vector representations of the nodes of a network, based on the eigenvectors of its adjacency or Laplacian matrix, and has found applications throughout the sciences. Many such networks are multipartite, meaning their nodes can be divided into groups and nodes of the same group are never connected. When the network is multipartite, this paper demonstrates that the node representations obtained via spectral embedding live near group-specific low-dimensional subspaces of a higher-dimensional ambient space. For this reason we propose a follow-on step after spectral embedding, to recover node representations in their intrinsic rather than ambient dimension, proving uniform consistency under a low-rank, inhomogeneous random graph model. Our method naturally generalizes bipartite spectral embedding, in which node representations are obtained by singular value decomposition of the biadjacency or bi-Laplacian matrix.
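The bipartite special case that the method generalizes can be stated in a few lines: embed both node sets via a truncated SVD of the biadjacency matrix. A numpy sketch is below; the paper's multipartite method adds a follow-on step to recover the intrinsic rather than ambient dimension, which is not shown here.

```python
import numpy as np

def bipartite_spectral_embedding(B, d):
    """Rank-d spectral embedding of a bipartite graph from its biadjacency
    matrix B. Rows of X_left embed the left nodes, rows of X_right the
    right nodes."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    scale = np.sqrt(s[:d])          # split the singular values across both sides
    return U[:, :d] * scale, Vt[:d, :].T * scale
```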
Sparse matrices and linear algebra are at the heart of scientific simulations. More than 70 sparse matrix storage formats have been developed over the years, targeting a wide range of hardware architectures and matrix types. Each format is developed to exploit the particular strengths of an architecture, or the specific sparsity patterns of matrices, and the choice of the right format can be crucial in order to achieve optimal performance. The adoption of dynamic sparse matrices that can change the underlying data structure to match the computation at runtime without introducing prohibitive overheads has the potential to optimize performance through dynamic format selection. In this paper, we introduce Morpheus, a library that provides an efficient abstraction for dynamic sparse matrices. The adoption of dynamic matrices aims to improve the productivity of developers and end-users who do not need to know and understand the implementation specifics of the different formats available, but still want to take advantage of the optimization opportunity to improve the performance of their applications. We demonstrate that by porting HPCG to use Morpheus, and without further code changes, 1) HPCG can now target heterogeneous environments and 2) the performance of the SpMV kernel is improved by up to 2.5x and 7x on CPUs and GPUs respectively, through runtime selection of the best format on each MPI process.
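To convey the idea of runtime format selection in miniature (a conceptual sketch only; this is not the Morpheus API), the snippet below times the SpMV kernel in several scipy.sparse formats and keeps the fastest for subsequent use.

```python
import time
import numpy as np
import scipy.sparse as sp

def pick_best_format(A_coo, x, formats=("csr", "csc", "coo"), reps=10):
    """Toy dynamic format selection: benchmark SpMV in each candidate
    storage format and return the matrix converted to the fastest one."""
    best, best_t = None, float("inf")
    for fmt in formats:
        A = A_coo.asformat(fmt)
        t0 = time.perf_counter()
        for _ in range(reps):
            A @ x                      # the SpMV kernel being optimized
        t = time.perf_counter() - t0
        if t < best_t:
            best, best_t = A, t
    return best

A = sp.random(20000, 20000, density=1e-4, format="coo", random_state=0)
x = np.ones(20000)
print("selected format:", pick_best_format(A, x).format)
```

A dynamic-matrix abstraction performs this kind of switch behind a single matrix type, so user code (like HPCG's solver loop) never mentions the concrete format.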
Let $L$ be an $n\times n$ array whose top left $r\times r$ subarray is filled with $k$ different symbols, each occurring at most once in each row and at most once in each column. We establish necessary and sufficient conditions that ensure the remaining cells of $L$ can be filled such that each symbol occurs at most once in each row and at most once in each column, $L$ is symmetric with respect to the main diagonal, and each symbol occurs a prescribed number of times in $L$. The case where the prescribed number of times each symbol occurs is $n$ was solved by Cruse (J. Combin. Theory Ser. A 16 (1974), 18-22), and the case where the top left subarray is $r\times n$ and the symmetry is not required, was settled by Goldwasser et al. (J. Combin. Theory Ser. A 130 (2015), 26-41). Our result allows the entries of the main diagonal to be specified as well, which leads to an extension of the Andersen-Hoffman Theorem (Annals of Disc. Math. 15 (1982) 9-26, European J. Combin. 4 (1983) 33-35).
Modern neural network training relies heavily on data augmentation for improved generalization. After the initial success of label-preserving augmentations, there has been a recent surge of interest in label-perturbing approaches, which combine features and labels across training samples to smooth the learned decision surface. In this paper, we propose a new augmentation method that leverages the first and second moments extracted and re-injected by feature normalization. We replace the moments of the learned features of one training image by those of another, and also interpolate the target labels. As our approach is fast, operates entirely in feature space, and mixes different signals than prior methods, one can effectively combine it with existing augmentation methods. We demonstrate its efficacy across benchmark data sets in computer vision, speech, and natural language processing, where it consistently improves the generalization performance of highly competitive baseline networks.
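A hedged PyTorch sketch of the moment-swapping step at a single layer: per-channel means and standard deviations of one sample's features are replaced by those of a randomly paired sample, and the labels are interpolated. The interpolation weight `lam` and the layer placement are illustrative assumptions, not the paper's tuned settings.

```python
import torch

def moment_exchange(features, labels, lam=0.9, eps=1e-5):
    """Swap per-channel feature moments between randomly paired samples
    and interpolate the targets accordingly.

    features: (B, C, H, W) activations at some layer
    labels:   (B, num_classes) one-hot or soft targets
    """
    perm = torch.randperm(features.size(0))
    mean = features.mean(dim=(2, 3), keepdim=True)
    std = features.std(dim=(2, 3), keepdim=True) + eps
    normalized = (features - mean) / std
    # re-inject the first and second moments of the paired partner samples
    mixed = normalized * std[perm] + mean[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed, mixed_labels
```

Because the operation acts entirely in feature space and mixes only moments, it composes naturally with pixel-space augmentations applied earlier in the pipeline.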