The distance matrix of a dataset $X$ of $n$ points with respect to a distance function $f$ represents all pairwise distances between points in $X$ induced by $f$. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices, with the goal of designing fast algorithms for fundamental linear algebraic primitives that are specifically tailored to distance matrices. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the $\ell_1$ metric, for which we obtain a linear runtime, as well as an $\Omega(n^2)$ lower bound for any algorithm that computes a matrix-vector product in the $\ell_{\infty}$ case, showing a separation between the $\ell_1$ and $\ell_{\infty}$ metrics. Our upper bound results, in conjunction with recent works on the matrix-vector query model, have many further downstream applications, including the fastest algorithm for computing a relative error low-rank approximation for the distance matrix induced by the $\ell_1$ and $\ell_2^2$ functions and the fastest algorithm for computing an additive error low-rank approximation for the $\ell_2$ metric, as well as applications to fast matrix multiplication, among others. We also give algorithms for constructing distance matrices and show that one can construct an approximate $\ell_2$ distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma.
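To illustrate why $\ell_1$ distance matrices admit fast matrix-vector products, consider the classical one-dimensional sort-and-prefix-sum decomposition: since the $\ell_1$ distance splits across coordinates, $Dw$ can be assembled coordinate by coordinate in near-linear time. The sketch below demonstrates this standard trick under that decomposability assumption; it is not a reproduction of the algorithm in the abstract above, and the function name and sanity check are ours.

\begin{verbatim}
import numpy as np

def l1_distmat_vec(X, w):
    """Compute D @ w where D[i, j] = ||X[i] - X[j]||_1 in O(n d log n) time.

    Illustrative sketch: per-coordinate sorting plus prefix sums,
    not necessarily the algorithm of the paper."""
    n, d = X.shape
    out = np.zeros(n)
    for k in range(d):                 # l1 distance splits across coordinates
        x = X[:, k]
        order = np.argsort(x)
        xs, ws = x[order], w[order]
        cw = np.cumsum(ws)             # prefix sums of weights
        cxw = np.cumsum(xs * ws)       # prefix sums of x * weight
        W, XW = cw[-1], cxw[-1]
        # sum_{j<=i} (x_i - x_j) w_j + sum_{j>i} (x_j - x_i) w_j, in sorted order
        contrib = xs * cw - cxw + (XW - cxw) - xs * (W - cw)
        out[order] += contrib
    return out

# sanity check against the naive O(n^2 d) computation
rng = np.random.default_rng(0)
X, w = rng.normal(size=(200, 3)), rng.normal(size=200)
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
assert np.allclose(D @ w, l1_distmat_vec(X, w))
\end{verbatim}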
Recombination is a fundamental evolutionary force, but it is difficult to quantify because the effect of a recombination event on patterns of variation in a sample of genetic data can be hard to discern. Estimators for the recombination rate, which are usually based on the idea of integrating over the unobserved possible evolutionary histories of a sample, can therefore be noisy. Here we consider a related question: how would an estimator behave if the evolutionary history actually was observed? This would offer an upper bound on the performance of estimators used in practice. In this paper we derive an expression for the maximum likelihood estimator for the recombination rate based on a continuously observed, multi-locus, Wright--Fisher diffusion of haplotype frequencies, complementing existing work on an estimator of selection. We show that, in contrast to the case of selection, the estimator has unusual properties because the observed information matrix can explode in finite time, whereupon the recombination parameter is learned without error. We also show that the recombination estimator is robust to the presence of selection in the sense that incorporating selection into the model leaves the estimator unchanged. We study the properties of the estimator by simulation and show that its distribution can be quite sensitive to the underlying mutation rates.
Iterative solutions of sparse linear systems and sparse eigenvalue problems have a fundamental role in vital fields of scientific research and engineering. The crucial computing kernel for such iterative solutions is the multiplication of a sparse matrix by a dense vector. Efficient implementations of sparse matrix-vector multiplication (SpMV) and linear solvers are therefore essential and have been the subject of extensive research across a variety of computing architectures and accelerators, such as central processing units (CPUs), graphics processing units (GPUs), many integrated cores (MICs), and field programmable gate arrays (FPGAs). Unleashing the full potential of an architecture or accelerator requires determining the factors that affect an efficient implementation of SpMV. This article presents the first in-depth survey of its kind, covering over two hundred state-of-the-art optimization schemes for solving sparse iterative linear systems, with a focus on computing SpMV. A new taxonomy for iterative solutions and SpMV techniques common to all architectures is proposed. This article reviews SpMV techniques for all architectures under a single consolidated taxonomy to encourage cross-architectural and heterogeneous-architecture developments; however, the primary focus is on GPUs. The major contributions as well as the primary, secondary, and tertiary contributions of the SpMV techniques are first highlighted using the taxonomy and then qualitatively compared. A summary of the current state of the research for each architecture is discussed separately. Finally, several open problems and key challenges for future research directions are outlined.
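As a point of reference for the kernel that this survey centers on, the following is a minimal SpMV in the common CSR (compressed sparse row) layout; the array names indptr, indices, and data follow the usual CSR convention, and the kernel is a plain reference implementation rather than any of the optimized schemes reviewed in the survey.

\begin{verbatim}
import numpy as np
from scipy.sparse import random as sprandom

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for a sparse matrix A stored in CSR format.

    indptr[i]:indptr[i+1] delimits the nonzeros of row i, indices holds
    their column positions, and data holds their values. Real libraries
    block, vectorize, and reorder this loop heavily."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y

# sanity check against scipy's CSR implementation
A = sprandom(50, 50, density=0.1, format="csr", random_state=0)
x = np.arange(50, dtype=float)
assert np.allclose(spmv_csr(A.indptr, A.indices, A.data, x), A @ x)
\end{verbatim}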
Tensor decomposition serves as a powerful primitive in statistics and machine learning. In this paper, we focus on using power iteration to decompose an overcomplete random tensor. Past work studying the properties of tensor power iteration either requires a non-trivial data-independent initialization, or is restricted to the undercomplete regime. Moreover, several papers implicitly suggest that logarithmically many iterations (in terms of the input dimension) are sufficient for the power method to recover one of the tensor components. In this paper, we analyze the dynamics of tensor power iteration from random initialization in the overcomplete regime. Surprisingly, we show that polynomially many steps are necessary for convergence of tensor power iteration to any of the true components, which refutes the previous conjecture. On the other hand, our numerical experiments suggest that tensor power iteration successfully recovers tensor components for a broad range of parameters, despite the fact that it takes at least polynomially many steps to converge. To further complement our empirical evidence, we prove that a popular objective function for tensor decomposition is strictly increasing along the power iteration path. Our proof is based on the Gaussian conditioning technique, which has been applied to analyze the approximate message passing (AMP) algorithm. The major ingredient of our argument is a conditioning lemma that allows us to extend AMP-type analysis to the non-proportional limit and to polynomially many iterations of the power method.
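For concreteness, one step of power iteration for a symmetric third-order tensor $T$ maps $u$ to $T(I,u,u)/\|T(I,u,u)\|$. The sketch below runs this map from a random unit vector on a small orthogonally decomposable example; it is illustrative only and does not reproduce the overcomplete random model or the iteration counts analyzed in the abstract above.

\begin{verbatim}
import numpy as np

def tensor_power_iteration(T, num_iters=100, seed=0):
    """Power iteration u <- T(I, u, u) / ||T(I, u, u)|| on a symmetric
    3-tensor T of shape (d, d, d), starting from a random unit vector."""
    d = T.shape[0]
    rng = np.random.default_rng(seed)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    for _ in range(num_iters):
        v = np.einsum('ijk,j,k->i', T, u, u)   # the contraction T(I, u, u)
        u = v / np.linalg.norm(v)
    return u

# example: symmetric tensor with two planted orthonormal components
d = 8
rng = np.random.default_rng(1)
A = np.linalg.qr(rng.normal(size=(d, 2)))[0]   # orthonormal columns a_1, a_2
T = np.einsum('ir,jr,kr->ijk', A, A, A)        # sum_r a_r (x) a_r (x) a_r
u = tensor_power_iteration(T)
print(np.max(np.abs(A.T @ u)))                 # approaches 1 when a component is recovered
\end{verbatim}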
Given an $n\times n$ matrix with integer entries in the range $[-h,h]$, how close can two of its distinct eigenvalues be? The best previously known examples have a minimum gap of $h^{-O(n)}$. Here we give an explicit construction of matrices with entries in $[0,h]$ with two eigenvalues separated by at most $h^{-n^2/16+o(n^2)}$. Up to a constant in the exponent, this agrees with the known lower bound of $\Omega((2\sqrt{n})^{-n^2}h^{-n^2})$ \cite{mahler1964inequality}. Bounds on the minimum gap are relevant to the worst case analysis of algorithms for diagonalization and computing canonical forms of integer matrices (e.g. \cite{dey2021bit}). In addition to our explicit construction, we show there are many matrices with a slightly larger gap of roughly $h^{-n^2/32}$. We also construct 0-1 matrices which have two eigenvalues separated by at most $2^{-n^2/64+o(n^2)}$.
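As a quick numerical illustration of the quantity being studied, the minimum gap between eigenvalues of an integer matrix can be computed by brute force; the snippet below does this for a random matrix with entries in $[-h,h]$ and is unrelated to the explicit constructions described above.

\begin{verbatim}
import numpy as np

def min_eigenvalue_gap(M):
    """Smallest pairwise distance between eigenvalues of M."""
    ev = np.linalg.eigvals(M)
    return min(abs(ev[i] - ev[j])
               for i in range(len(ev)) for j in range(i + 1, len(ev)))

rng = np.random.default_rng(0)
h, n = 10, 6
M = rng.integers(-h, h + 1, size=(n, n))
# for random integer matrices the gap is typically far larger than the worst case
print(min_eigenvalue_gap(M))
\end{verbatim}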
In this manuscript, we consider a finite multivariate nonparametric mixture model where the dependence between the marginal densities is modeled using the copula device. Pseudo EM stochastic algorithms were recently proposed to estimate all of the components of this model under a location-scale constraint on the marginals. Here, we introduce a deterministic algorithm that seeks to maximize a smoothed semiparametric likelihood. No location-scale assumption is made about the marginals. The algorithm is monotonic in one special case, and, in another, leads to ``approximate monotonicity'' -- whereby the difference between successive values of the objective function becomes non-negative up to an additive term that becomes negligible after a sufficiently large number of iterations. The behavior of this algorithm is illustrated on several simulated datasets. The results suggest that, under suitable conditions, the proposed algorithm may indeed be monotonic in general. A discussion of the results and some possible future research directions round out our presentation.
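Read formally, the ``approximate monotonicity'' described above can be paraphrased as follows, writing $\ell_t$ for the value of the smoothed semiparametric log-likelihood objective at iteration $t$ (our notation, not the paper's):
\[
\ell_{t+1} - \ell_t \;\ge\; -\varepsilon_t, \qquad \varepsilon_t \ge 0, \qquad \varepsilon_t \to 0 \text{ as } t \to \infty,
\]
so that successive differences of the objective are non-negative up to an additive term that becomes negligible after sufficiently many iterations.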
To explore the limits of a stochastic gradient method, it may be useful to consider an example consisting of an infinite number of quadratic functions. In this context, it is appropriate to determine the expected value and the covariance matrix of the stochastic noise, i.e., the difference between the true gradient and the approximate gradient generated from a finite sample. When specifying the covariance matrix, the expected value of a quadratic form $QBQ$ is needed, where $Q$ is a Wishart-distributed random matrix and $B$ is an arbitrary fixed symmetric matrix. After deriving an expression for $\mathrm{E}(QBQ)$ and considering some special cases, a numerical example is used to show how these results can support the comparison of two stochastic methods.
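For context, if $Q \sim \mathcal{W}_p(n,\Sigma)$ and $B$ is symmetric, a classical second-moment identity for the Wishart distribution states that $\mathrm{E}(QBQ) = n(n+1)\,\Sigma B \Sigma + n\,\mathrm{tr}(B\Sigma)\,\Sigma$. The Monte Carlo sketch below checks this known identity numerically; it is not a reproduction of the derivation or the special cases considered in the abstract above, and the choices of $\Sigma$ and $B$ are arbitrary.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 3, 7, 100_000

# arbitrary covariance Sigma and fixed symmetric B (illustrative choices)
A = rng.normal(size=(p, p)); Sigma = A @ A.T + p * np.eye(p)
C = rng.normal(size=(p, p)); B = (C + C.T) / 2

L = np.linalg.cholesky(Sigma)
acc = np.zeros((p, p))
for _ in range(trials):
    X = L @ rng.normal(size=(p, n))   # columns are i.i.d. N(0, Sigma)
    Q = X @ X.T                       # Q ~ Wishart_p(n, Sigma)
    acc += Q @ B @ Q
mc = acc / trials

closed_form = n * (n + 1) * Sigma @ B @ Sigma + n * np.trace(B @ Sigma) * Sigma
# relative error of the Monte Carlo estimate against the classical identity
print(np.max(np.abs(mc - closed_form)) / np.max(np.abs(closed_form)))
\end{verbatim}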
The weighted Euclidean distance between two vectors is a Euclidean distance where the contribution of each dimension is scaled by a given non-negative weight. The Johnson-Lindenstrauss (JL) lemma can be easily adapted to the weighted Euclidean distance if weights are known at construction time. Given a set of $n$ vectors with dimension $d$, it suffices to scale each dimension of the input vectors according to the weights, and then apply any standard JL reduction: the weighted Euclidean distance between pairs of vectors is preserved within a multiplicative factor $\epsilon$ with high probability. However, this is not the case when weights are provided after the dimensionality reduction. In this paper, we show that by applying a linear map from real vectors to a complex vector space, it is possible to update the compressed vectors so that the weighted Euclidean distances between pairs of points can be computed within a multiplicative factor $\epsilon$, even when weights are provided after the dimensionality reduction. Finally, we consider the estimation of weighted Euclidean norms in streaming settings: we show how to estimate the weighted norm when the weights are provided either after or concurrently with the input vector.
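For the easy case mentioned above, where the weights are known before the reduction, the adaptation really is just a per-coordinate rescaling by $\sqrt{w_i}$ followed by any standard JL map. The sketch below uses a Gaussian projection for illustration; the complex-valued construction for weights supplied after the reduction is not shown, and all dimensions and weight choices are arbitrary.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 500, 200                   # k ~ O(eps^-2 log n) in the JL lemma
X = rng.normal(size=(n, d))
w = rng.uniform(0.0, 2.0, size=d)         # non-negative weights, known up front

S = rng.normal(size=(d, k)) / np.sqrt(k)  # standard Gaussian JL map
Y = (X * np.sqrt(w)) @ S                  # scale each coordinate by sqrt(w), then project

# weighted distance between the first two points vs. its estimate after reduction
true = np.sqrt(np.sum(w * (X[0] - X[1]) ** 2))
est = np.linalg.norm(Y[0] - Y[1])
print(true, est)                          # equal up to a (1 +/- eps) factor w.h.p.
\end{verbatim}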
Matrix factorization exploits the idea that, in complex high-dimensional data, the actual signal typically lies in lower-dimensional structures. These lower-dimensional objects provide useful insight, with interpretability favored by sparse structures. Sparsity, in addition, is beneficial in terms of regularization and, thus, helps to avoid over-fitting. By exploiting Bayesian shrinkage priors, we devise a computationally convenient approach for high-dimensional matrix factorization. The dependence between row and column entities is modeled by inducing flexible sparse patterns within factors. The availability of external information is accounted for in such a way that structures are allowed but not imposed. Inspired by boosting algorithms, we pair the proposed approach with a numerical strategy relying on the sequential inclusion and estimation of low-rank contributions, with a data-driven stopping rule. Practical advantages of the proposed approach are demonstrated by means of a simulation study and the analysis of soccer heatmaps obtained from new-generation tracking data.
In this paper, we revisit the class of iterative shrinkage-thresholding algorithms (ISTA) for solving the linear inverse problem with sparse representation, which arises in signal and image processing. A numerical experiment on image deblurring shows that the convergence behavior on a logarithmic-scale ordinate tends to be linear rather than logarithmic, which would approach a flat curve. On closer observation, we find that the previous assumption that the smooth part is merely convex weakens the least-squares model. Specifically, assuming the smooth part to be strongly convex is more reasonable for the least-squares model, even though the image matrix is likely ill-conditioned. Furthermore, we tighten the pivotal inequality for composite optimization when the smooth part is strongly convex rather than generally convex; this inequality was first found in [Li et al., 2022]. Based on this pivotal inequality, we generalize the linear convergence to composite optimization, in both the objective value and the squared proximal subgradient norm. Meanwhile, instead of the original blur matrix, we use a simple ill-conditioned matrix whose singular values are easy to compute. The new numerical experiment shows that the proximal generalization of Nesterov's accelerated gradient descent (NAG) for strongly convex functions has a faster linear convergence rate than ISTA. Based on the tighter pivotal inequality, we also generalize the faster linear convergence rate to composite optimization, in both the objective value and the squared proximal subgradient norm, by taking advantage of a well-constructed Lyapunov function with a slight modification and the phase-space representation based on the high-resolution differential equation framework from the implicit-velocity scheme.
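For reference, the basic ISTA update for the model $\min_x \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1$ is a gradient step on the smooth least-squares part followed by soft thresholding, the proximal map of the $\ell_1$ term. The sketch below is a minimal reference implementation on a random sparse-recovery instance; it does not reproduce the deblurring experiment, the specific ill-conditioned test matrix, or the proximal NAG variant discussed above.

\begin{verbatim}
import numpy as np

def ista(A, b, lam, num_iters=500):
    """ISTA for min 0.5*||Ax - b||^2 + lam*||x||_1.

    Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the
    gradient of the smooth part; soft thresholding is the proximal map
    of the l1 term."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        g = A.T @ (A @ x - b)                                    # gradient of the smooth part
        z = x - g / L                                            # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)    # soft threshold
    return x

# small sparse-recovery example
rng = np.random.default_rng(0)
A = rng.normal(size=(60, 100))
x_true = np.zeros(100); x_true[:5] = rng.normal(size=5)
b = A @ x_true
x_hat = ista(A, b, lam=0.1)
print(np.linalg.norm(x_hat - x_true))
\end{verbatim}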
We show that the max entropy algorithm can be derandomized (with respect to a particular objective function) to give a deterministic $3/2-\epsilon$ approximation algorithm for metric TSP for some $\epsilon > 10^{-36}$. To obtain our result, we apply the method of conditional expectation to an objective function constructed in prior work which was used to certify that the expected cost of the algorithm is at most $3/2-\epsilon$ times the cost of an optimal solution to the subtour elimination LP. The proof in this work involves showing that the expected value of this objective function can be computed in polynomial time (at all stages of the algorithm's execution).