We consider sparse matrix estimation where the goal is to estimate an $n\times n$ matrix from noisy observations of a small subset of its entries. We analyze the estimation error of the widely used collaborative filtering algorithm in the sparse regime. Specifically, we propose a novel iterative variant of the algorithm, adapted to handle the setting of sparse observations. We establish that as long as the fraction of entries observed at random scales as $\frac{\log^{1+\kappa}(n)}{n}$ for any fixed $\kappa > 0$, the estimation error with respect to the $\max$-norm decays to $0$ as $n\to\infty$, assuming the underlying matrix of interest has constant rank $r$. Our result is robust to model mis-specification: if the underlying matrix is only approximately rank $r$, then the estimation error decays to the approximation error with respect to the $\max$-norm. In the process, we establish the algorithm's ability to handle arbitrary bounded noise in the observations.
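A minimal sketch of a neighborhood-style collaborative filtering estimate for a partially observed matrix, to make the setting concrete. The iterative variant analyzed in the abstract above is more involved; this only illustrates the basic "find similar rows, then average their observed entries" step, with the similarity score and neighborhood size chosen here as assumptions.

```python
import numpy as np

def cf_estimate(M, mask, n_neighbors=10):
    """Estimate entries of M (observed where mask is True) by averaging
    the observed entries of the most similar rows (basic user-user CF)."""
    n, m = M.shape
    est = np.zeros((n, m))
    for i in range(n):
        # similarity of row i to every other row, on commonly observed columns
        sims = np.full(n, -np.inf)
        for j in range(n):
            if j == i:
                continue
            common = mask[i] & mask[j]
            if common.sum() >= 2:
                # negative mean squared difference as a similarity score
                sims[j] = -np.mean((M[i, common] - M[j, common]) ** 2)
        nbrs = np.argsort(sims)[-n_neighbors:]
        row_mean = M[i, mask[i]].mean() if mask[i].any() else 0.0
        for k in range(m):
            vals = [M[j, k] for j in nbrs if mask[j, k] and np.isfinite(sims[j])]
            est[i, k] = np.mean(vals) if vals else row_mean
    return est
```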
We investigate whether the Wigner semi-circle and Marcenko-Pastur distributions, often used in theoretical analyses of deep neural networks, match empirically observed spectral densities. We find that even allowing for outliers, the observed spectral shapes strongly deviate from such theoretical predictions. This raises major questions about the usefulness of these models in deep learning. We further show that theoretical results, such as the layered nature of critical points, depend strongly on the exact form of these limiting spectral densities. We consider two new classes of matrix ensembles, random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match observed spectra. They also give large discrete spectral peaks at the origin, providing a theoretical explanation for the observation that various optima can be connected by one-dimensional paths of low loss values. We further show that, in the case of a random matrix product, the weight of the discrete spectral component at $0$ depends on the ratio of the dimensions of the weight matrices.
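A small numerical illustration of the last claim, under the assumption that the product ensemble is modeled by two independent Gaussian factors: the rank of the product is bounded by the inner dimension, so the fraction of (numerically) zero eigenvalues of the product's Gram matrix tracks the ratio of the dimensions.

```python
import numpy as np

# Empirical spectrum of a product of two Gaussian ("Wishart-like") factors.
# The discrete peak at zero grows as the inner dimension k shrinks relative
# to the outer dimension n, since rank(AB) <= k.
n, k = 1000, 400                      # illustrative dimensions
A = np.random.randn(n, k) / np.sqrt(k)
B = np.random.randn(k, n) / np.sqrt(n)
W = A @ B                              # n x n product, rank at most k
eigs = np.linalg.eigvalsh(W @ W.T)     # squared singular values of W
zero_frac = np.mean(eigs < 1e-10)
print(f"fraction of zero eigenvalues: {zero_frac:.2f}")  # ~ 1 - k/n = 0.60
```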
We propose an approach to the estimation of infinite sets of random vectors. The problem addressed is as follows: given two infinite sets of random vectors, find a single estimator that estimates vectors of one set from vectors of the other set, with a controlled associated error. A new theory for the existence and implementation of such an estimator is developed. In particular, we show that the proposed estimator is asymptotically optimal. Moreover, the estimator is expressed in terms of pseudo-inverse matrices and, therefore, always exists.
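For intuition, here is a generic sample-based linear estimator written with a pseudo-inverse, so that it is defined even when the covariance of the observed vector is singular. This is not the paper's construction, only a sketch of why pseudo-inverses guarantee existence; the function name and synthetic data are assumptions.

```python
import numpy as np

def linear_estimator(X, Y):
    """X: (n_samples, dx) targets, Y: (n_samples, dy) observations (zero-mean).
    Returns F such that x_hat = F @ y, using a pseudo-inverse of cov(y)."""
    Cxy = X.T @ Y / len(X)           # cross-covariance estimate
    Cyy = Y.T @ Y / len(Y)           # covariance estimate of y
    return Cxy @ np.linalg.pinv(Cyy) # well-defined even if Cyy is singular

rng = np.random.default_rng(0)
F_true = rng.standard_normal((3, 5))
Y = rng.standard_normal((2000, 5))
X = Y @ F_true.T + 0.1 * rng.standard_normal((2000, 3))
print(np.linalg.norm(linear_estimator(X, Y) - F_true))  # small estimation error
```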
Tensor-based modulation (TBM) has been proposed in the context of unsourced random access for massive uplink communication. In this modulation, transmitters encode data as rank-1 tensors, with factors drawn from a discrete vector constellation. This construction makes it possible to split the multi-user receiver into a user-separation step based on a low-rank tensor decomposition, followed by independent single-user demappers. In this paper, we analyze the limits of the tensor decomposition using Cram\'er-Rao bounds, providing bounds on the accuracy of the estimated factors. These bounds are shown by simulation to be tight at high SNR. We introduce an approximate perturbation model for the output of the tensor decomposition, which facilitates the computation of the log-likelihood ratios (LLR) of the transmitted bits and provides an approximate achievable bound on the finite-length error probability. Combining TBM with classical forward error correction coding schemes such as polar codes, we use the approximate LLR to derive a soft-decision decoder that shows a gain over hard-decision decoders at low SNR.
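A minimal sketch of the rank-1 (CP) tensor decomposition step via alternating least squares, for a real-valued 3-way tensor. The actual TBM receiver separates multiple users and works with constellation-constrained, typically complex-valued factors; the dimensions and noise level below are illustrative assumptions.

```python
import numpy as np

def rank1_als(T, n_iter=50):
    """Rank-1 CP decomposition of a 3-way tensor T ~ outer(a, b, c) by
    alternating least squares; returns the three factor vectors."""
    I, J, K = T.shape
    rng = np.random.default_rng(0)
    b, c = rng.standard_normal(J), rng.standard_normal(K)
    for _ in range(n_iter):
        a = np.einsum('ijk,j,k->i', T, b, c) / (b @ b * (c @ c))
        b = np.einsum('ijk,i,k->j', T, a, c) / (a @ a * (c @ c))
        c = np.einsum('ijk,i,j->k', T, a, b) / (a @ a * (b @ b))
    return a, b, c

# noisy rank-1 tensor, loosely mimicking a single-user signal model
rng = np.random.default_rng(1)
a0, b0, c0 = rng.standard_normal(8), rng.standard_normal(6), rng.standard_normal(4)
T = np.einsum('i,j,k->ijk', a0, b0, c0) + 0.01 * rng.standard_normal((8, 6, 4))
a, b, c = rank1_als(T)
```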
We derive and analyze a generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters. We further investigate statistical properties of this generic clustering algorithm when it receives level set estimates from a kernel density estimator. In particular, we derive finite-sample guarantees, consistency, rates of convergence, and an adaptive data-driven strategy for choosing the kernel bandwidth. For these results we do not need continuity assumptions on the density such as H\"{o}lder continuity; we only require intuitive geometric assumptions of a non-parametric nature.
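To illustrate the kind of input the generic algorithm consumes, here is a 1-D sketch: a Gaussian kernel density estimate on a grid, followed by the connected components of a density level set. Sweeping the level reveals where clusters split; the grid-based component extraction is an assumption for illustration, not the paper's recursive procedure.

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian kernel density estimate evaluated on a 1-D grid."""
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def level_set_components(grid, dens, lam):
    """Connected intervals of the grid where the estimated density exceeds lam."""
    above = dens >= lam
    comps, start = [], None
    for i, a in enumerate(above):
        if a and start is None:
            start = i
        if not a and start is not None:
            comps.append((grid[start], grid[i - 1])); start = None
    if start is not None:
        comps.append((grid[start], grid[-1]))
    return comps

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])
grid = np.linspace(-5, 5, 500)
dens = kde(grid, data, h=0.3)
print(level_set_components(grid, dens, lam=0.05))  # two intervals -> one split
```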
The problem of finding a unique low-dimensional decomposition of a given matrix is fundamental and recurrent in many areas. In this paper, we study the problem of seeking a unique decomposition of a low-rank matrix $Y\in \mathbb{R}^{p\times n}$ that admits a sparse representation. Specifically, we consider $Y = A X\in \mathbb{R}^{p\times n}$, where the matrix $A\in \mathbb{R}^{p\times r}$ has full column rank with $r < \min\{n,p\}$, and the matrix $X\in \mathbb{R}^{r\times n}$ is element-wise sparse. We prove that this sparse decomposition of $Y$ can be identified uniquely, up to an intrinsic signed permutation. Our approach relies on solving a nonconvex optimization problem constrained over the unit sphere. Our geometric analysis of the nonconvex optimization landscape shows that any {\em strict} local solution is close to the ground truth solution and can be recovered by a simple data-driven initialization followed by any second-order descent algorithm. Finally, we corroborate these theoretical results with numerical experiments.
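A heuristic sketch of a sphere-constrained first-order method of the kind this analysis concerns. The specific objective below, maximizing the sparsity-promoting surrogate $\|Y^\top q\|_4^4$ over the unit sphere, is a common choice in related sparse recovery work and is an assumption here; it need not be the paper's exact formulation or initialization.

```python
import numpy as np

def sphere_l4_ascent(Y, n_iter=200, seed=0):
    """Fixed-point ascent for max_{||q||=1} ||Y^T q||_4^4; a spiky (sparse)
    q^T Y suggests q^T A is close to a signed standard basis vector."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(Y.shape[0]); q /= np.linalg.norm(q)
    for _ in range(n_iter):
        q = Y @ (Y.T @ q) ** 3        # gradient direction of the quartic objective
        q /= np.linalg.norm(q)        # retract back to the unit sphere
    return q

# synthetic Y = A X with full-column-rank A and element-wise sparse X
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
X = rng.standard_normal((5, 400)) * (rng.random((5, 400)) < 0.1)
q = sphere_l4_ascent(A @ X)
print(np.round(A.T @ q, 2))           # ideally one dominant coordinate (up to sign)
```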
This paper considers the problem of estimating high-dimensional Laplacian-constrained precision matrices by minimizing Stein's loss. We obtain a necessary and sufficient condition for the existence of this estimator, which boils down to checking whether a certain data-dependent graph is connected. We also prove consistency in the high-dimensional setting under the symmetrized Stein loss. We show that the error rate does not depend on graph sparsity or any other type of structure, and that the Laplacian constraints alone are sufficient for high-dimensional consistency. Our proofs exploit properties of graph Laplacians and a characterization of the proposed estimator based on effective graph resistances. We validate our theoretical claims with numerical experiments.
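Effective graph resistances, which appear in the characterization mentioned above, can be computed directly from the Laplacian pseudo-inverse via $R_{ij} = (e_i - e_j)^\top L^{+} (e_i - e_j)$. A short sketch (the example graph is an assumption for illustration):

```python
import numpy as np

def effective_resistances(L):
    """All-pairs effective resistances from a graph Laplacian L:
    R[i, j] = (e_i - e_j)^T pinv(L) (e_i - e_j)."""
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

# path graph on 4 nodes with unit edge weights: endpoint resistance is 3
A = np.diag([1., 1., 1.], 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A
print(effective_resistances(L)[0, 3])   # -> 3.0
```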
We consider the problem of estimating a $d$-dimensional discrete distribution from its samples observed under a $b$-bit communication constraint. In contrast to most previous results that largely focus on the global minimax error, we study the local behavior of the estimation error and provide \emph{pointwise} bounds that depend on the target distribution $p$. In particular, we show that the $\ell_2$ error decays as $O\left(\frac{\lVert p\rVert_{1/2}}{n2^b}\vee \frac{1}{n}\right)$ (in this paper, we use $a\vee b$ and $a \wedge b$ to denote $\max(a, b)$ and $\min(a,b)$ respectively) when $n$ is sufficiently large; hence it is governed by the \emph{half-norm} of $p$ instead of the ambient dimension $d$. For the achievability result, we propose a two-round sequentially interactive estimation scheme that achieves this error rate uniformly over all $p$. Our scheme is based on a novel local refinement idea, where we first use a standard global minimax scheme to localize $p$ and then use the remaining samples to locally refine our estimate. We also develop a new local minimax lower bound with (almost) matching $\ell_2$ error, showing that any interactive scheme must admit an $\Omega\left( \frac{\lVert p \rVert_{{(1+\delta)}/{2}}}{n2^b}\right)$ $\ell_2$ error for any $\delta > 0$. The lower bound is derived by first finding the best parametric sub-model containing $p$, and then upper bounding the quantized Fisher information under this model. Our upper and lower bounds together indicate that $\mathcal{H}_{1/2}(p) = \log(\lVert p \rVert_{{1}/{2}})$ bits of communication are both sufficient and necessary to achieve the optimal (centralized) performance, where $\mathcal{H}_{{1}/{2}}(p)$ is the R\'enyi entropy of order ${1}/{2}$. Therefore, under the $\ell_2$ loss, the correct measure of the local communication complexity at $p$ is its R\'enyi entropy.
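A quick numerical illustration of the pointwise bound: the half-norm $\lVert p\rVert_{1/2} = \left(\sum_i \sqrt{p_i}\right)^2$ equals $d$ for the uniform distribution but stays bounded for a rapidly decaying one, so the local bound can be far below the global $d/(n2^b)$ rate. The sample sizes and budget below are arbitrary assumptions.

```python
import numpy as np

def half_norm(p):
    """||p||_{1/2} = (sum_i sqrt(p_i))^2, which governs the local l2 error."""
    return np.sum(np.sqrt(p)) ** 2

d, n, b = 10_000, 10**6, 3
uniform = np.full(d, 1 / d)
geometric = 0.5 ** np.arange(1, d + 1); geometric /= geometric.sum()

for name, p in [("uniform", uniform), ("geometric", geometric)]:
    bound = max(half_norm(p) / (n * 2 ** b), 1 / n)
    print(f"{name:9s}  ||p||_1/2 = {half_norm(p):9.1f}   l2 bound ~ {bound:.2e}")
```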
Many representative graph neural networks, e.g., GPR-GNN and ChebyNet, approximate graph convolutions with graph spectral filters. However, existing work either applies predefined filter weights or learns them without necessary constraints, which may lead to oversimplified or ill-posed filters. To overcome these issues, we propose $\textit{BernNet}$, a novel graph neural network with theoretical support that provides a simple but effective scheme for designing and learning arbitrary graph spectral filters. In particular, for any filter over the normalized Laplacian spectrum of a graph, our BernNet approximates it with an order-$K$ Bernstein polynomial and designs its spectral property by setting the coefficients of the Bernstein basis. Moreover, we can learn the coefficients (and the corresponding filter weights) based on observed graphs and their associated signals and thus obtain a BernNet specialized for the data. Our experiments demonstrate that BernNet can learn arbitrary spectral filters, including complicated band-rejection and comb filters, and it achieves superior performance in real-world graph modeling tasks.
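A dense-matrix sketch of the Bernstein-basis filter described above: for a normalized Laplacian $L$ with spectrum in $[0,2]$, the filter $h(\lambda)=\sum_{k=0}^{K}\theta_k\binom{K}{k}(1-\lambda/2)^{K-k}(\lambda/2)^k$ is applied as a matrix polynomial in $L$. Here the coefficients are fixed by sampling a target response (a low-pass $e^{-\lambda}$), whereas BernNet learns them; the tiny path graph is an assumption for illustration.

```python
import numpy as np
from scipy.special import comb

def bernstein_filter(L, theta, x):
    """Apply h(lam) = sum_k theta_k C(K,k) (1-lam/2)^(K-k) (lam/2)^k to a
    graph signal x, where L is the normalized Laplacian (spectrum in [0, 2])."""
    K, n = len(theta) - 1, L.shape[0]
    out = np.zeros_like(x, dtype=float)
    for k in range(K + 1):
        Mk = np.linalg.matrix_power(2 * np.eye(n) - L, K - k) @ np.linalg.matrix_power(L, k)
        out += theta[k] * comb(K, k) / 2 ** K * (Mk @ x)
    return out

K = 10
theta = np.exp(-2 * np.arange(K + 1) / K)   # theta_k = h(2k/K) for h(lam) = exp(-lam)

A = np.diag([1., 1., 1.], 1); A = A + A.T   # path graph on 4 nodes
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt
print(bernstein_filter(L, theta, np.array([1., 0., 0., 0.])))
```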
We show that for the problem of testing whether a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of its entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works over any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query-complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, in which query access comes in the form of $\langle X_i, A\rangle := \mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded-entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
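To make the testing problem concrete, here is the naive submatrix baseline that the result above improves upon: read a random $k\times k$ submatrix with $k$ on the order of $d/\epsilon$ and check its rank. The constant in $k$ is an arbitrary assumption; the paper's tester deliberately avoids reading a full submatrix and achieves a better query count.

```python
import numpy as np

def naive_rank_test(A, d, eps, rng=np.random.default_rng(0)):
    """Naive baseline tester: sample a random k x k submatrix (k ~ d/eps) and
    report 'rank <= d' iff the submatrix has rank at most d."""
    n = A.shape[0]
    k = min(n, int(np.ceil(2 * d / eps)))
    rows = rng.choice(n, size=k, replace=False)
    cols = rng.choice(n, size=k, replace=False)
    return np.linalg.matrix_rank(A[np.ix_(rows, cols)]) <= d
```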
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
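For orientation, a minimal sketch of the classical randomized sketch-then-project (QB) approximation in the Halko-Martinsson-Tropp style: sketch the range of $A$ with a Gaussian test matrix, orthonormalize, project, and truncate. The suite above goes further, e.g. with additional sketches and structure-preserving (PSD) variants; the oversampling value here is an assumption.

```python
import numpy as np

def sketch_low_rank(A, r, oversample=10, rng=np.random.default_rng(0)):
    """Randomized rank-r approximation of A via a Gaussian range sketch."""
    n = A.shape[1]
    Omega = rng.standard_normal((n, r + oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sketch
    B = Q.T @ A                               # small projected matrix
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U[:, :r]) * s[:r] @ Vt[:r]    # rank-r approximation of A

A = np.random.default_rng(1).standard_normal((500, 300))
A_r = sketch_low_rank(A, r=20)
```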