The inference of a large symmetric signal matrix $\mathbf{S} \in \mathbb{R}^{N\times N}$ corrupted by additive Gaussian noise is considered in two regimes of growth of the rank $M$ as a function of $N$. For sub-linear ranks $M=\Theta(N^\alpha)$ with $\alpha\in(0,1)$, the mutual information and minimum mean-square error (MMSE) are derived for two classes of signal matrices: (a) $\mathbf{S}=\mathbf{X}\mathbf{X}^\intercal$ with the entries of $\mathbf{X}\in\mathbb{R}^{N\times M}$ independent and identically distributed; (b) $\mathbf{S}$ sampled from a rotationally invariant distribution. Surprisingly, the formulas match the rank-one case. Two efficient algorithms are explored and conjectured to saturate the MMSE when no statistical-to-computational gap is present: (1) Decimation Approximate Message Passing; (2) a spectral algorithm based on a Rotation Invariant Estimator. For linear ranks $M=\Theta(N)$, the mutual information is rigorously derived for signal matrices drawn from a rotationally invariant distribution. Close connections with scalar inference in free probability are uncovered, which allow us to deduce a simple formula for the MMSE as an integral involving only the limiting spectral measure of the data matrix. An interesting issue is whether the known information-theoretic phase transitions for rank-one, and hence also sub-linear-rank, matrices persist in the linear-rank regime. Our analysis suggests that only a smoothed-out trace of these transitions persists. Furthermore, the change of behavior between the low-rank and truly high-rank regimes occurs only at the linear scale $\alpha=1$.
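As a rough illustration of the spirit of the spectral algorithm (2), the following minimal NumPy sketch denoises a low-rank-plus-Wigner matrix with a rotation-invariant estimator: it keeps the eigenvectors of the data matrix and applies a scalar shrinkage to its eigenvalues. The hard-threshold shrinker at the semicircle edge and all problem sizes are placeholder assumptions, not the optimal shrinkage derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 500, 4

# Low-rank signal S = X X^T with i.i.d. Gaussian factors, plus symmetric Gaussian noise.
X = rng.normal(size=(N, M))
S = X @ X.T / np.sqrt(N)
W = rng.normal(size=(N, N))
Y = S + (W + W.T) / np.sqrt(2 * N)          # Wigner noise, bulk spectrum in [-2, 2]

def spectral_denoise(Y, shrink):
    """Rotation-invariant estimator: keep the eigenvectors of the data
    matrix and apply a scalar shrinkage function to its eigenvalues."""
    evals, evecs = np.linalg.eigh(Y)
    return (evecs * shrink(evals)) @ evecs.T

# Placeholder shrinker: zero out eigenvalues inside the noise bulk [-2, 2].
S_hat = spectral_denoise(Y, lambda lam: np.where(np.abs(lam) > 2.0, lam, 0.0))
print("relative error:", np.linalg.norm(S_hat - S) / np.linalg.norm(S))
```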
The big data era of science and technology motivates statistical modeling of matrix-valued data using a low-rank representation that simultaneously summarizes key characteristics of the data and enables dimension reduction for data compression and storage. Low-rank representations such as singular value decomposition factor the original data into the product of orthonormal basis functions and weights, where each basis function represents an independent feature of the data. However, the basis functions in these factorizations are typically computed using algorithmic methods that cannot quantify uncertainty or account for explicit structure beyond what is implicitly specified via data correlation. We propose a flexible prior distribution for orthonormal matrices that can explicitly model structure in the basis functions. The prior is used within a general probabilistic model for singular value decomposition to conduct posterior inference on the basis functions while accounting for measurement error and fixed effects. To contextualize the proposed prior and model, we discuss how the prior specification can be used for various scenarios and relate the model to its deterministic counterpart. We demonstrate favorable model properties through synthetic data examples and apply our method to sea surface temperature data from the northern Pacific, enhancing our understanding of the ocean's internal variability.
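For context, the deterministic counterpart the abstract refers to is the truncated singular value decomposition, which factors the data into orthonormal basis functions and weights; a minimal NumPy sketch on a synthetic data matrix (all sizes are illustrative assumptions) is:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(100, 60))                 # stand-in for matrix-valued data

k = 5                                          # number of retained basis functions
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]    # orthonormal basis functions and weights

Y_hat = (U_k * s_k) @ Vt_k                     # rank-k reconstruction of the data
print("relative reconstruction error:",
      np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y))
```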
In this paper, we study a general low-rank matrix recovery problem with linear measurements corrupted by noise. The objective is to understand under what conditions on the restricted isometry property (RIP) of the problem local search methods can find the ground truth with small error. By analyzing the landscape of the non-convex problem, we first establish a global guarantee on the maximum distance between an arbitrary local minimizer and the ground truth under the assumption that the RIP constant is smaller than $1/2$. We show that this distance shrinks to zero as the intensity of the noise reduces. Our new guarantee is sharp in terms of the RIP constant and is much stronger than the existing results. We then present a local guarantee for problems with an arbitrary RIP constant, which states that any local minimizer is either considerably close to the ground truth or far away from it. Next, we prove the strict saddle property, which guarantees the global convergence of the perturbed gradient descent method in polynomial time. The developed results demonstrate how the noise intensity and the RIP constant affect the landscape of the problem.
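A minimal sketch of the kind of local search analyzed here is gradient descent on the factored (Burer-Monteiro) objective with symmetric Gaussian measurement matrices; the dimensions, step size, and noise level below are placeholder assumptions, not the settings studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 500

# Ground-truth low-rank matrix and noisy Gaussian linear measurements.
X_star = rng.normal(size=(n, r)) / np.sqrt(n)
M_star = X_star @ X_star.T
A = rng.normal(size=(m, n, n)) / np.sqrt(m)
A = (A + A.transpose(0, 2, 1)) / 2                    # symmetric sensing matrices
b = np.einsum('kij,ij->k', A, M_star) + 0.01 * rng.normal(size=m)

# Local search on the factored objective f(X) = sum_k (<A_k, X X^T> - b_k)^2.
X = rng.normal(size=(n, r)) / np.sqrt(n)
step = 0.02
for _ in range(3000):
    resid = np.einsum('kij,ij->k', A, X @ X.T) - b
    grad = 4 * np.einsum('k,kij->ij', resid, A) @ X   # gradient w.r.t. the factor X
    X -= step * grad

print("relative recovery error:",
      np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star))
```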
Value-function (VF) approximation is a central problem in Reinforcement Learning (RL). Classical non-parametric VF estimation suffers from the curse of dimensionality. As a result, parsimonious parametric models have been adopted to approximate VFs in high-dimensional spaces, with most efforts focused on linear and neural-network-based approaches. In contrast, this paper puts forth a \emph{parsimonious non-parametric} approach, in which we use \emph{stochastic low-rank algorithms} to estimate the VF matrix in an online and model-free fashion. Furthermore, as VFs tend to be multi-dimensional, we propose replacing the classical VF matrix representation with a tensor (multi-way array) representation and then using the PARAFAC decomposition to design an online model-free tensor low-rank algorithm. Different versions of the algorithms are proposed, their complexity is analyzed, and their performance is assessed numerically using standardized RL environments.
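As a rough sketch of the matrix (two-way) case, the following NumPy snippet performs a stochastic low-rank temporal-difference update on the factors of $Q \approx \mathbf{L}\mathbf{R}^\intercal$; the sizes, step size, and initialization are illustrative assumptions rather than the algorithms proposed in the paper, whose tensor/PARAFAC variant generalizes this idea.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, rank = 50, 4, 3
alpha, gamma = 0.1, 0.95

# Low-rank parameterization of the value-function matrix: Q ~ L @ R.T
L = 0.1 * rng.normal(size=(n_states, rank))
R = 0.1 * rng.normal(size=(n_actions, rank))

def q(s, a):
    return L[s] @ R[a]

def td_update(s, a, reward, s_next, done):
    """One stochastic low-rank TD step: gradient update of the factor
    rows L[s] and R[a] on the squared temporal-difference error."""
    target = reward if done else reward + gamma * max(q(s_next, b) for b in range(n_actions))
    delta = target - q(s, a)
    g_L = delta * R[a]          # gradients computed before either factor is updated
    g_R = delta * L[s]
    L[s] += alpha * g_L
    R[a] += alpha * g_R

# Usage with transitions (s, a, reward, s_next, done) streamed from any environment:
td_update(3, 1, 0.5, 7, False)
```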
In this paper, we derive results about the limiting distribution of the empirical magnetization vector and the maximum likelihood (ML) estimates of the natural parameters in the tensor Curie-Weiss Potts model. Our results reveal surprising new phase transition phenomena, including the existence of a smooth curve in the interior of the parameter plane on which the magnetization vector and the ML estimates have mixture limiting distributions, the latter comprising both continuous and discrete components, and a surprising superefficiency phenomenon of the ML estimates, which stipulates an $N^{-3/4}$ rate of convergence of the estimates to some non-Gaussian distribution at certain special points of one type and an $N^{-5/6}$ rate of convergence to some other non-Gaussian distribution at another special point of a different type. The last case can arise only for one particular combination of the tensor interaction order and the number of colors. These results are then used to derive asymptotic confidence intervals for the natural parameters at all points where consistent estimation is possible.
A generalization of a recently introduced recursive numerical method for the exact evaluation of integrals of regular solid harmonics and their normal derivatives over simplex elements in $\mathbb{R}^3$ is presented. The original Quadrature to Expansion (Q2X) method achieves optimal per-element asymptotic complexity; however, it considered only constant density functions over the elements. Here, we generalize this method to support polynomial density functions of arbitrary degree, which is achieved in an extended recursive framework while maintaining the optimal complexity. The method is derived for 1- and 2-simplex elements in $\mathbb{R}^3$ and can be used for the boundary element method and for vortex methods coupled with the fast multipole method.
We review Quasi Maximum Likelihood estimation of factor models for high-dimensional panels of time series. We consider two cases: (1) estimation when no dynamic model for the factors is specified (Bai and Li, 2012, 2016); (2) estimation based on the Kalman smoother and the Expectation Maximization algorithm, which allows the factor dynamics to be modeled explicitly (Doz et al., 2012, Barigozzi and Luciani, 2019). Our interest is in approximate factor models, i.e., models in which the idiosyncratic components are allowed to be mildly cross-sectionally, as well as serially, correlated. Although such a setting apparently makes estimation harder, we show that, in fact, factor models do not suffer from the curse of dimensionality but instead enjoy a blessing of dimensionality property. In particular, given an approximate factor structure, if the cross-sectional dimension of the data, $N$, grows to infinity, we show that: (i) identification of the model is still possible; (ii) the mis-specification error due to the use of an exact factor model log-likelihood vanishes. Moreover, if we also let the sample size, $T$, grow to infinity, we can consistently estimate all parameters of the model and make inference. The same is true for estimation of the latent factors, which can be carried out by weighted least squares, linear projection, or Kalman filtering/smoothing. We also compare the approaches presented with Principal Component analysis and the classical, fixed-$N$, exact Maximum Likelihood approach. We conclude with a discussion of the efficiency of the considered estimators.
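For reference, a minimal NumPy sketch of the Principal Component estimator used as a benchmark in this comparison, applied to a simulated exact factor model with illustrative dimensions, is:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, r = 200, 100, 3

# Simulate an exact factor model x_t = Lambda f_t + e_t.
F = rng.normal(size=(T, r))                 # latent factors
Lam = rng.normal(size=(N, r))               # loadings
X = F @ Lam.T + rng.normal(size=(T, N))     # panel with idiosyncratic noise

# Principal-component estimator of the factors and loadings.
X_c = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
F_hat = np.sqrt(T) * U[:, :r]               # estimated factors (normalized)
Lam_hat = X_c.T @ F_hat / T                 # estimated loadings

# Factors are identified only up to rotation, so measure how well the
# estimated factor space spans the true factors.
proj = F_hat @ np.linalg.lstsq(F_hat, F, rcond=None)[0]
print("R^2 of true factors on estimated space:",
      1 - np.linalg.norm(F - proj) ** 2 / np.linalg.norm(F) ** 2)
```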
In this paper, we study the weighted $k$-server problem on the uniform metric in both the offline and online settings. We start with the offline setting. In contrast to the (unweighted) $k$-server problem, which has a polynomial-time solution using min-cost flows, there are strong computational lower bounds for the weighted $k$-server problem, even on the uniform metric. Specifically, we show that, assuming the unique games conjecture, there is no polynomial-time algorithm with a sub-polynomial approximation factor, even if we use $c$-resource augmentation for $c < 2$. Furthermore, if we consider the natural LP relaxation of the problem, then obtaining a bounded integrality gap requires resource augmentation of at least $\ell$, where $\ell$ is the number of distinct server weights. We complement these results by obtaining a constant-approximation algorithm via LP rounding, with a resource augmentation of $(2+\epsilon)\ell$ for any constant $\epsilon > 0$. In the online setting, an $\exp(k)$ lower bound is known for the competitive ratio of any randomized algorithm for the weighted $k$-server problem on the uniform metric. In contrast, we show that $2\ell$-resource augmentation can bring the competitive ratio down by an exponential factor to only $O(\ell^2 \log \ell)$. Our online algorithm uses the two-stage approach of first obtaining a fractional solution using the online primal-dual framework, and then rounding it online.
Selective inference (SI) has been actively studied as a promising framework for statistical hypothesis testing on data-driven hypotheses. The basic idea of SI is to make inference conditional on the event that a hypothesis is selected. In order to perform SI, this event must be characterized in a tractable form. When the selection event is too difficult to characterize, additional conditions are introduced to achieve tractability. These additional conditions often cause a loss of power, an issue referred to as over-conditioning. Parametric programming-based SI (PP-based SI) has been proposed as one way to address the over-conditioning issue. The main problem of PP-based SI is its high computational cost, due to the need to exhaustively explore the data space. In this study, we introduce a procedure that reduces the computational cost while guaranteeing the desired precision, by computing upper and lower bounds on the p-values. We also propose three types of search strategies that efficiently improve these bounds. We demonstrate the effectiveness of the proposed method on hypothesis testing problems for feature selection in linear models and attention region identification in deep neural networks.
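For intuition, the conditional inference underlying SI often reduces to a p-value of a Gaussian statistic truncated to the selection event; a minimal sketch follows (the interval, variance, and observed value are placeholder assumptions, and this is not the proposed bounding procedure itself).

```python
import numpy as np
from scipy.stats import norm

def selective_p_value(z, sigma, interval):
    """One-sided p-value of z ~ N(0, sigma^2) under H0, conditioned on the
    selection event z in [a, b]: P(Z >= z | a <= Z <= b)."""
    a, b = interval
    num = norm.cdf(b / sigma) - norm.cdf(z / sigma)
    den = norm.cdf(b / sigma) - norm.cdf(a / sigma)
    return num / den

# Example: observed statistic 2.1, unit variance, selection interval [1.0, inf).
print(selective_p_value(2.1, 1.0, (1.0, np.inf)))
```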
The ability to envision future states is crucial to informed decision making while interacting with dynamic environments. With cameras providing a prevalent and information-rich sensing modality, the problem of predicting future states from image sequences has garnered a lot of attention. Current state-of-the-art methods typically train large parametric models for their predictions. Though often able to predict accurately, these models rely on the availability of large training datasets to converge to useful solutions. In this paper, we focus on the problem of predicting future images of an image sequence from very little training data. To address this problem, we take a probabilistic, non-parametric approach to image prediction. We generate probability distributions over sequentially predicted images and propagate uncertainty through time to generate a confidence metric for our predictions. Gaussian Processes are used for their data efficiency and their ability to readily incorporate new training data online. We showcase our method by successfully predicting future frames of a smooth fluid simulation environment.
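A minimal sketch of Gaussian Process regression with predictive uncertainty, using a one-dimensional toy signal as a stand-in for a pixel intensity evolving over time (the kernel, noise level, and prediction horizon are illustrative assumptions), is:

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

# Toy stand-in for a pixel intensity evolving smoothly over time.
rng = np.random.default_rng(0)
t_train = np.linspace(0.0, 8.0, 40)
y_train = np.sin(t_train) + 0.05 * rng.normal(size=t_train.size)
t_test = np.linspace(8.0, 10.0, 20)                  # future time steps

noise = 0.05 ** 2
K = rbf(t_train, t_train) + noise * np.eye(t_train.size)
K_s = rbf(t_train, t_test)
K_ss = rbf(t_test, t_test)

# Standard GP regression equations: predictive mean and covariance.
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
mean = K_s.T @ alpha                                 # predicted future values
v = np.linalg.solve(L, K_s)
std = np.sqrt(np.diag(K_ss - v.T @ v))               # per-step uncertainty
print(mean[:3], std[:3])
```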
It has been observed that the performance of many high-dimensional estimation problems is universal with respect to the underlying sensing (or design) matrices. Specifically, matrices with markedly different constructions seem to achieve identical performance if they share the same spectral distribution and have ``generic'' singular vectors. We prove this universality phenomenon for the case of convex regularized least squares (RLS) estimators under a linear regression model with additive Gaussian noise. Our main contributions are two-fold: (1) We introduce a notion of universality classes for sensing matrices, defined through a set of deterministic conditions that fix the spectrum of the sensing matrix and precisely capture the previously heuristic notion of generic singular vectors; (2) We show that for all sensing matrices that lie in the same universality class, the dynamics of the proximal gradient descent algorithm for solving the regression problem, as well as the performance of RLS estimators themselves (under additional strong convexity conditions), are asymptotically identical. In addition to including i.i.d. Gaussian and rotationally invariant matrices as special cases, our universality class also contains highly structured, strongly correlated, or even (nearly) deterministic matrices. Examples of the latter include randomly signed versions of incoherent tight frames and randomly subsampled Hadamard transforms. As a consequence of this universality principle, the asymptotic performance of regularized linear regression on many structured matrices constructed with limited randomness can be characterized by using the rotationally invariant ensemble as an equivalent yet mathematically more tractable surrogate.
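A minimal sketch of the proximal gradient descent algorithm whose dynamics are characterized here, instantiated for the $\ell_1$-regularized least-squares estimator with an i.i.d. Gaussian design (dimensions, sparsity, and regularization level are illustrative assumptions), is:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_gradient_lasso(A, y, lam, step, iters=500):
    """Proximal gradient descent (ISTA) for the regularized least-squares
    estimator min_x 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - step * (A.T @ (A @ x - y)), step * lam)
    return x

rng = np.random.default_rng(0)
n, p, k = 300, 500, 20
x0 = np.zeros(p)
x0[rng.choice(p, k, replace=False)] = rng.normal(size=k)   # sparse ground truth

A = rng.normal(size=(n, p)) / np.sqrt(n)                   # i.i.d. Gaussian design
y = A @ x0 + 0.05 * rng.normal(size=n)
step = 1.0 / np.linalg.norm(A, 2) ** 2                     # 1 / largest squared singular value
x_hat = proximal_gradient_lasso(A, y, lam=0.02, step=step)
print("per-coordinate MSE:", np.mean((x_hat - x0) ** 2))
```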