We propose a rectangular rotational invariant estimator to recover a real matrix from noisy matrix observations corrupted by an arbitrary additive rotational invariant perturbation, in the large dimension limit. Using the Bayes-optimality of this estimator, we derive the asymptotic minimum mean squared error (MMSE). For the particular case of Gaussian noise, we find an explicit expression for the MMSE in terms of the limiting singular value distribution of the observation matrix. Moreover, we prove a formula linking the asymptotic mutual information and the limit of the log-spherical integral of rectangular matrices. We also provide numerical checks of our results, which match our theoretical predictions and known Bayesian inference results.
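As a purely illustrative sketch (not the paper's derivation): a rectangular rotational invariant estimator keeps the singular vectors of the observation $Y$ and replaces each singular value $y_k$ by a scalar $\xi(y_k)$. The fully Gaussian toy below is an assumption chosen so the example is self-checking; there the optimal map happens to be linear shrinkage, whereas the paper's Bayes-optimal map is in general a nonlinear function of the limiting singular value distribution of $Y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 300
S = rng.standard_normal((n, m)) / np.sqrt(m)            # ground-truth signal
Y = S + 0.5 * rng.standard_normal((n, m)) / np.sqrt(m)  # additive Gaussian noise

U, y, Vt = np.linalg.svd(Y, full_matrices=False)

def shrink(y, c=0.8):
    # Placeholder scalar map xi(y) = c*y; for this fully Gaussian toy the
    # posterior mean is linear with c = 1/(1 + 0.25) = 0.8, but in general
    # the Bayes-optimal map is nonlinear.
    return c * y

S_hat = (U * shrink(y)) @ Vt                 # hat(S) = U diag(xi(y)) V^T
print("MSE of RIE-shaped estimator:", np.mean((S_hat - S) ** 2))
print("MSE of raw observation     :", np.mean((Y - S) ** 2))
```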
Saddle points constitute a crucial challenge for first-order gradient descent algorithms. In classical machine learning, they are avoided, for example, by means of stochastic gradient descent methods. In this work, we provide evidence that the saddle point problem can be naturally avoided in variational quantum algorithms by exploiting their inherent stochasticity. We prove convergence guarantees and present practical examples in numerical simulations and on quantum hardware. We argue that the natural stochasticity of variational algorithms can be beneficial for avoiding strict saddle points, i.e., saddle points with at least one negative Hessian eigenvalue. The insight that some level of shot noise can help is expected to add a new perspective to the study of near-term variational quantum algorithms.
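A minimal classical caricature of the mechanism (no quantum circuit involved): with exact gradients, an iterate started on the stable manifold of a strict saddle never leaves it, while adding isotropic noise, standing in for finite-shot gradient estimates, pushes it into the negative-curvature direction.

```python
import numpy as np

# Toy landscape: f(x, y) = x^2 + (y^2 - 1)^2 has a strict saddle at the
# origin (Hessian eigenvalues 2 and -4) and minima at (0, +-1).
grad = lambda p: np.array([2 * p[0], 4 * p[1] * (p[1] ** 2 - 1)])

def descend(noise_scale, steps=300, eta=0.05, seed=1):
    rng = np.random.default_rng(seed)
    p = np.array([1.0, 0.0])          # start exactly on the stable manifold
    for _ in range(steps):
        g = grad(p) + noise_scale * rng.standard_normal(2)  # shot-noise proxy
        p = p - eta * g
    return np.round(p, 3)

print("exact gradients:", descend(0.0))   # converges to the saddle (0, 0)
print("noisy gradients:", descend(0.1))   # escapes towards a minimum (0, +-1)
```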
We derive limiting distributions of symmetrized estimators of scatter, where instead of all $n(n-1)/2$ pairs of the $n$ observations we only consider $nd$ suitably chosen pairs, $1 \le d < \lfloor n/2\rfloor$. It turns out that the resulting estimators are asymptotically equivalent to the original one whenever $d = d(n) \to \infty$ at an arbitrarily slow speed. We also investigate the asymptotic properties for arbitrary fixed $d$. These considerations and numerical examples indicate that for practical purposes, moderate fixed values of $d$ between, say, $10$ and $20$ already yield estimators that are computationally feasible and rather close to the original ones.
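A small sketch of the computational idea, with two illustrative assumptions: the $nd$ pairs are taken as $(i, i+k \bmod n)$ for $k = 1, \dots, d$, and the plain covariance of pairwise differences stands in for the M-estimation step of a genuine symmetrized scatter estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 500, 3, 10
X = rng.standard_normal((n, p)) @ np.diag([1.0, 2.0, 3.0])  # true scatter diag(1,4,9)

# Instead of all n(n-1)/2 pairs, use the n*d pairs (i, i+k mod n), k = 1..d.
diffs = np.concatenate([X - np.roll(X, -k, axis=0) for k in range(1, d + 1)])

# Simplest symmetrized scatter functional: the covariance of pairwise
# differences equals 2*Cov(X) for i.i.d. data, hence the factor 1/2.
scatter = diffs.T @ diffs / (2 * len(diffs))
print(np.round(scatter, 2))
```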
Consider sample covariance matrices of the form $Q:=\Sigma^{1/2} X X^\top \Sigma^{1/2}$, where $X=(x_{ij})$ is an $n\times N$ random matrix whose entries are independent random variables with mean zero and variance $N^{-1}$, and $\Sigma$ is a deterministic positive-definite covariance matrix. We study the limiting behavior of the eigenvectors of $Q$ through the so-called eigenvector empirical spectral distribution $F_{\mathbf v}$, which is a variant of the empirical spectral distribution with weights given by $|\mathbf v^\top \xi_k|^2$, where $\mathbf v$ is a deterministic unit vector and $\xi_k$ are the eigenvectors of $Q$. We prove a functional central limit theorem for the linear spectral statistics of $F_{\mathbf v}$, indexed by functions with H\"older continuous derivatives. We show that the linear spectral statistics converge to some Gaussian processes both on global scales of order 1 and on local scales that are much smaller than 1 but much larger than the typical eigenvalue spacing $N^{-1}$. Moreover, we give explicit expressions for the covariance functions of the Gaussian processes, where the exact dependence on $\Sigma$ and $\mathbf v$ is identified for the first time in the literature.
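A small numerical sketch of the object under study (assuming, for illustration, a diagonal $\Sigma$ and a coordinate vector $\mathbf v$): the linear spectral statistic of $F_{\mathbf v}$ is the weighted sum of $f(\lambda_k)$ with weights $|\mathbf v^\top \xi_k|^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 100, 300
Sigma_half = np.diag(np.linspace(0.5, 1.5, n))   # illustrative Sigma^{1/2}

X = rng.standard_normal((n, N)) / np.sqrt(N)     # entries: mean 0, variance 1/N
Q = Sigma_half @ X @ X.T @ Sigma_half

lam, xi = np.linalg.eigh(Q)          # eigenvalues and eigenvectors of Q
v = np.zeros(n)
v[0] = 1.0                           # deterministic unit vector

weights = (v @ xi) ** 2              # |v^T xi_k|^2, the weights defining F_v
f = np.log                           # test function, smooth on the support
lss = np.sum(f(lam) * weights)       # linear spectral statistic of F_v
print("sum of weights (should be 1):", weights.sum())
print("linear spectral statistic   :", lss)
```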
Estimation of signal-to-noise ratios and residual variances in high-dimensional linear models has various important applications including, e.g., heritability estimation in bioinformatics. One commonly used estimator, usually referred to as REML, is based on the likelihood of the random effects model, in which both the regression coefficients and the noise variables are assumed to be i.i.d. Gaussian random variables. In this paper, we aim to establish the consistency and asymptotic distribution of the REML estimator for the SNR when the actual coefficient vector is fixed and the actual noise is heteroscedastic and correlated, at the cost of assuming that the entries of the design matrix are independent and skew-free. The asymptotic variance can also be consistently estimated when the noise is heteroscedastic but uncorrelated. Extensive numerical simulations illustrate our theoretical findings and also suggest that some assumptions imposed in our theoretical results can likely be relaxed.
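A minimal sketch of the marginal (random-effects) likelihood behind such estimators, under simplifying assumptions: homoscedastic noise, a hand-picked SNR convention, and plain maximum likelihood in place of the REML profiling; the paper's exact normalizations are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 400
X = rng.standard_normal((n, p)) / np.sqrt(p)
beta = np.sqrt(2.0) * rng.standard_normal(p)   # fixed (non-random) coefficients
y = X @ beta + rng.standard_normal(n)          # true SNR is approximately 2

# The random-effects likelihood treats beta as N(0, sb2*I), so that
# y ~ N(0, sb2 * X X^T + se2 * I); diagonalizing X X^T once reduces the
# log-likelihood to a sum over eigenvalues.
lam, U = np.linalg.eigh(X @ X.T)
z2 = (U.T @ y) ** 2

def neg_loglik(theta):
    sb2, se2 = np.exp(theta)                   # optimize on the log scale
    d = sb2 * lam + se2
    return 0.5 * np.sum(np.log(d) + z2 / d)

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
sb2, se2 = np.exp(res.x)
print("estimated SNR:", sb2 * lam.mean() / se2)   # one common SNR convention
```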
We consider a statistical model for matrix factorization in a regime where the rank of the two hidden matrix factors grows linearly with their dimension and their product is corrupted by additive noise. Despite various approaches, statistical and algorithmic limits of such problems have remained elusive. We study a Bayesian setting with the assumptions that (a) one of the matrix factors is symmetric, (b) both factors as well as the additive noise have rotational invariant priors, (c) the priors are known to the statistician. We derive analytical formulas for Rotation Invariant Estimators to reconstruct the two matrix factors, and conjecture that these are optimal in the large-dimension limit, in the sense that they minimize the average mean squared error. We provide numerical checks which confirm the optimality conjecture when compared against Oracle Estimators, which are optimal by definition but involve the ground truth. Our derivation relies on a combination of tools, namely random matrix theory transforms, spherical integral formulas, and the replica method from statistical mechanics.
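The Oracle Estimators mentioned above have a transparent form worth spelling out. A minimal sketch for a single symmetric matrix (a deliberate simplification of the paper's two-factor setting): among all estimators sharing the eigenbasis of the observation, the one with coefficients $\xi_k = v_k^\top S v_k$ minimizes the squared error, but it requires the ground truth $S$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
S = rng.standard_normal((n, n)); S = (S + S.T) / np.sqrt(2 * n)  # symmetric signal
Z = rng.standard_normal((n, n)); Z = (Z + Z.T) / np.sqrt(2 * n)  # symmetric noise
Y = S + Z

w, V = np.linalg.eigh(Y)
# A rotation invariant estimator keeps the eigenbasis of Y:
#   hat(S) = V diag(xi) V^T.
# The oracle (which sees S) uses the MSE-optimal coefficients
#   xi_k = v_k^T S v_k;
# a bona fide estimator can at best match its error.
xi_oracle = np.einsum("ik,ij,jk->k", V, S, V)
S_oracle = (V * xi_oracle) @ V.T
print("oracle MSE:", np.mean((S_oracle - S) ** 2))
print("raw MSE   :", np.mean((Y - S) ** 2))
```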
In the setting of functional data analysis, we derive optimal rates of convergence in the supremum norm for estimating the H\"older-smooth mean function of a stochastic process which is repeatedly and discretely observed at fixed, multivariate, synchronous design points and with additional errors. Similarly to the rates in $L_2$ obtained in Cai and Yuan (2011), for sparse design a discretization term dominates, while in the dense case the $\sqrt n$ rate can be achieved as if the $n$ processes were continuously observed without errors. However, our analysis differs in several respects from Cai and Yuan (2011). First, we do not assume that the paths of the processes are as smooth as the mean, but still obtain the $\sqrt n$ rate of convergence without additional logarithmic factors in the dense setting. Second, we show that in the supremum norm, there is an intermediate regime between the sparse and dense cases dominated by the contribution of the observation errors. Third, and in contrast to the analysis in $L_2$, interpolation estimators turn out to be sub-optimal in $L_\infty$ in the dense setting, which explains their poor empirical performance. We also obtain a central limit theorem in the supremum norm and discuss the selection of the bandwidth. Simulations and real data applications illustrate the results.
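A schematic sketch of the observation scheme (synchronous design points, additional errors) together with a simple kernel smoother; the bandwidth, kernel, and data-generating process below are illustrative placeholders, not the optimal choices analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 25                          # n curves, p synchronous design points
t = np.linspace(0, 1, p)
mu = lambda s: np.sin(2 * np.pi * s)    # smooth mean function

# Random paths around mu, plus additional errors at the design points.
paths = mu(t) + 0.3 * rng.standard_normal((n, 1)) * np.cos(np.pi * t)
Y = paths + 0.2 * rng.standard_normal((n, p))

ybar = Y.mean(axis=0)                   # pointwise average over the n curves

def mean_estimate(s, h=0.05):
    """Gaussian-kernel smoother of the pooled averages; h is hand-picked."""
    w = np.exp(-0.5 * ((s - t) / h) ** 2)
    return np.sum(w * ybar) / np.sum(w)

grid = np.linspace(0.05, 0.95, 19)
print("sup-norm error on grid:",
      max(abs(mean_estimate(s) - mu(s)) for s in grid))
```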
The matrix factor model is drawing growing attention for simultaneous two-way dimension reduction of well-structured matrix-valued observations. This paper focuses on robust statistical inference for the matrix factor model in the ``diverging dimension'' regime. We derive the convergence rates of the robust estimators for loadings, factors and common components under a finite second moment assumption on the idiosyncratic errors. In addition, the asymptotic distributions of the estimators are also derived under mild conditions. We propose a rank minimization and an eigenvalue-ratio method to estimate the pair of factor numbers consistently. Numerical studies confirm that the iterative Huber regression algorithm is a practical and reliable approach for the estimation of the matrix factor model, especially in the presence of heavy-tailed idiosyncratic errors. We illustrate the practical usefulness of the proposed methods on two real datasets, one on financial portfolios and one on the macroeconomic indices of China.
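A stylized sketch of the eigenvalue-ratio idea for selecting the row factor number: the paper's version is built on robust Huber-type estimation, whereas the toy below uses plain second moments, so it should only be read as an illustration of the ratio criterion itself.

```python
import numpy as np

rng = np.random.default_rng(0)
T, p1, p2, k1, k2 = 200, 30, 40, 3, 2
R = rng.standard_normal((p1, k1))                     # row loadings
C = rng.standard_normal((p2, k2))                     # column loadings
X = np.stack([R @ rng.standard_normal((k1, k2)) @ C.T
              + rng.standard_normal((p1, p2)) for _ in range(T)])

# Ratios of consecutive eigenvalues of the averaged row covariance spike at
# the true number of row factors (k1 = 3 here).
M = sum(x @ x.T for x in X) / (T * p2)
lam = np.sort(np.linalg.eigvalsh(M))[::-1]
ratios = lam[:-1] / lam[1:]
print("estimated k1:", int(np.argmax(ratios[: p1 // 2])) + 1)
```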
This article focuses on a class of distributionally robust optimization (DRO) problems where, unlike in the growing body of literature, the objective function is potentially non-linear in the distribution. Existing methods to optimize non-linear functions in probability space use Fr\'echet derivatives, which present both theoretical and computational challenges. Motivated by this, we propose an alternative notion of derivative, and of the corresponding smoothness, based on the G\^ateaux (G)-derivative for generic risk measures. These concepts are explained via three running risk measure examples: variance, entropic risk, and risk on finite support sets. We then propose a G-derivative-based Frank-Wolfe~(FW) algorithm for generic non-linear optimization problems in probability spaces and establish its convergence under the proposed notion of smoothness in a completely norm-independent manner. We use the set-up of the FW algorithm to devise a methodology to compute a saddle point of the non-linear DRO problem. Finally, for the minimum variance portfolio selection problem we analyze the regularity conditions, compute the FW-oracle in various settings, and validate the theoretical results numerically.
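A minimal sketch of an FW iteration in probability space for the running variance example, on a fixed finite support (so the linear oracle reduces to picking a point mass); the support grid and step-size rule are illustrative choices, not the paper's setup.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 11)          # fixed finite support
p = np.full_like(x, 1 / len(x))         # start at the uniform distribution

def g_derivative(p):
    """G-derivative of p -> Var_p(X), evaluated at the support points."""
    m = p @ x
    return x ** 2 - 2 * m * x

# Frank-Wolfe over the probability simplex, maximizing the (concave-in-p)
# variance: the linear maximization oracle over the simplex is a vertex,
# i.e. a point mass at the support point with the largest derivative.
for t in range(200):
    s = np.zeros_like(p)
    s[np.argmax(g_derivative(p))] = 1.0
    gamma = 2.0 / (t + 2.0)              # standard FW step size
    p = (1 - gamma) * p + gamma * s

print("FW solution      :", np.round(p, 3))             # mass splits onto x = -1, 1
print("achieved variance:", p @ x ** 2 - (p @ x) ** 2)  # supremum is 1
```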
In many practical applications including remote sensing, multi-task learning, and multi-spectrum imaging, data are described as a set of matrices sharing a common column space. We consider the joint estimation of such matrices from their noisy linear measurements. We study a convex estimator regularized by a pair of matrix norms. The measurement model corresponds to block-wise sensing and the reconstruction is possible only when the total energy is well distributed over blocks. The first norm, which is the maximum-block-Frobenius norm, favors such a solution. This condition is analogous to the notion of low-spikiness in matrix completion or column-wise sensing. The second norm, which is a tensor norm on a pair of suitable Banach spaces, induces low-rankness in the solution together with the first norm. We demonstrate that the joint estimation provides a significant gain over the individual recovery of each matrix when the number of matrices sharing a column space and the ambient dimension of the shared column space are large relative to the number of columns in each matrix. The convex estimator is cast as a semidefinite program and an efficient ADMM algorithm is derived. The empirical behavior of the convex estimator is illustrated using Monte Carlo simulations, and its recovery performance is compared to that of existing methods in the literature.
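To make the low-spikiness condition concrete, here is a small sketch of the first regularizer (the maximum-block-Frobenius norm) and of a ratio quantifying how evenly the energy spreads over blocks; the SDP formulation and the ADMM solver are beyond a few lines and are not attempted here.

```python
import numpy as np

rng = np.random.default_rng(0)
L, n, m, r = 5, 20, 8, 3                 # L blocks sharing an r-dim column space
U = np.linalg.qr(rng.standard_normal((n, r)))[0]   # shared column space
blocks = [U @ rng.standard_normal((r, m)) for _ in range(L)]

# Maximum-block-Frobenius norm: for a well-spread signal it stays close to
# the "balanced" level total/sqrt(L); a large ratio signals spikiness, under
# which block-wise sensing cannot succeed.
block_norms = np.array([np.linalg.norm(B, "fro") for B in blocks])
total = np.linalg.norm(block_norms)      # Frobenius norm of the stacked matrix
print("max-block-Frobenius norm:", block_norms.max())
print("spikiness ratio         :", block_norms.max() / (total / np.sqrt(L)))
```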
This PhD thesis contains several contributions to the field of statistical causal modeling. Statistical causal models are statistical models embedded with causal assumptions that allow for inference and reasoning about the behavior of stochastic systems affected by external manipulation (interventions). This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust (out-of-distribution generalizing) prediction methods. We present novel and consistent linear and non-linear causal effect estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization. Our proposed estimators show, in certain settings, mean squared error improvements compared to both canonical and state-of-the-art estimators. We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. This connection leads us to prove that general K-class estimators possess distributional robustness properties. Furthermore, we propose a general framework for distributional robustness with respect to intervention-induced distributions. In this framework, we derive sufficient conditions for the identifiability of distributionally robust prediction methods and present impossibility results that show the necessity of several of these conditions. We present a new structure learning method applicable in additive noise models with directed trees as causal graphs. We prove consistency in a vanishing identifiability setup and provide a method for testing substructure hypotheses with asymptotic family-wise error control that remains valid post-selection. Finally, we present heuristic ideas for learning summary graphs of nonlinear time-series models.
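The K-class estimators mentioned above are a classical econometric family. A minimal numerical sketch (the simulation design with one instrument and one hidden confounder is invented for illustration) of how the parameter $\kappa$ interpolates between OLS ($\kappa = 0$) and two-stage least squares ($\kappa = 1$):

```python
import numpy as np

def k_class(y, X, Z, kappa):
    """K-class estimator beta = (X^T (I - kappa*M_Z) X)^{-1} X^T (I - kappa*M_Z) y,
    where M_Z projects off the instrument span; kappa=0 is OLS, kappa=1 is TSLS."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection onto span(Z)
    W = (1 - kappa) * X + kappa * (P @ X)   # (I - kappa*M_Z) X
    return np.linalg.solve(W.T @ X, W.T @ y)

rng = np.random.default_rng(0)
n = 5000
Z = rng.standard_normal((n, 1))             # instrument
h = rng.standard_normal(n)                  # hidden confounder
x = Z[:, 0] + h + 0.5 * rng.standard_normal(n)
y = 2.0 * x + h + 0.5 * rng.standard_normal(n)   # true causal effect is 2
X = x[:, None]

for kappa in (0.0, 0.5, 1.0):
    print(f"kappa={kappa}: beta_hat={k_class(y, X, Z, kappa)[0]:.3f}")
```

With this design, OLS is biased upward by the confounder, TSLS recovers the causal coefficient, and intermediate $\kappa$ trades off between the two, which is the interpolation underlying the distributional robustness property discussed in the thesis.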