Fitting regression models with many multivariate responses and covariates can be challenging, but such responses and covariates sometimes have tensor-variate structure. We extend the classical multivariate regression model to exploit such structure in two ways: first, we impose four types of low-rank tensor formats on the regression coefficients; second, we model the errors using the tensor-variate normal distribution, which imposes a Kronecker separable format on the covariance matrix. We obtain maximum likelihood estimators via block-relaxation algorithms and derive their computational complexity and asymptotic distributions. Our regression framework enables us to formulate tensor-variate analysis of variance (TANOVA) methodology. Applied in a one-way TANOVA layout, this methodology identifies cerebral regions significantly associated with the interaction of suicide attempters or non-attempter ideators and positive-, negative- or death-connoting words in a functional Magnetic Resonance Imaging study. Another application uses three-way TANOVA on the Labeled Faces in the Wild image dataset to distinguish facial characteristics related to ethnic origin, age group and gender.
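As a minimal illustration of the Kronecker separable error model (not the paper's estimator), the sketch below checks numerically that a matrix-variate normal draw $X = L_1 Z L_2^\top$ has $\mathrm{vec}(X) \sim N(0, \Sigma_2 \otimes \Sigma_1)$; the sizes and covariance factors are made up.

```python
# Hedged sketch: Kronecker-separable covariance of a tensor-variate (here
# 2-way, i.e. matrix-variate) normal. Sigma1/Sigma2 and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 100_000, 4, 3
A1 = rng.standard_normal((p, p)); Sigma1 = A1 @ A1.T + p * np.eye(p)  # row cov
A2 = rng.standard_normal((q, q)); Sigma2 = A2 @ A2.T + q * np.eye(q)  # col cov

# X = L1 Z L2^T implies vec(X) ~ N(0, Sigma2 (x) Sigma1) for column-major vec.
L1, L2 = np.linalg.cholesky(Sigma1), np.linalg.cholesky(Sigma2)
Z = rng.standard_normal((N, p, q))
draws = np.einsum('ij,njk,lk->nil', L1, Z, L2)

# Column-major vec of each draw, then compare empirical covariance to kron.
vecs = draws.transpose(0, 2, 1).reshape(N, p * q)
emp = np.cov(vecs, rowvar=False)
K = np.kron(Sigma2, Sigma1)
print(np.max(np.abs(emp - K)) / np.max(K))  # small -> separability holds
```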
Gaussian mixture models (GMM) are fundamental tools in statistical and data sciences. We study the moments of multivariate Gaussians and GMMs. The $d$-th moment of an $n$-dimensional random variable is a symmetric $d$-way tensor of size $n^d$, so working with moments naively is prohibitively expensive for $d>2$ and larger values of $n$. In this work, we develop theory and numerical methods for implicit computations with moment tensors of GMMs, reducing the computational and storage costs to $\mathcal{O}(n^2)$ and $\mathcal{O}(n^3)$, respectively, for general covariance matrices, and to $\mathcal{O}(n)$ and $\mathcal{O}(n)$, respectively, for diagonal ones. We derive concise analytic expressions for the moments in terms of symmetrized tensor products, relying on the correspondence between symmetric tensors and homogeneous polynomials, and combinatorial identities involving Bell polynomials. The primary application of this theory is to estimating GMM parameters from a set of observations, when formulated as a moment-matching optimization problem. When the covariance matrix is known and common to all components, we also show it is possible to debias the observations, in which case estimating the unknown means reduces to a symmetric CP tensor decomposition. Numerical results validate the approach and illustrate its efficiency. This work potentially opens the door to making the method of moments competitive with expectation-maximization methods for parameter estimation of GMMs.
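The core "implicit computation" idea can be illustrated without reproducing the paper's formulas: the contraction of the $d$-th moment tensor with $v^{\otimes d}$ equals $\mathbb{E}[(v^\top x)^d]$, a one-dimensional Gaussian moment computable via Stein's recursion after two cheap projections. The sketch below (function names are illustrative, not the paper's API) verifies this against Monte Carlo.

```python
# Hedged sketch: contract the d-th GMM moment tensor with v^{(x)d} in O(K n^2)
# time, never forming the n^d tensor explicitly.
import numpy as np

def gaussian_1d_moments(mu, var, d):
    """E[X^k] for X ~ N(mu, var), k = 0..d, via Stein's recursion."""
    m = np.zeros(d + 1)
    m[0] = 1.0
    if d >= 1:
        m[1] = mu
    for k in range(2, d + 1):
        m[k] = mu * m[k - 1] + (k - 1) * var * m[k - 2]
    return m

def implicit_moment_contraction(v, weights, mus, Sigmas, d):
    """<M_d, v^{(x)d}> = E[(v^T x)^d] for a GMM."""
    total = 0.0
    for w, mu, Sigma in zip(weights, mus, Sigmas):
        proj_mu = v @ mu           # O(n)
        proj_var = v @ Sigma @ v   # O(n^2)
        total += w * gaussian_1d_moments(proj_mu, proj_var, d)[d]
    return total

# Sanity check against a Monte Carlo estimate for a single Gaussian.
rng = np.random.default_rng(1)
n, d = 5, 4
mu = rng.standard_normal(n)
A = rng.standard_normal((n, n)); Sigma = A @ A.T / n + np.eye(n)
v = rng.standard_normal(n)
x = rng.multivariate_normal(mu, Sigma, size=500_000)
print(implicit_moment_contraction(v, [1.0], [mu], [Sigma], d),
      np.mean((x @ v) ** d))
```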
In regression discontinuity designs, manipulation is a threat to identification that previously had no formal characterization. This study provides the first formalization of which manipulations harm identification and which are detectable by the density test. Two channels characterize harmful manipulation: precise control of the manipulated assignment status, and the precise decision to manipulate given the assignment status. The latter, a novel channel, generalizes the rationale for justifying point identification, diagnostic tests, and worst-case bounds beyond their previous forms. In our replication of the Romanian high-school admission study, the precise decision appears as selective attrition by admission results, and our replication demonstrates that this channel is critical for the robustness of the study's original conclusion.
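For readers unfamiliar with the density test mentioned above, here is a deliberately simplified, hedged sketch of the idea: under no manipulation the running variable's density is continuous at the cutoff, so counts just below and just above it should be roughly balanced. This is a stand-in for the local polynomial density tests used in practice, not the paper's method; the bandwidth and data are made up.

```python
# Hedged toy density test at an RD cutoff via an exact binomial test on
# counts within a bandwidth h on each side of the cutoff.
import numpy as np
from scipy.stats import binomtest

def simple_density_test(running, cutoff, h):
    below = np.sum((running >= cutoff - h) & (running < cutoff))
    above = np.sum((running >= cutoff) & (running < cutoff + h))
    # Under a continuous density, each nearby point falls on either side
    # with probability ~1/2.
    return binomtest(below, below + above, 0.5).pvalue

rng = np.random.default_rng(2)
clean = rng.uniform(-1, 1, 5_000)
bunched = np.concatenate([clean, rng.uniform(0, 0.02, 300)])  # heaping above 0
print(simple_density_test(clean, 0.0, 0.1))    # large p-value
print(simple_density_test(bunched, 0.0, 0.1))  # small p-value
```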
We study the algorithmic complexity of computing persistent homology of a randomly chosen filtration. Specifically, we prove upper bounds for the average fill-up (number of non-zero entries) of the boundary matrix after matrix reduction on Erd\H{o}s-R\'enyi and Vietoris-Rips filtrations. Our bounds show that, in both cases, the reduced matrix is expected to be significantly sparser than the general worst case predicts. Our method builds on previous results on the expected first Betti numbers of the corresponding complexes, which we link to the fill-up of the boundary matrix. Our bound for Vietoris-Rips complexes is asymptotically tight up to logarithmic factors. We also provide an Erd\H{o}s-R\'enyi filtration realising the worst case.
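The fill-up quantity being bounded is produced by the standard persistence reduction algorithm, sketched below on a filtered triangle: columns are added left to right over $\mathbb{Z}/2$ until every lowest nonzero row index is unique, and the fill-up is the nonzero count of the result. This is the textbook algorithm, not code from the paper.

```python
# Hedged sketch of the standard boundary-matrix reduction over Z/2.
# Columns are sets of row indices; fill-up = total nonzeros afterwards.
def reduce_boundary(columns):
    low_to_col = {}
    for j, col in enumerate(columns):
        while col:
            low = max(col)                       # lowest nonzero row index
            if low not in low_to_col:
                low_to_col[low] = j
                break
            col ^= columns[low_to_col[low]]      # symmetric diff = mod-2 add
        # a column emptied here represents a cycle (a birth)
    return columns

# Boundary matrix of a filtered triangle: vertices 0,1,2, then edges, then face.
cols = [set(), set(), set(),       # vertices 0, 1, 2
        {0, 1}, {1, 2}, {0, 2},    # edges 01, 12, 02 (rows = vertex indices)
        {3, 4, 5}]                 # triangle (rows = edge indices)
reduced = reduce_boundary([set(c) for c in cols])
print(sum(len(c) for c in reduced))  # fill-up of the reduced matrix
```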
The use of Cauchy Markov random field priors in statistical inverse problems can potentially lead to posterior distributions which are non-Gaussian, high-dimensional, multimodal and heavy-tailed. In order to use such priors successfully, sophisticated optimization and Markov chain Monte Carlo (MCMC) methods are usually required. In this paper, our focus is largely on reviewing recently developed Cauchy difference priors, while also introducing new variants and providing a comparison. We first propose a one-dimensional second-order Cauchy difference prior, and construct new first- and second-order two-dimensional isotropic Cauchy difference priors. Another new Cauchy prior is based on the stochastic partial differential equation approach, derived from a Mat\'{e}rn-type Gaussian representation. The comparison also includes Cauchy sheets. Our numerical computations are based on both maximum a posteriori and conditional mean estimation. We exploit state-of-the-art MCMC methodologies such as Metropolis-within-Gibbs, Repelling-Attracting Metropolis, and the No-U-Turn variant of Hamiltonian Monte Carlo. We demonstrate the models and methods on one-dimensional and two-dimensional deconvolution problems. Thorough MCMC statistics are provided for all test cases, including potential scale reduction factors.
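A first-order Cauchy difference prior places a Cauchy density on each increment, $p(x) \propto \prod_i \gamma / (\gamma^2 + (x_{i+1}-x_i)^2)$, which favours piecewise-constant reconstructions. The sketch below plugs such a prior into a toy 1-D deconvolution posterior and samples it with plain random-walk Metropolis; the samplers in the paper are far more sophisticated, and $\gamma$, the noise level, and the blur kernel here are made up.

```python
# Hedged sketch: first-order Cauchy difference prior + Gaussian likelihood,
# conditional mean estimate via basic random-walk Metropolis.
import numpy as np

rng = np.random.default_rng(3)
n, gamma, sigma = 50, 0.1, 0.05
A = np.array([[np.exp(-0.5 * ((i - j) / 2.0) ** 2) for j in range(n)]
              for i in range(n)])
A /= A.sum(axis=1, keepdims=True)                # toy blur operator
x_true = (np.arange(n) > n // 2).astype(float)   # step signal
y = A @ x_true + sigma * rng.standard_normal(n)  # blurred, noisy data

def log_post(x):
    diffs = np.diff(x)
    log_prior = np.sum(np.log(gamma) - np.log(gamma**2 + diffs**2))
    log_lik = -0.5 * np.sum((y - A @ x) ** 2) / sigma**2
    return log_prior + log_lik

x, samples = np.zeros(n), []
lp = log_post(x)
for it in range(20_000):
    prop = x + 0.02 * rng.standard_normal(n)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:     # Metropolis accept/reject
        x, lp = prop, lp_prop
    if it > 10_000:                              # crude burn-in
        samples.append(x.copy())
print(np.abs(np.mean(samples, axis=0) - x_true).mean())  # CM estimate error
```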
Benign overfitting demonstrates that overparameterized models can perform well on test data while fitting noisy training data. However, existing analyses consider only the final min-norm solution in linear regression, ignoring the algorithm and the corresponding training procedure. In this paper, we generalize the idea of benign overfitting from the min-norm solution to the whole training trajectory and derive a time-variant bound based on trajectory analysis. Starting from this bound, we further derive a time interval that suffices to guarantee a consistent generalization error for a given feature covariance. Unlike existing approaches, the proposed generalization bound is characterized by a time-variant effective dimension of the feature covariance. By introducing the time factor, we relax the strict assumptions on the feature covariance matrix required by previous analyses of benign overfitting in overparameterized linear regression with gradient descent. This paper extends the scope of benign overfitting, and experimental results indicate that the proposed bound accords better with empirical evidence.
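The contrast between the trajectory and the min-norm limit is easy to see empirically: gradient descent started at zero stays in the row space of the data and drifts toward the min-norm interpolant, while the population risk evolves along the way. The sketch below (dimensions, step size, and covariance decay all illustrative) tracks both quantities; it demonstrates the setting, not the paper's bound.

```python
# Hedged illustration: overparameterized linear regression with gradient
# descent from zero; risk and distance to the min-norm solution over time.
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 50, 500, 0.5
scale = np.arange(1, p + 1) ** -0.5                 # decaying feature variances
X = rng.standard_normal((n, p)) * scale
w_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ w_star + sigma * rng.standard_normal(n)

w_minnorm = np.linalg.pinv(X) @ y                   # min-norm interpolant
w, lr = np.zeros(p), 1e-2
for t in range(1, 50_001):
    w -= lr * X.T @ (X @ w - y) / n                 # gradient descent step
    if t in (100, 1_000, 10_000, 50_000):
        excess = w - w_star
        X_test = rng.standard_normal((10_000, p)) * scale
        risk = np.mean((X_test @ excess) ** 2)      # fresh-sample risk
        print(t, risk, np.linalg.norm(w - w_minnorm))
```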
It is argued that all model-based approaches to the selection of covariates in linear regression have failed. This applies to frequentist approaches based on P-values and to Bayesian approaches, although for different reasons. In the first part of the paper, 13 model-based procedures are compared to the model-free Gaussian covariate procedure in terms of the covariates selected and the time required. The comparison is based on four data sets and two simulations. There is nothing special about these data sets, which are often used as examples in the literature. All the model-based procedures failed. In the second part of the paper, it is argued that the cause of this failure is the very use of a model. If the model involves all the available covariates, standard P-values can be used, and their use in this situation is quite straightforward. As soon as the model specifies only some unknown subset of the covariates, the problem being to identify this subset, the situation changes radically: there are many P-values, they are dependent, and most of them are invalid. The Bayesian paradigm also assumes a correct model, and although there are no conceptual problems with a large number of covariates, there is a considerable overhead causing computational and allocation problems even for moderately sized data sets. The Gaussian covariate procedure is based on P-values defined as the probability that a random Gaussian covariate is better than the covariate being considered. These P-values are exact and valid whatever the situation. The allocation requirements and the algorithmic complexity are both linear in the size of the data, making the procedure capable of handling large data sets. It outperforms all the other procedures in every respect.
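The defining P-value is operational even without the exact formula: simulate random Gaussian covariates and ask how often they reduce the residual sum of squares at least as much as the candidate does. The sketch below estimates it by Monte Carlo; the actual procedure uses an exact closed form, and the function names here are illustrative.

```python
# Hedged Monte Carlo version of the Gaussian covariate P-value.
import numpy as np

def rss(y, Z):
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return r @ r

def gaussian_covariate_pvalue(y, X_base, candidate, n_sim=2_000, rng=None):
    rng = rng or np.random.default_rng()
    base = rss(y, X_base)
    gain = base - rss(y, np.column_stack([X_base, candidate]))
    random_gains = np.array([
        base - rss(y, np.column_stack([X_base, rng.standard_normal(len(y))]))
        for _ in range(n_sim)])
    return np.mean(random_gains >= gain)   # P(random covariate is better)

rng = np.random.default_rng(5)
n = 200
x_signal, x_noise = rng.standard_normal(n), rng.standard_normal(n)
y = 0.5 * x_signal + rng.standard_normal(n)
intercept = np.ones((n, 1))
print(gaussian_covariate_pvalue(y, intercept, x_signal, rng=rng))  # tiny
print(gaussian_covariate_pvalue(y, intercept, x_noise, rng=rng))   # ~uniform
```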
In this paper, we propose the use of geodesic distances in conjunction with multivariate distance matrix regression, called geometric-MDMR, as a powerful first-step analysis method for manifold-valued data. Manifold-valued data appear increasingly frequently in the literature, from analyses of earthquakes to analyses of brain patterns. Accounting for the structure of such data increases the complexity of the analysis, but allows for much more interpretable results. To test geometric-MDMR, we develop a method to simulate functional connectivity matrices for fMRI data and perform a simulation study, which shows that our method outperforms the current standards in fMRI analysis.
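The pipeline is simple to sketch: compute pairwise geodesic distances between manifold-valued responses (here, the affine-invariant distance between SPD connectivity-like matrices), then feed the distance matrix into the standard MDMR pseudo-F statistic. The data, the single predictor, and all sizes below are made up; this is a schematic of the approach, not the paper's implementation.

```python
# Hedged sketch of geometric-MDMR on synthetic SPD-valued responses.
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def spd_geodesic(A, B):
    Ainv_half = fractional_matrix_power(A, -0.5)
    M = Ainv_half @ B @ Ainv_half
    M = (M + M.T) / 2                                # symmetrize numerically
    return np.linalg.norm(logm(M))                   # affine-invariant distance

def mdmr_pseudo_f(D, X):
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * H @ (D ** 2) @ H                      # Gower-centered matrix
    Hx = X @ np.linalg.solve(X.T @ X, X.T)           # hat matrix of predictors
    m = X.shape[1]
    return (np.trace(Hx @ G) / m) / (np.trace((np.eye(n) - Hx) @ G) / (n - m - 1))

rng = np.random.default_rng(6)
n, d = 40, 4
x = rng.standard_normal(n)                           # scalar predictor
mats = []
for xi in x:
    W = rng.standard_normal((d, d)) * 0.2
    mats.append(np.exp(xi) * np.eye(d) + W @ W.T)    # SPD, depends on x
D = np.array([[spd_geodesic(a, b) for b in mats] for a in mats])
x_c = (x - x.mean()).reshape(-1, 1)                  # centered design
print(mdmr_pseudo_f(D, x_c))                         # large F -> association
```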
Let $P$ be a linear differential operator over $\mathcal{D} \subset \mathbb{R}^d$ and $U = (U_x)_{x \in \mathcal{D}}$ a second order stochastic process. In the first part of this article, we prove a new necessary and sufficient condition for all the trajectories of $U$ to satisfy the partial differential equation (PDE) $P(U) = 0$. This condition is formulated in terms of the covariance kernel of $U$. Compared to previous similar results, the novelty lies in that the equality $P(U) = 0$ is understood in the \textit{sense of distributions}, which is a relevant framework for PDEs. This theorem provides valuable insights for the second part of this article, devoted to performing "physically informed" machine learning for the homogeneous three-dimensional free space wave equation. We perform Gaussian process regression (GPR) on pointwise observations of a solution of this PDE. To do so, we propagate Gaussian process (GP) priors over its initial conditions through the wave equation. We obtain explicit formulas for the covariance kernel of the propagated GP, which can then be used for GPR. We then explore the particular cases of radial symmetry and point source. For the former, we derive convolution-free GPR formulas; for the latter, we show a direct link between GPR and the classical triangulation method for point source localization used in GPS systems. Additionally, this Bayesian framework provides a new answer to the ill-posed inverse problem of reconstructing the initial conditions of the wave equation with a limited number of sensors, and simultaneously enables the inference of physical parameters from these data. Finally, we illustrate this physically informed GPR on a number of practical examples.
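Once the propagated covariance kernel is available, the GPR step itself is standard. The sketch below shows only that step; the RBF kernel is a generic placeholder for the paper's explicit wave-equation-propagated kernel (which we do not reproduce), and the sensor data are synthetic.

```python
# Hedged sketch of the GPR posterior mean, with a placeholder kernel standing
# in for the propagated wave-equation kernel.
import numpy as np

def rbf(Xa, Xb, ell=0.5):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def gpr_posterior_mean(X_obs, y_obs, X_new, kernel, noise=1e-4):
    K = kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    return kernel(X_new, X_obs) @ np.linalg.solve(K, y_obs)

rng = np.random.default_rng(7)
X_obs = rng.uniform(-1, 1, size=(30, 3))   # sensor locations in R^3
y_obs = np.sin(X_obs.sum(axis=1))          # stand-in for wave measurements
X_new = rng.uniform(-1, 1, size=(5, 3))
print(gpr_posterior_mean(X_obs, y_obs, X_new, rbf))
```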
Statistical analysis is increasingly confronted with complex data from general metric spaces, such as symmetric positive definite matrix-valued data and probability distribution functions. [47] and [17] establish a general paradigm of Fr\'echet regression with complex metric space valued responses and Euclidean predictors. However, their proposed local Fr\'echet regression approach involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, in this paper we propose a novel random forests weighted local Fr\'echet regression paradigm. The main mechanism of our approach relies on the adaptive kernels generated by random forests. Our first method uses these weights as the local average to solve for the Fr\'echet mean, while the second method performs local linear Fr\'echet regression; both methods are thus locally adaptive. Our proposals significantly improve existing Fr\'echet regression methods. Based on the theory of infinite order U-processes and infinite order $M_{m_n}$-estimators, we establish the consistency, rate of convergence, and asymptotic normality of our proposed random forests weighted Fr\'echet regression estimator, which covers the current large sample theory of random forests with Euclidean responses as a special case. Numerical studies show the superiority of our two proposed methods for Fr\'echet regression with several commonly encountered types of responses, such as probability distribution functions, symmetric positive definite matrices, and sphere data. The practical merits of our proposals are also demonstrated through an application to human mortality distribution data.
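The adaptive-kernel mechanism is illustrated below: leaf co-membership across the trees of a fitted forest defines weights $w_i(x_0)$, and for distributional responses under the 2-Wasserstein metric the weighted Fr\'echet mean is the weighted average of quantile functions. The synthetic data, the scalar summary used to grow the forest, and all tuning values are our own illustrative choices, not the paper's.

```python
# Hedged sketch: random-forest adaptive kernel weights + weighted Frechet mean
# of distributional responses (quantile-function averaging under Wasserstein-2).
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(8)
n, p, n_q = 300, 5, 50
X = rng.uniform(-1, 1, size=(n, p))
# Response i is N(2*X[i,0], 1), stored as a quantile function on a grid.
qgrid = np.linspace(0.05, 0.95, n_q)
Y_quant = norm.ppf(qgrid)[None, :] + 2 * X[:, [0]]

# Grow the partitions on a scalar summary of the response (here the median).
forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                               random_state=0).fit(X, Y_quant[:, n_q // 2])

def rf_weights(forest, X_train, x0):
    leaves_train = forest.apply(X_train)            # (n, n_trees) leaf ids
    leaves_x0 = forest.apply(x0.reshape(1, -1))[0]  # (n_trees,)
    match = (leaves_train == leaves_x0).astype(float)
    return (match / match.sum(axis=0, keepdims=True)).mean(axis=1)  # sums to 1

x0 = np.zeros(p)
w = rf_weights(forest, X, x0)
pred_quantiles = w @ Y_quant                        # weighted Frechet mean
print(pred_quantiles[n_q // 2])                     # ~0 (true median at x0)
```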
In this paper we introduce a covariance framework for the analysis of EEG and MEG data that takes into account observed temporal stationarity on small time scales and trial-to-trial variations. We formulate a model for the covariance matrix as a Kronecker product of three components corresponding to space, time and epochs/trials, and consider maximum likelihood estimation of the unknown parameter values. An iterative algorithm that finds approximations of the maximum likelihood estimates is proposed. We perform a simulation study to assess the performance of the estimator and investigate the influence of different assumptions about the covariance factors on the estimated covariance matrix and on its components. We also illustrate our method on real EEG and MEG data sets. The proposed covariance model is applicable in a variety of cases where spontaneous EEG or MEG acts as a source of noise and realistic noise covariance estimates are needed for accurate dipole localization, such as in evoked activity studies, or where the properties of spontaneous EEG or MEG are themselves the topic of interest, such as in combined EEG/fMRI experiments in which the correlation between EEG and fMRI signals is investigated.
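The iterative scheme for Kronecker-structured covariance MLE is easiest to see in its classical two-factor ("flip-flop") form, sketched below: fix one factor, solve for the other in closed form, and alternate. The paper's algorithm is a three-factor (space x time x trials) analogue; the sizes and true factors here are illustrative.

```python
# Hedged sketch: two-factor flip-flop MLE for cov(vec X) = Sigma2 (x) Sigma1
# from zero-mean matrix-valued samples.
import numpy as np

def flip_flop(X, n_iter=50):
    """X: (N, p, q) zero-mean samples."""
    N, p, q = X.shape
    S1, S2 = np.eye(p), np.eye(q)
    for _ in range(n_iter):
        S2_inv = np.linalg.inv(S2)
        S1 = sum(x @ S2_inv @ x.T for x in X) / (N * q)   # update row factor
        S1_inv = np.linalg.inv(S1)
        S2 = sum(x.T @ S1_inv @ x for x in X) / (N * p)   # update column factor
    c = np.trace(S1) / p        # factors identifiable only up to a scalar
    return S1 / c, S2 * c

# Simulate from a known Kronecker-structured covariance and recover it.
rng = np.random.default_rng(9)
p, q, N = 4, 3, 2_000
L1 = np.linalg.cholesky(np.eye(p) + 0.5)          # true "spatial" factor
L2 = np.linalg.cholesky(np.eye(q) * 2.0 + 0.3)    # true "temporal" factor
X = np.array([L1 @ rng.standard_normal((p, q)) @ L2.T for _ in range(N)])
S1, S2 = flip_flop(X)
print(np.max(np.abs(np.kron(S2, S1) - np.kron(L2 @ L2.T, L1 @ L1.T))))
```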