Fitting regression models with many multivariate responses and covariates can be challenging, but such responses and covariates sometimes have tensor-variate structure. We extend the classical multivariate regression model to exploit such structure in two ways: first, we impose four types of low-rank tensor formats on the regression coefficients; second, we model the errors using the tensor-variate normal distribution, which imposes a Kronecker separable format on the covariance matrix. We obtain maximum likelihood estimators via block-relaxation algorithms and derive their computational complexity and asymptotic distributions. Our regression framework enables us to formulate tensor-variate analysis of variance (TANOVA) methodology. Applied in a one-way TANOVA layout, this methodology identifies cerebral regions significantly associated with the interaction of suicide attempters or non-attempter ideators and positive-, negative- or death-connoting words in a functional Magnetic Resonance Imaging study. Another application uses three-way TANOVA on the Labeled Faces in the Wild image dataset to distinguish facial characteristics related to ethnic origin, age group and gender.
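As a minimal illustration of the Kronecker separable error model (not the paper's estimator), the sketch below checks numerically that a matrix-variate normal draw $X = L_1 Z L_2^\top$ has $\mathrm{vec}(X) \sim N(0, \Sigma_2 \otimes \Sigma_1)$; the sizes and covariance factors are made up.

```python
# Hedged sketch: Kronecker-separable covariance of a tensor-variate (here
# 2-way, i.e. matrix-variate) normal. Sigma1/Sigma2 and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 100_000, 4, 3
A1 = rng.standard_normal((p, p)); Sigma1 = A1 @ A1.T + p * np.eye(p)  # row cov
A2 = rng.standard_normal((q, q)); Sigma2 = A2 @ A2.T + q * np.eye(q)  # col cov

# X = L1 Z L2^T implies vec(X) ~ N(0, Sigma2 (x) Sigma1) for column-major vec.
L1, L2 = np.linalg.cholesky(Sigma1), np.linalg.cholesky(Sigma2)
Z = rng.standard_normal((N, p, q))
draws = np.einsum('ij,njk,lk->nil', L1, Z, L2)

# Column-major vec of each draw, then compare empirical covariance to kron.
vecs = draws.transpose(0, 2, 1).reshape(N, p * q)
emp = np.cov(vecs, rowvar=False)
K = np.kron(Sigma2, Sigma1)
print(np.max(np.abs(emp - K)) / np.max(K))  # small -> separability holds
```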
Gaussian mixture models (GMM) are fundamental tools in statistical and data sciences. We study the moments of multivariate Gaussians and GMMs. The $d$-th moment of an $n$-dimensional random variable is a symmetric $d$-way tensor of size $n^d$, so working with moments naively is prohibitively expensive for $d>2$ and larger values of $n$. In this work, we develop theory and numerical methods for implicit computations with moment tensors of GMMs, reducing the computational and storage costs to $\mathcal{O}(n^2)$ and $\mathcal{O}(n^3)$, respectively, for general covariance matrices, and to $\mathcal{O}(n)$ and $\mathcal{O}(n)$, respectively, for diagonal ones. We derive concise analytic expressions for the moments in terms of symmetrized tensor products, relying on the correspondence between symmetric tensors and homogeneous polynomials, and combinatorial identities involving Bell polynomials. The primary application of this theory is to estimating GMM parameters from a set of observations, when formulated as a moment-matching optimization problem. When the covariance matrix is known and common to all components, we also show it is possible to debias the observations, in which case estimating the unknown means reduces to a symmetric CP tensor decomposition. Numerical results validate the approach and illustrate its efficiency. This work potentially opens the door to making the method of moments competitive with expectation-maximization methods for parameter estimation of GMMs.
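The core "implicit computation" idea can be illustrated without reproducing the paper's formulas: the contraction of the $d$-th moment tensor with $v^{\otimes d}$ equals $\mathbb{E}[(v^\top x)^d]$, a one-dimensional Gaussian moment computable via Stein's recursion after two cheap projections. The sketch below (function names are illustrative, not the paper's API) verifies this against Monte Carlo.

```python
# Hedged sketch: contract the d-th GMM moment tensor with v^{(x)d} in O(K n^2)
# time, never forming the n^d tensor explicitly.
import numpy as np

def gaussian_1d_moments(mu, var, d):
    """E[X^k] for X ~ N(mu, var), k = 0..d, via Stein's recursion."""
    m = np.zeros(d + 1)
    m[0] = 1.0
    if d >= 1:
        m[1] = mu
    for k in range(2, d + 1):
        m[k] = mu * m[k - 1] + (k - 1) * var * m[k - 2]
    return m

def implicit_moment_contraction(v, weights, mus, Sigmas, d):
    """<M_d, v^{(x)d}> = E[(v^T x)^d] for a GMM."""
    total = 0.0
    for w, mu, Sigma in zip(weights, mus, Sigmas):
        proj_mu = v @ mu           # O(n)
        proj_var = v @ Sigma @ v   # O(n^2)
        total += w * gaussian_1d_moments(proj_mu, proj_var, d)[d]
    return total

# Sanity check against a Monte Carlo estimate for a single Gaussian.
rng = np.random.default_rng(1)
n, d = 5, 4
mu = rng.standard_normal(n)
A = rng.standard_normal((n, n)); Sigma = A @ A.T / n + np.eye(n)
v = rng.standard_normal(n)
x = rng.multivariate_normal(mu, Sigma, size=500_000)
print(implicit_moment_contraction(v, [1.0], [mu], [Sigma], d),
      np.mean((x @ v) ** d))
```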
In regression discontinuity designs, manipulation is a threat to identification that previously had no formal characterization. This study provides the first formalization of which manipulations harm identification and which are detectable by the density test. Two channels characterize harmful manipulation: precise control of the manipulated assignment status, and the precise decision to manipulate given the assignment status. The latter, a novel channel, generalizes the rationale for justifying point identification, diagnostic tests, and worst-case bounds beyond their previous forms. In our replication of the Romanian high-school admission study, the precise decision appears as selective attrition by admission results, and our replication demonstrates that this channel is critical for the robustness of the study's original conclusion.
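For readers unfamiliar with the density test mentioned above, here is a deliberately simplified, hedged sketch of the idea: under no manipulation the running variable's density is continuous at the cutoff, so counts just below and just above it should be roughly balanced. This is a stand-in for the local polynomial density tests used in practice, not the paper's method; the bandwidth and data are made up.

```python
# Hedged toy density test at an RD cutoff via an exact binomial test on
# counts within a bandwidth h on each side of the cutoff.
import numpy as np
from scipy.stats import binomtest

def simple_density_test(running, cutoff, h):
    below = np.sum((running >= cutoff - h) & (running < cutoff))
    above = np.sum((running >= cutoff) & (running < cutoff + h))
    # Under a continuous density, each nearby point falls on either side
    # with probability ~1/2.
    return binomtest(below, below + above, 0.5).pvalue

rng = np.random.default_rng(2)
clean = rng.uniform(-1, 1, 5_000)
bunched = np.concatenate([clean, rng.uniform(0, 0.02, 300)])  # heaping above 0
print(simple_density_test(clean, 0.0, 0.1))    # large p-value
print(simple_density_test(bunched, 0.0, 0.1))  # small p-value
```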
We study the algorithmic complexity of computing persistent homology of a randomly chosen filtration. Specifically, we prove upper bounds for the average fill-up (number of non-zero entries) of the boundary matrix after matrix reduction on Erd\H{o}s-R\'enyi and Vietoris-Rips filtrations. Our bounds show that, in both cases, the reduced matrix is expected to be significantly sparser than the general worst case predicts. Our method builds on previous results on the expected first Betti numbers of the corresponding complexes, which we link to the fill-up of the boundary matrix. Our bound for Vietoris-Rips complexes is asymptotically tight up to logarithmic factors. We also provide an Erd\H{o}s-R\'enyi filtration realising the worst case.
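The fill-up quantity being bounded is produced by the standard persistence reduction algorithm, sketched below on a filtered triangle: columns are added left to right over $\mathbb{Z}/2$ until every lowest nonzero row index is unique, and the fill-up is the nonzero count of the result. This is the textbook algorithm, not code from the paper.

```python
# Hedged sketch of the standard boundary-matrix reduction over Z/2.
# Columns are sets of row indices; fill-up = total nonzeros afterwards.
def reduce_boundary(columns):
    low_to_col = {}
    for j, col in enumerate(columns):
        while col:
            low = max(col)                       # lowest nonzero row index
            if low not in low_to_col:
                low_to_col[low] = j
                break
            col ^= columns[low_to_col[low]]      # symmetric diff = mod-2 add
        # a column emptied here represents a cycle (a birth)
    return columns

# Boundary matrix of a filtered triangle: vertices 0,1,2, then edges, then face.
cols = [set(), set(), set(),       # vertices 0, 1, 2
        {0, 1}, {1, 2}, {0, 2},    # edges 01, 12, 02 (rows = vertex indices)
        {3, 4, 5}]                 # triangle (rows = edge indices)
reduced = reduce_boundary([set(c) for c in cols])
print(sum(len(c) for c in reduced))  # fill-up of the reduced matrix
```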
The use of Cauchy Markov random field priors in statistical inverse problems can potentially lead to posterior distributions which are non-Gaussian, high-dimensional, multimodal and heavy-tailed. In order to use such priors successfully, sophisticated optimization and Markov chain Monte Carlo (MCMC) methods are usually required. In this paper, our focus is largely on reviewing recently developed Cauchy difference priors, while also introducing new variants and providing a comparison. We first propose a one-dimensional second-order Cauchy difference prior, and construct new first- and second-order two-dimensional isotropic Cauchy difference priors. Another new Cauchy prior is based on the stochastic partial differential equation approach, derived from a Mat\'{e}rn-type Gaussian representation. The comparison also includes Cauchy sheets. Our numerical computations are based on both maximum a posteriori and conditional mean estimation. We exploit state-of-the-art MCMC methodologies such as Metropolis-within-Gibbs, Repelling-Attracting Metropolis, and the No-U-Turn variant of Hamiltonian Monte Carlo. We demonstrate the models and methods on one-dimensional and two-dimensional deconvolution problems. Thorough MCMC statistics are provided for all test cases, including potential scale reduction factors.
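A first-order Cauchy difference prior places a Cauchy density on each increment, $p(x) \propto \prod_i \gamma / (\gamma^2 + (x_{i+1}-x_i)^2)$, which favours piecewise-constant reconstructions. The sketch below plugs such a prior into a toy 1-D deconvolution posterior and samples it with plain random-walk Metropolis; the samplers in the paper are far more sophisticated, and $\gamma$, the noise level, and the blur kernel here are made up.

```python
# Hedged sketch: first-order Cauchy difference prior + Gaussian likelihood,
# conditional mean estimate via basic random-walk Metropolis.
import numpy as np

rng = np.random.default_rng(3)
n, gamma, sigma = 50, 0.1, 0.05
A = np.array([[np.exp(-0.5 * ((i - j) / 2.0) ** 2) for j in range(n)]
              for i in range(n)])
A /= A.sum(axis=1, keepdims=True)                # toy blur operator
x_true = (np.arange(n) > n // 2).astype(float)   # step signal
y = A @ x_true + sigma * rng.standard_normal(n)  # blurred, noisy data

def log_post(x):
    diffs = np.diff(x)
    log_prior = np.sum(np.log(gamma) - np.log(gamma**2 + diffs**2))
    log_lik = -0.5 * np.sum((y - A @ x) ** 2) / sigma**2
    return log_prior + log_lik

x, samples = np.zeros(n), []
lp = log_post(x)
for it in range(20_000):
    prop = x + 0.02 * rng.standard_normal(n)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:     # Metropolis accept/reject
        x, lp = prop, lp_prop
    if it > 10_000:                              # crude burn-in
        samples.append(x.copy())
print(np.abs(np.mean(samples, axis=0) - x_true).mean())  # CM estimate error
```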
Benign overfitting demonstrates that overparameterized models can perform well on test data while fitting noisy training data. However, existing analyses consider only the final min-norm solution in linear regression, ignoring the algorithm and the corresponding training procedure. In this paper, we generalize the idea of benign overfitting from the min-norm solution to the whole training trajectory and derive a time-variant bound based on trajectory analysis. Starting from this bound, we further derive a time interval that suffices to guarantee a consistent generalization error for a given feature covariance. Unlike existing approaches, the proposed generalization bound is characterized by a time-variant effective dimension of the feature covariance. By introducing the time factor, we relax the strict assumptions on the feature covariance matrix required by previous analyses of benign overfitting in overparameterized linear regression with gradient descent. This paper extends the scope of benign overfitting, and experimental results indicate that the proposed bound accords better with empirical evidence.
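The contrast between the trajectory and the min-norm limit is easy to see empirically: gradient descent started at zero stays in the row space of the data and drifts toward the min-norm interpolant, while the population risk evolves along the way. The sketch below (dimensions, step size, and covariance decay all illustrative) tracks both quantities; it demonstrates the setting, not the paper's bound.

```python
# Hedged illustration: overparameterized linear regression with gradient
# descent from zero; risk and distance to the min-norm solution over time.
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 50, 500, 0.5
scale = np.arange(1, p + 1) ** -0.5                 # decaying feature variances
X = rng.standard_normal((n, p)) * scale
w_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ w_star + sigma * rng.standard_normal(n)

w_minnorm = np.linalg.pinv(X) @ y                   # min-norm interpolant
w, lr = np.zeros(p), 1e-2
for t in range(1, 50_001):
    w -= lr * X.T @ (X @ w - y) / n                 # gradient descent step
    if t in (100, 1_000, 10_000, 50_000):
        excess = w - w_star
        X_test = rng.standard_normal((10_000, p)) * scale
        risk = np.mean((X_test @ excess) ** 2)      # fresh-sample risk
        print(t, risk, np.linalg.norm(w - w_minnorm))
```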
It is argued that all model-based approaches to the selection of covariates in linear regression have failed. This applies to frequentist approaches based on P-values and to Bayesian approaches, although for different reasons. In the first part of the paper, 13 model-based procedures are compared to the model-free Gaussian covariate procedure in terms of the covariates selected and the time required. The comparison is based on four data sets and two simulations. There is nothing special about these data sets, which are often used as examples in the literature. All the model-based procedures failed. In the second part of the paper, it is argued that the cause of this failure is the very use of a model. If the model involves all the available covariates, standard P-values can be used, and their use in this situation is quite straightforward. As soon as the model specifies only some unknown subset of the covariates, the problem being to identify this subset, the situation changes radically: there are many P-values, they are dependent, and most of them are invalid. The Bayesian paradigm also assumes a correct model, and although there are no conceptual problems with a large number of covariates, there is a considerable overhead causing computational and allocation problems even for moderately sized data sets. The Gaussian covariate procedure is based on P-values defined as the probability that a random Gaussian covariate is better than the covariate being considered. These P-values are exact and valid whatever the situation. The allocation requirements and the algorithmic complexity are both linear in the size of the data, making the procedure capable of handling large data sets. It outperforms all the other procedures in every respect.
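The defining P-value is operational even without the exact formula: simulate random Gaussian covariates and ask how often they reduce the residual sum of squares at least as much as the candidate does. The sketch below estimates it by Monte Carlo; the actual procedure uses an exact closed form, and the function names here are illustrative.

```python
# Hedged Monte Carlo version of the Gaussian covariate P-value.
import numpy as np

def rss(y, Z):
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return r @ r

def gaussian_covariate_pvalue(y, X_base, candidate, n_sim=2_000, rng=None):
    rng = rng or np.random.default_rng()
    base = rss(y, X_base)
    gain = base - rss(y, np.column_stack([X_base, candidate]))
    random_gains = np.array([
        base - rss(y, np.column_stack([X_base, rng.standard_normal(len(y))]))
        for _ in range(n_sim)])
    return np.mean(random_gains >= gain)   # P(random covariate is better)

rng = np.random.default_rng(5)
n = 200
x_signal, x_noise = rng.standard_normal(n), rng.standard_normal(n)
y = 0.5 * x_signal + rng.standard_normal(n)
intercept = np.ones((n, 1))
print(gaussian_covariate_pvalue(y, intercept, x_signal, rng=rng))  # tiny
print(gaussian_covariate_pvalue(y, intercept, x_noise, rng=rng))   # ~uniform
```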
In this paper, we propose the use of geodesic distances in conjunction with multivariate distance matrix regression, called geometric-MDMR, as a powerful first-step analysis method for manifold-valued data. Manifold-valued data appear increasingly frequently in the literature, from analyses of earthquakes to analyses of brain patterns. Accounting for the structure of such data increases the complexity of the analysis, but allows for much more interpretable results. To test geometric-MDMR, we develop a method to simulate functional connectivity matrices for fMRI data and perform a simulation study, which shows that our method outperforms the current standards in fMRI analysis.
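The pipeline is simple to sketch: compute pairwise geodesic distances between manifold-valued responses (here, the affine-invariant distance between SPD connectivity-like matrices), then feed the distance matrix into the standard MDMR pseudo-F statistic. The data, the single predictor, and all sizes below are made up; this is a schematic of the approach, not the paper's implementation.

```python
# Hedged sketch of geometric-MDMR on synthetic SPD-valued responses.
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def spd_geodesic(A, B):
    Ainv_half = fractional_matrix_power(A, -0.5)
    M = Ainv_half @ B @ Ainv_half
    M = (M + M.T) / 2                                # symmetrize numerically
    return np.linalg.norm(logm(M))                   # affine-invariant distance

def mdmr_pseudo_f(D, X):
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * H @ (D ** 2) @ H                      # Gower-centered matrix
    Hx = X @ np.linalg.solve(X.T @ X, X.T)           # hat matrix of predictors
    m = X.shape[1]
    return (np.trace(Hx @ G) / m) / (np.trace((np.eye(n) - Hx) @ G) / (n - m - 1))

rng = np.random.default_rng(6)
n, d = 40, 4
x = rng.standard_normal(n)                           # scalar predictor
mats = []
for xi in x:
    W = rng.standard_normal((d, d)) * 0.2
    mats.append(np.exp(xi) * np.eye(d) + W @ W.T)    # SPD, depends on x
D = np.array([[spd_geodesic(a, b) for b in mats] for a in mats])
x_c = (x - x.mean()).reshape(-1, 1)                  # centered design
print(mdmr_pseudo_f(D, x_c))                         # large F -> association
```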
Let $P$ be a linear differential operator over $\mathcal{D} \subset \mathbb{R}^d$ and $U = (U_x)_{x \in \mathcal{D}}$ a second order stochastic process. In the first part of this article, we prove a new necessary and sufficient condition for all the trajectories of $U$ to satisfy the partial differential equation (PDE) $P(U) = 0$. This condition is formulated in terms of the covariance kernel of $U$. Compared to previous similar results, the novelty lies in that the equality $P(U) = 0$ is understood in the \textit{sense of distributions}, which is a relevant framework for PDEs. This theorem provides valuable insights for the second part of this article, devoted to performing "physically informed" machine learning for the homogeneous three-dimensional free space wave equation. We perform Gaussian process regression (GPR) on pointwise observations of a solution of this PDE. To do so, we propagate Gaussian process (GP) priors over its initial conditions through the wave equation. We obtain explicit formulas for the covariance kernel of the propagated GP, which can then be used for GPR. We then explore the particular cases of radial symmetry and point source. For the former, we derive convolution-free GPR formulas; for the latter, we show a direct link between GPR and the classical triangulation method for point source localization used in GPS systems. Additionally, this Bayesian framework provides a new answer to the ill-posed inverse problem of reconstructing the initial conditions of the wave equation with a limited number of sensors, and simultaneously enables the inference of physical parameters from these data. Finally, we illustrate this physically informed GPR on a number of practical examples.
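Once the propagated covariance kernel is available, the GPR step itself is standard. The sketch below shows only that step; the RBF kernel is a generic placeholder for the paper's explicit wave-equation-propagated kernel (which we do not reproduce), and the sensor data are synthetic.

```python
# Hedged sketch of the GPR posterior mean, with a placeholder kernel standing
# in for the propagated wave-equation kernel.
import numpy as np

def rbf(Xa, Xb, ell=0.5):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def gpr_posterior_mean(X_obs, y_obs, X_new, kernel, noise=1e-4):
    K = kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    return kernel(X_new, X_obs) @ np.linalg.solve(K, y_obs)

rng = np.random.default_rng(7)
X_obs = rng.uniform(-1, 1, size=(30, 3))   # sensor locations in R^3
y_obs = np.sin(X_obs.sum(axis=1))          # stand-in for wave measurements
X_new = rng.uniform(-1, 1, size=(5, 3))
print(gpr_posterior_mean(X_obs, y_obs, X_new, rbf))
```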
Statistical analysis is increasingly confronted with complex data from general metric spaces, such as symmetric positive definite matrix-valued data and probability distribution functions. [47] and [17] establish a general paradigm of Fr\'echet regression with complex metric space valued responses and Euclidean predictors. However, their proposed local Fr\'echet regression approach involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, in this paper we propose a novel random forests weighted local Fr\'echet regression paradigm. The main mechanism of our approach relies on the adaptive kernels generated by random forests. Our first method uses these weights as the local average to solve for the Fr\'echet mean, while the second method performs local linear Fr\'echet regression; both methods are thus locally adaptive. Our proposals significantly improve existing Fr\'echet regression methods. Based on the theory of infinite order U-processes and infinite order $M_{m_n}$-estimators, we establish the consistency, rate of convergence, and asymptotic normality of our proposed random forests weighted Fr\'echet regression estimator, which covers the current large sample theory of random forests with Euclidean responses as a special case. Numerical studies show the superiority of our two proposed methods for Fr\'echet regression with several commonly encountered types of responses, such as probability distribution functions, symmetric positive definite matrices, and sphere data. The practical merits of our proposals are also demonstrated through an application to human mortality distribution data.
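The adaptive-kernel mechanism is illustrated below: leaf co-membership across the trees of a fitted forest defines weights $w_i(x_0)$, and for distributional responses under the 2-Wasserstein metric the weighted Fr\'echet mean is the weighted average of quantile functions. The synthetic data, the scalar summary used to grow the forest, and all tuning values are our own illustrative choices, not the paper's.

```python
# Hedged sketch: random-forest adaptive kernel weights + weighted Frechet mean
# of distributional responses (quantile-function averaging under Wasserstein-2).
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(8)
n, p, n_q = 300, 5, 50
X = rng.uniform(-1, 1, size=(n, p))
# Response i is N(2*X[i,0], 1), stored as a quantile function on a grid.
qgrid = np.linspace(0.05, 0.95, n_q)
Y_quant = norm.ppf(qgrid)[None, :] + 2 * X[:, [0]]

# Grow the partitions on a scalar summary of the response (here the median).
forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                               random_state=0).fit(X, Y_quant[:, n_q // 2])

def rf_weights(forest, X_train, x0):
    leaves_train = forest.apply(X_train)            # (n, n_trees) leaf ids
    leaves_x0 = forest.apply(x0.reshape(1, -1))[0]  # (n_trees,)
    match = (leaves_train == leaves_x0).astype(float)
    return (match / match.sum(axis=0, keepdims=True)).mean(axis=1)  # sums to 1

x0 = np.zeros(p)
w = rf_weights(forest, X, x0)
pred_quantiles = w @ Y_quant                        # weighted Frechet mean
print(pred_quantiles[n_q // 2])                     # ~0 (true median at x0)
```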
In this paper we introduce a covariance framework for the analysis of EEG and MEG data that takes into account observed temporal stationarity on small time scales and trial-to-trial variations. We formulate a model for the covariance matrix as a Kronecker product of three components corresponding to space, time and epochs/trials, and consider maximum likelihood estimation of the unknown parameter values. An iterative algorithm that finds approximations of the maximum likelihood estimates is proposed. We perform a simulation study to assess the performance of the estimator and investigate the influence of different assumptions about the covariance factors on the estimated covariance matrix and on its components. We also illustrate our method on real EEG and MEG data sets. The proposed covariance model is applicable in a variety of cases where spontaneous EEG or MEG acts as a source of noise and realistic noise covariance estimates are needed for accurate dipole localization, such as in evoked activity studies, or where the properties of spontaneous EEG or MEG are themselves the topic of interest, such as in combined EEG/fMRI experiments in which the correlation between EEG and fMRI signals is investigated.
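The iterative scheme for Kronecker-structured covariance MLE is easiest to see in its classical two-factor ("flip-flop") form, sketched below: fix one factor, solve for the other in closed form, and alternate. The paper's algorithm is a three-factor (space x time x trials) analogue; the sizes and true factors here are illustrative.

```python
# Hedged sketch: two-factor flip-flop MLE for cov(vec X) = Sigma2 (x) Sigma1
# from zero-mean matrix-valued samples.
import numpy as np

def flip_flop(X, n_iter=50):
    """X: (N, p, q) zero-mean samples."""
    N, p, q = X.shape
    S1, S2 = np.eye(p), np.eye(q)
    for _ in range(n_iter):
        S2_inv = np.linalg.inv(S2)
        S1 = sum(x @ S2_inv @ x.T for x in X) / (N * q)   # update row factor
        S1_inv = np.linalg.inv(S1)
        S2 = sum(x.T @ S1_inv @ x for x in X) / (N * p)   # update column factor
    c = np.trace(S1) / p        # factors identifiable only up to a scalar
    return S1 / c, S2 * c

# Simulate from a known Kronecker-structured covariance and recover it.
rng = np.random.default_rng(9)
p, q, N = 4, 3, 2_000
L1 = np.linalg.cholesky(np.eye(p) + 0.5)          # true "spatial" factor
L2 = np.linalg.cholesky(np.eye(q) * 2.0 + 0.3)    # true "temporal" factor
X = np.array([L1 @ rng.standard_normal((p, q)) @ L2.T for _ in range(N)])
S1, S2 = flip_flop(X)
print(np.max(np.abs(np.kron(S2, S1) - np.kron(L2 @ L2.T, L1 @ L1.T))))
```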