
In this article, we focus on large-dimensional matrix factor models and propose estimators of the factor loading matrices and the factor score matrix from the perspective of minimizing a least squares objective function. The resulting estimators turn out to be equivalent to the corresponding projected estimators in Yu et al. (2021), which enjoy the appealing property of reducing the magnitudes of the idiosyncratic error components and thereby increasing the signal-to-noise ratio. We derive the convergence rates of the theoretical minimizers under sub-Gaussian tails, rather than those of the one-step iteration estimators of Yu et al. (2021). Motivated by the least squares formulation, we further consider a robust method for estimating large-dimensional matrix factor models by utilizing the Huber loss function. Theoretically, we derive the convergence rates of the robust estimators of the factor loading matrices under finite fourth moment conditions. We also propose an iterative procedure to estimate the pair of row and column factor numbers robustly. We conduct extensive numerical studies to compare the empirical performance of the proposed robust methods with that of the state-of-the-art ones. The proposed methods perform much better than the existing ones when the data are heavy-tailed, and comparably to the projected estimators when the data are light-tailed; they can therefore serve as a safe replacement for the existing methods. An application to a Fama-French financial portfolio dataset illustrates their empirical usefulness.
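To make the least squares formulation concrete, here is a minimal NumPy sketch of alternating least squares for the model $X_t = R F_t C^\top + E_t$, in the spirit of the projected estimators. It assumes orthonormal loadings and omits the paper's identification constraints, scaling conventions, and the Huber-loss robustification; all names are illustrative.

```python
import numpy as np

def als_matrix_factor(X, k, r, n_iter=50, seed=0):
    """Alternating least squares sketch for X_t ~ R @ F_t @ C.T + E_t.

    X: array of shape (T, p1, p2); k, r: row/column factor numbers.
    Orthonormal loadings are an illustrative identification choice.
    """
    T, p1, p2 = X.shape
    rng = np.random.default_rng(seed)
    R = np.linalg.qr(rng.standard_normal((p1, k)))[0]
    C = np.linalg.qr(rng.standard_normal((p2, r)))[0]
    for _ in range(n_iter):
        # Update R: top-k eigenvectors of the column-projected covariance.
        MR = sum(Xt @ C @ C.T @ Xt.T for Xt in X) / (T * p2)
        R = np.linalg.eigh(MR)[1][:, -k:]
        # Update C symmetrically with the row-projected covariance.
        MC = sum(Xt.T @ R @ R.T @ Xt for Xt in X) / (T * p1)
        C = np.linalg.eigh(MC)[1][:, -r:]
    # Given orthonormal R and C, the least squares scores are projections.
    F = np.einsum('ik,tij,jl->tkl', R, X, C)   # F_t = R.T @ X_t @ C
    return R, F, C
```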

Related content

We propose a metric -- Projection Norm -- to predict a model's performance on out-of-distribution (OOD) data without access to ground truth labels. Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels. The more the new model's parameters differ from an in-distribution model, the greater the predicted OOD error. Empirically, our approach outperforms existing methods on both image and text classification tasks and across different network architectures. Theoretically, we connect our approach to a bound on the test error for overparameterized linear models. Furthermore, we find that Projection Norm is the only approach that achieves non-trivial detection performance on adversarial examples. Our code is available at //github.com/yaodongyu/ProjNorm.
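The procedure described above is simple enough to sketch directly. Below is a minimal PyTorch rendering of the idea: pseudo-label the unlabeled OOD test set with the deployed model, fine-tune a fresh copy of an in-distribution reference model on the pseudo-labels, and report the parameter distance. The optimizer, step count, and learning rate are illustrative assumptions, not the paper's exact recipe.

```python
import copy
from itertools import cycle

import torch

def projection_norm(model, ref_model, ood_loader, steps=200, lr=1e-3):
    """Sketch of the Projection Norm idea; hyperparameters are illustrative."""
    new_model = copy.deepcopy(ref_model)
    opt = torch.optim.SGD(new_model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    new_model.train()
    batches = cycle(ood_loader)
    for _ in range(steps):
        x, _ = next(batches)                   # ground-truth labels unused
        with torch.no_grad():
            pseudo_y = model(x).argmax(dim=1)  # pseudo-label with the model
        opt.zero_grad()
        loss_fn(new_model(x), pseudo_y).backward()
        opt.step()
    # The larger the parameter shift, the larger the predicted OOD error.
    diffs = [(p - q).flatten() for p, q in
             zip(new_model.parameters(), ref_model.parameters())]
    return torch.cat(diffs).norm().item()
```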

Partially linear additive models generalize linear ones: they model the relation between a response variable and covariates by assuming that some covariates have a linear relation with the response, while each of the others enters through an unknown univariate smooth function. The harmful effect of outliers, either in the residuals or in the covariates involved in the linear component, has been described for partially linear models, that is, when only one nonparametric component is involved in the model. When dealing with additive components, the problem of providing reliable estimators in the presence of atypical data is of practical importance, motivating the need for robust procedures. Hence, we propose a family of robust estimators for partially linear additive models by combining $B$-splines with robust linear regression estimators. We obtain consistency results, rates of convergence and asymptotic normality for the linear components under mild assumptions. A Monte Carlo study is carried out to compare the performance of the robust proposal with its classical counterpart under different models and contamination schemes. The numerical experiments show the advantage of the proposed methodology for finite samples. We also illustrate the usefulness of the proposed approach on a real data set.
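The combination of a B-spline expansion with a robust linear fit can be sketched in a few lines of scikit-learn. Here the Huber loss stands in for the paper's M-estimators, and all names and defaults are illustrative.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.preprocessing import SplineTransformer

def fit_robust_plam(X_lin, X_add, y, n_knots=8, degree=3):
    """Sketch: expand each additive covariate in a B-spline basis and fit
    the joint design with a robust (Huber) linear estimator."""
    spl = SplineTransformer(n_knots=n_knots, degree=degree,
                            include_bias=False)
    B = spl.fit_transform(X_add)          # per-covariate B-spline basis
    Z = np.hstack([X_lin, B])             # linear part + additive part
    est = HuberRegressor().fit(Z, y)
    beta = est.coef_[:X_lin.shape[1]]     # linear-component coefficients
    return est, spl, beta
```

The fitted spline coefficients can then be evaluated on a grid to visualize each estimated additive component.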

The Wasserstein distance is a distance between two probability distributions that has recently gained popularity in statistics and machine learning, owing to its attractive properties. One important approach to extending it is to use low-dimensional projections of distributions, as in the sliced and max-sliced Wasserstein distances, to avoid the high computational cost and the curse of dimensionality in empirical estimation. Despite their practical success in machine learning tasks, statistical inference for projection-based Wasserstein distances remains limited because of the lack of distributional limit results. In this paper, we consider distances defined by integrating or maximizing Wasserstein distances between low-dimensional projections of two probability distributions, and we derive their limit distributions when the two distributions are supported on finite points. We also propose a bootstrap procedure to estimate quantiles of the limit distributions from data. This facilitates asymptotically exact interval estimation and hypothesis testing for these distances. Our theoretical results build on the arguments of Sommerfeld and Munk (2018) for deriving distributional limits for the original Wasserstein distance on finite spaces and on the theory of sensitivity analysis in nonlinear programming. Finally, we conduct numerical experiments to illustrate the theoretical results and demonstrate the applicability of our inferential methods to real data analysis.
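For intuition, here is a Monte Carlo sketch of the max-sliced 1-Wasserstein distance together with a naive n-out-of-n bootstrap for its quantiles. The paper's bootstrap for the distributional limits is more delicate; direction sampling, sample-based inputs, and all defaults below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def max_sliced_w1(X, Y, n_dirs=500, seed=None):
    """Approximate max-sliced W1 by maximizing over random unit directions."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return max(wasserstein_distance(X @ u, Y @ u) for u in dirs)

def bootstrap_quantile(X, Y, stat=max_sliced_w1, B=200, q=0.95, seed=0):
    """Naive bootstrap quantile of the statistic; a simplification of the
    paper's procedure."""
    rng = np.random.default_rng(seed)
    n, m = len(X), len(Y)
    vals = [stat(X[rng.integers(n, size=n)], Y[rng.integers(m, size=m)])
            for _ in range(B)]
    return np.quantile(vals, q)
```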

Let $G=(V,E)$ be an undirected unweighted planar graph. Consider a vector storing the distances from an arbitrary vertex $v$ to all vertices $S = \{ s_1 , s_2 , \ldots , s_k \}$ of a single face in their cyclic order. The pattern of $v$ is obtained by taking the difference between every pair of consecutive values of this vector. In STOC'19, Li and Parter used a VC-dimension argument to show that in planar graphs, the number of distinct patterns, denoted $x$, is only $O(k^3)$. This resulted in a simple compression scheme requiring $\tilde O(\min \{ k^4+|T|, k\cdot |T|\})$ space to encode the distances between $S$ and a subset of terminal vertices $T \subseteq V$. This is known as the Okamura-Seymour metric compression problem. We give an alternative proof of the $x=O(k^3)$ bound that exploits planarity beyond the VC-dimension argument. Namely, our proof relies on cut-cycle duality, as well as on the fact that distances among vertices of $S$ are bounded by $k$. Our method implies the following: (1) An $\tilde{O}(x+k+|T|)$ space compression of the Okamura-Seymour metric, thus improving the compression of Li and Parter to $\tilde O(\min \{k^3+|T|,k \cdot |T| \})$. (2) An optimal $\tilde{O}(k+|T|)$ space compression of the Okamura-Seymour metric, in the case where the vertices of $T$ induce a connected component in $G$. (3) A tight bound of $x = \Theta(k^2)$ for the family of Halin graphs, whereas the VC-dimension argument is limited to showing $x=O(k^3)$.
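The pattern construction itself is elementary and can be stated as code. The sketch below computes a vertex's pattern and counts distinct patterns $x$ over a vertex set, using BFS distances since $G$ is unweighted; the function names are illustrative.

```python
import networkx as nx

def pattern(G, v, face):
    """Pattern of v w.r.t. face vertices s_1..s_k given in cyclic order:
    consecutive differences of the distance vector."""
    dist = nx.single_source_shortest_path_length(G, v)  # BFS distances
    vec = [dist[s] for s in face]
    return tuple(vec[i + 1] - vec[i] for i in range(len(vec) - 1))

def count_patterns(G, face, vertices):
    """Number of distinct patterns x; the paper shows x = O(k^3) in planar
    graphs, where k = |face|."""
    return len({pattern(G, v, face) for v in vertices})
```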

We consider asymptotically exact inference on the leading canonical correlation directions and strengths between two high-dimensional vectors under sparsity restrictions. Our main contribution is the development of a loss function based on which one can operationalize a one-step bias correction on reasonable initial estimators. Our analytic results are adaptive to suitable structural restrictions on the high-dimensional nuisance parameters, which, in this set-up, correspond to the covariance matrices of the variables of interest. We supplement the theoretical guarantees behind our procedures with extensive numerical studies.

The notion of multivariate total positivity has proved useful in finance and psychology but may be too restrictive in other applications. In this paper we propose a concept of local association, in which highly connected components in a graphical model are positively associated, and we study its properties. Our main motivation comes from gene expression data, where graphical models have become a popular exploratory tool. The models are instances of what we term mixed convex exponential families, and we show that a mixed dual likelihood estimator has simple exact properties for such families as well as asymptotic properties similar to those of the maximum likelihood estimator. We further relax the positivity assumption by penalizing negative partial correlations in what we term the positive graphical lasso. Finally, we develop a GOLAZO algorithm based on block-coordinate descent that applies to a number of optimization procedures arising in the context of graphical models, including the estimation problems described above. We derive results on the existence of the optimum for such problems.
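On one reading of the positivity penalty: since the partial correlation between $i$ and $j$ is $-K_{ij}/\sqrt{K_{ii}K_{jj}}$ for precision matrix $K$, penalizing negative partial correlations amounts to penalizing $\max(K_{ij},0)$ off the diagonal. The sketch below solves this with a generic convex solver rather than the paper's GOLAZO block-coordinate algorithm; the formulation and names are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

def positive_glasso(S, lam):
    """Sketch: Gaussian log-likelihood with a one-sided penalty on positive
    off-diagonal precision entries (i.e., on negative partial correlations)."""
    p = S.shape[0]
    K = cp.Variable((p, p), PSD=True)
    off = K - cp.diag(cp.diag(K))          # zero out the diagonal
    obj = -cp.log_det(K) + cp.trace(S @ K) + lam * cp.sum(cp.pos(off))
    cp.Problem(cp.Minimize(obj)).solve()
    return K.value
```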

Learning rate schedules are ubiquitously used to speed up and improve optimisation. Many different policies have been introduced on an empirical basis, and theoretical analyses have been developed for convex settings. However, in many realistic problems the loss landscape is high-dimensional and non-convex, a case for which results are scarce. In this paper we present a first analytical study of the role of learning rate scheduling in this setting, focusing on Langevin optimization with a learning rate decaying as $\eta(t)=t^{-\beta}$. We begin by considering models where the loss is a Gaussian random function on the $N$-dimensional sphere ($N\rightarrow \infty$), featuring an extensive number of critical points. We find that to speed up optimization without getting stuck in saddles, one must choose a decay rate $\beta<1$, contrary to convex setups where $\beta=1$ is generally optimal. We then add to the problem a signal to be recovered. In this setting, the dynamics decompose into two phases: an \emph{exploration} phase, where the dynamics navigate the rough parts of the landscape, followed by a \emph{convergence} phase, where the signal is detected and the dynamics enter a convex basin. In this case, it is optimal to keep a large learning rate during the exploration phase to escape the non-convex region as quickly as possible, and then use the convex criterion $\beta=1$ to converge rapidly to the solution. Finally, we demonstrate that our conclusions hold in a common regression task involving neural networks.
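The dynamics studied here are easy to simulate. Below is a minimal sketch of Langevin optimization with the polynomially decaying step size $\eta(t)=\eta_0\, t^{-\beta}$; the gradient callable, temperature handling, and defaults are illustrative assumptions.

```python
import numpy as np

def langevin_decay(grad, x0, n_steps=10_000, beta=0.5, eta0=0.1,
                   temp=1.0, seed=0):
    """Langevin optimization with eta(t) = eta0 * t**(-beta).
    Per the abstract, beta < 1 helps escape saddles in the non-convex
    phase, while beta = 1 is the classical convex-optimal choice."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for t in range(1, n_steps + 1):
        eta = eta0 * t ** (-beta)
        # Gradient step plus Gaussian noise at temperature `temp`.
        noise = rng.standard_normal(x.shape)
        x += -eta * grad(x) + np.sqrt(2.0 * temp * eta) * noise
    return x
```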

It is known that when a statistical model is singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory in applications. To improve this computational complexity, we consider utilizing second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, a variant of gradient descent whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function. When the population loss function, i.e., the limit of the empirical loss function as $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This is cheaper than the fixed step-size gradient descent algorithm, which requires $\mathcal{O}(n^{\tau})$ computation for some $\tau > 1$ to reach the same statistical radius. We illustrate the general theory under two statistical models, generalized linear models and mixture models, and experimental results support the theoretical predictions.
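The update rule is a one-line modification of gradient descent. A minimal sketch follows; the callables and the eigenvalue floor (a guard against near-singular Hessians) are implementation choices of this sketch, not part of the paper.

```python
import numpy as np

def normgd(grad, hess, x0, eta=1.0, n_iter=100):
    """Sketch of normalized gradient descent (NormGD): scale the step by
    the largest eigenvalue of the Hessian of the empirical loss."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        lam_max = max(np.linalg.eigvalsh(hess(x)).max(), 1e-12)
        x = x - (eta / lam_max) * grad(x)
    return x
```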

In this paper, we develop local expansions for the ratio of the centered matrix-variate $T$ density to the centered matrix-variate normal density with the same covariances. The approximations are used to derive upper bounds on several probability metrics (such as the total variation and Hellinger distance) between the corresponding induced measures.
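For concreteness, one common centered parameterization of the two densities being compared (conventions vary across references, e.g., in how the degrees-of-freedom parameter $\nu$ scales the covariances) is, for an $n \times p$ matrix $X$ with row covariance $\Sigma$ and column covariance $\Psi$:
$$f_{\mathrm{N}}(X) = (2\pi)^{-np/2}\,|\Sigma|^{-p/2}\,|\Psi|^{-n/2} \exp\!\Big\{-\tfrac{1}{2}\,\operatorname{tr}\big[\Sigma^{-1} X \Psi^{-1} X^\top\big]\Big\},$$
$$f_{\mathrm{T}}(X) \propto |\Sigma|^{-p/2}\,|\Psi|^{-n/2}\, \big|I_n + \Sigma^{-1} X \Psi^{-1} X^\top\big|^{-(\nu+n+p-1)/2},$$
where matching the covariances of the two laws requires a convention-dependent rescaling of $\Sigma$ or $\Psi$.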

In this paper we introduce a covariance framework for the analysis of EEG and MEG data that takes into account observed temporal stationarity on small time scales and trial-to-trial variations. We formulate a model for the covariance matrix as a Kronecker product of three components corresponding to space, time and epochs/trials, and we consider maximum likelihood estimation of the unknown parameter values. An iterative algorithm that finds approximations of the maximum likelihood estimates is proposed. We perform a simulation study to assess the performance of the estimator and investigate the influence of different assumptions about the covariance factors on the estimated covariance matrix and on its components. We also illustrate our method on real EEG and MEG data sets. The proposed covariance model is applicable in a variety of cases where spontaneous EEG or MEG acts as a source of noise and realistic noise covariance estimates are needed for accurate dipole localization, such as in evoked activity studies, or where the properties of spontaneous EEG or MEG are themselves the topic of interest, such as in combined EEG/fMRI experiments in which the correlation between EEG and fMRI signals is investigated.
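Iterative algorithms for Kronecker-structured covariances typically alternate closed-form updates across the factors. The sketch below shows the classical two-factor "flip-flop" updates (Dutilleul-style) for a spatial and a temporal component; the paper's model adds a third trial/epoch factor, which would be updated analogously. Shapes, initialization, and the factor ordering are illustrative.

```python
import numpy as np

def flip_flop(X, n_iter=20):
    """Two-factor flip-flop MLE sketch for Sigma = A (space) x B (time),
    i.e., each trial X_i (p x q) has row covariance A and column
    covariance B. X has shape (N, p, q)."""
    N, p, q = X.shape
    A, B = np.eye(p), np.eye(q)
    for _ in range(n_iter):
        Binv = np.linalg.inv(B)
        A = sum(Xi @ Binv @ Xi.T for Xi in X) / (N * q)   # spatial update
        Ainv = np.linalg.inv(A)
        B = sum(Xi.T @ Ainv @ Xi for Xi in X) / (N * p)   # temporal update
    return A, B
```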
