
Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ be independent $p$-dimensional vectors with independent complex or real valued entries such that $\mathbb{E}[\mathbf{x}_i] = \mathbf{0}$, $\operatorname{Var}(\mathbf{x}_i) = \mathbf{I}_p$, $i=1, \ldots, n$, let $\mathbf{T}_n$ be a $p \times p$ Hermitian nonnegative definite matrix, and let $f$ be a given function. We prove that an appropriately standardized version of the stochastic process $\big( \operatorname{tr}( f(\mathbf{B}_{n,t}) ) \big)_{t \in [t_0, 1]}$, corresponding to a linear spectral statistic of the sequential empirical covariance estimator $$ \big( \mathbf{B}_{n,t} \big)_{t\in [ t_0 , 1]} = \Big( \frac{1}{n} \sum_{i=1}^{\lfloor n t \rfloor} \mathbf{T}^{1/2}_n \mathbf{x}_i \mathbf{x}_i^\star \mathbf{T}^{1/2}_n \Big)_{t\in [ t_0 , 1]}, $$ converges weakly to a non-standard Gaussian process as $n,p\to\infty$. As an application, we use these results to develop a novel approach for monitoring the sphericity assumption in a high-dimensional framework, even if the dimension of the underlying data is larger than the sample size.
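
To make the object of study concrete, the following NumPy sketch simulates one path of the (unstandardized) process $\operatorname{tr} f(\mathbf{B}_{n,t})$ on a grid of $t \in [t_0, 1]$; the choices of $n$, $p$, $\mathbf{T}_n$ and $f$ below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch: simulate the sequential covariance estimator B_{n,t} and the
# linear spectral statistic tr f(B_{n,t}) on a grid of t. All concrete choices
# (n, p, T_n, f) are illustrative, not from the paper.
rng = np.random.default_rng(0)
n, p, t0 = 200, 100, 0.2
T_sqrt = np.eye(p)                      # T_n^{1/2}; identity corresponds to sphericity
X = rng.standard_normal((n, p))         # rows are x_i with E[x_i] = 0, Var(x_i) = I_p
f = np.log1p                            # an example choice of f

def lss(t):
    """tr f(B_{n,t}) with B_{n,t} = (1/n) * sum_{i<=nt} T^{1/2} x_i x_i* T^{1/2}."""
    k = int(np.floor(n * t))
    Y = X[:k] @ T_sqrt                  # rows are (T^{1/2} x_i)^T since T_sqrt is symmetric
    B = Y.T @ Y / n
    eigvals = np.linalg.eigvalsh(B)     # B is Hermitian, so eigvalsh is appropriate
    return np.sum(f(eigvals))           # tr f(B) = sum of f over the eigenvalues

grid = np.linspace(t0, 1.0, 9)
process = [lss(t) for t in grid]        # one path of the (unstandardized) process
```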

Related content

We study the robust recovery of a low-rank matrix from sparsely and grossly corrupted Gaussian measurements, with no prior knowledge of the intrinsic rank. We consider the robust matrix factorization approach. We employ a robust $\ell_1$ loss function and deal with the challenge of the unknown rank by using an overspecified factored representation of the matrix variable. We then solve the associated nonconvex nonsmooth problem using a subgradient method with diminishing stepsizes. We show that under a regularity condition on the sensing matrices and corruption, which we call the restricted direction preserving property (RDPP), the subgradient method converges to the exact low-rank solution at a sublinear rate even when the rank is overspecified. Moreover, our result is more general in the sense that it automatically speeds up to a linear rate once the factor rank matches the unknown rank. On the other hand, we show that the RDPP condition holds under generic settings, such as Gaussian measurements under independent or adversarial sparse corruptions, a result that could be of independent interest. Both the exact recovery and the convergence rate of the proposed subgradient method are numerically verified in the overspecified regime. Moreover, our experiments further show that our particular design of diminishing stepsizes effectively prevents overfitting for robust recovery under overparameterized models, such as robust matrix sensing and learning a robust deep image prior. This regularization effect is worth further investigation.
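
The following NumPy sketch illustrates the general recipe of an $\ell_1$-loss subgradient method with geometrically diminishing stepsizes applied to overspecified symmetric matrix sensing; the dimensions, corruption level, stepsize constants and decay rate are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Minimal sketch of a subgradient method for robust (l1-loss) symmetric matrix
# sensing with an overspecified factorization. All constants are illustrative.
rng = np.random.default_rng(1)
d, r_true, r_over, m = 30, 2, 5, 2000
U_star = rng.standard_normal((d, r_true))
M_star = U_star @ U_star.T                       # unknown rank-2 ground truth
A = rng.standard_normal((m, d, d))               # Gaussian sensing matrices
y = np.einsum('mij,ij->m', A, M_star)
corrupt = rng.random(m) < 0.2                    # sparse gross corruptions
y[corrupt] += 10.0 * rng.standard_normal(corrupt.sum())

U = 0.1 * rng.standard_normal((d, r_over))       # overspecified factor (rank 5 > 2)
eta, q = 0.01, 0.99                              # geometrically diminishing stepsizes
for k in range(500):
    R = np.einsum('mij,ij->m', A, U @ U.T) - y   # residuals of the l1-loss objective
    G = np.einsum('m,mij->ij', np.sign(R), A) / m   # subgradient w.r.t. the matrix UU^T
    U -= eta * (q ** k) * (G + G.T) @ U          # chain rule through the factorization
rel_err = np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star)
```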

Low-rank matrix approximation (LRMA) is one of the central concepts in machine learning, with applications in dimension reduction, de-noising, multivariate statistical methodology, and many more. A recent extension to LRMA is called low-rank matrix completion (LRMC). It solves the LRMA problem when some observations are missing and is especially useful for recommender systems. In this paper, we consider an element-wise weighted generalization of LRMA. The resulting weighted low-rank matrix approximation (WLRMA) technique therefore covers LRMC as a special case with binary weights. WLRMA has many applications. For example, it is an essential component of GLM optimization algorithms, where an exponential family is used to model the entries of a matrix, and the matrix of natural parameters admits a low-rank structure. We propose an algorithm for solving the weighted problem, as well as two acceleration techniques. Further, we develop a non-SVD modification of the proposed algorithm that is able to handle extremely high-dimensional data. We compare the performance of all the methods on a small simulation example as well as a real-data application.
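
As a point of reference for the weighted problem, here is a minimal NumPy sketch of a simple impute-then-truncate iteration for element-wise weighted low-rank approximation; it is a generic heuristic shown for illustration, not necessarily the algorithm proposed in the paper, and setting the weights to 0/1 recovers matrix completion.

```python
import numpy as np

# Minimal sketch of element-wise weighted low-rank approximation via an
# iterative impute-then-truncate scheme (a generic heuristic, not necessarily
# the paper's algorithm). W has entries in [0, 1]; binary W gives completion.
rng = np.random.default_rng(2)
n, p, r = 50, 40, 3
Y = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))   # low-rank target
W = rng.random((n, p))                          # element-wise weights in [0, 1]

X = np.zeros_like(Y)                            # current low-rank estimate
for _ in range(200):
    Z = W * Y + (1.0 - W) * X                   # blend observations with current fit
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    X = (U[:, :r] * s[:r]) @ Vt[:r]             # rank-r truncation
weighted_loss = np.sum(W * (Y - X) ** 2)        # weighted squared error
```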

High-resolution spectroscopic surveys of the Milky Way have entered the Big Data regime and have opened avenues for solving outstanding questions in Galactic Archaeology. However, exploiting their full potential is limited by complex systematics, whose characterization has not received much attention in modern spectroscopic analyses. In this work, we present a novel method to disentangle the component of spectral data space intrinsic to the stars from that due to systematics. Using functional principal component analysis on a sample of $18,933$ giant spectra from APOGEE, we find that the intrinsic structure above the level of observational uncertainties requires ${\approx}$10 Functional Principal Components (FPCs). Our FPCs can reduce the dimensionality of spectra, remove systematics, and impute masked wavelengths, thereby enabling accurate studies of stellar populations. To demonstrate the applicability of our FPCs, we use them to infer stellar parameters and abundances of 28 giants in the open cluster M67. We employ Sequential Neural Likelihood, a simulation-based Bayesian inference method that learns likelihood functions using neural density estimators, to incorporate non-Gaussian effects in spectral likelihoods. By hierarchically combining the inferred abundances, we limit the spread of the following elements in M67: $\mathrm{Fe} \lesssim 0.02$ dex; $\mathrm{C} \lesssim 0.03$ dex; $\mathrm{O}, \mathrm{Mg}, \mathrm{Si}, \mathrm{Ni} \lesssim 0.04$ dex; $\mathrm{Ca} \lesssim 0.05$ dex; $\mathrm{N}, \mathrm{Al} \lesssim 0.07$ dex (at 68% confidence). Our constraints suggest a lack of self-pollution by core-collapse supernovae in M67, which has promising implications for the future of chemical tagging to understand the star formation history and dynamical evolution of the Milky Way.
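
The dimension-reduction and imputation roles played by the FPCs can be illustrated with ordinary PCA as a stand-in for the functional version; the sketch below uses synthetic spectra and is only meant to convey the mechanics of projecting onto a few components and filling in masked wavelengths.

```python
import numpy as np

# Minimal sketch of reducing spectra onto a few principal components and
# imputing masked wavelengths. Ordinary PCA is used as a stand-in for the
# paper's functional PCA, and the data below are synthetic.
rng = np.random.default_rng(3)
n_stars, n_wave, n_pc = 500, 300, 10
basis = rng.standard_normal((n_pc, n_wave))
spectra = rng.standard_normal((n_stars, n_pc)) @ basis \
          + 0.01 * rng.standard_normal((n_stars, n_wave))

mean = spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
components = Vt[:n_pc]                                   # top principal components

def impute(spec, mask):
    """Fill masked wavelengths by least-squares projection onto the components."""
    obs = ~mask
    coeff, *_ = np.linalg.lstsq(components[:, obs].T, spec[obs] - mean[obs], rcond=None)
    filled = spec.copy()
    filled[mask] = mean[mask] + coeff @ components[:, mask]
    return filled

mask = rng.random(n_wave) < 0.1                          # mask 10% of the pixels
recovered = impute(spectra[0], mask)
```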

Various nonparametric approaches for Bayesian spectral density estimation of stationary time series have been suggested in the literature, mostly based on the Whittle likelihood approximation. A generalization of this approximation was proposed by Kirch et al., who prove posterior consistency for spectral density estimation in combination with the Bernstein-Dirichlet process prior for Gaussian time series. In this paper, we extend the posterior consistency result to non-Gaussian time series by employing a general consistency theorem of Shalizi for dependent data and misspecified models. As a special case, posterior consistency for the spectral density under the Whittle likelihood as proposed by Choudhuri, Ghosal and Roy is also extended to non-Gaussian time series. Small sample properties of this approach are illustrated with several examples of non-Gaussian time series.
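
For reference, the Whittle likelihood approximates the log-likelihood of a stationary series $X_1,\ldots,X_n$ in terms of the spectral density $f$ and the periodogram $I_n$ evaluated at the Fourier frequencies; up to an additive constant,
$$ \ell_W(f) = -\sum_{k=1}^{\lfloor (n-1)/2 \rfloor} \Big( \log f(\lambda_k) + \frac{I_n(\lambda_k)}{f(\lambda_k)} \Big), \qquad \lambda_k = \frac{2\pi k}{n}, \quad I_n(\lambda) = \frac{1}{2\pi n} \Big| \sum_{t=1}^{n} X_t e^{-\mathrm{i} t \lambda} \Big|^2 . $$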

We prove that for any integer $n\in\mathbb{N}$, $d\in\{1,\ldots,n\}$ and any $\varepsilon,\delta\in(0,1)$, a bounded function $f:\{-1,1\}^n\to[-1,1]$ of degree at most $d$ can be learned with probability at least $1-\delta$ and $L_2$-error $\varepsilon$ using $\log(\tfrac{n}{\delta})\,\varepsilon^{-d-1} C^{d^{3/2}\sqrt{\log d}}$ random queries for a universal finite constant $C>1$.

Let $F$ be a set of $n$ objects in the plane and let $G(F)$ be its intersection graph. A balanced clique-based separator of $G(F)$ is a set $S$ consisting of cliques whose removal partitions $G(F)$ into components of size at most $\delta n$, for some fixed constant $\delta<1$. The weight of a clique-based separator is defined as $\sum_{C\in S}\log (|C|+1)$. Recently de Berg et al. (SICOMP 2020) proved that if $F$ consists of convex fat objects, then $G(F)$ admits a balanced clique-based separator of weight $O(\sqrt{n})$. We extend this result in several directions, obtaining the following results. Map graphs admit a balanced clique-based separator of weight $O(\sqrt{n})$, which is tight in the worst case. Intersection graphs of pseudo-disks admit a balanced clique-based separator of weight $O(n^{2/3}\log n)$. If the pseudo-disks are polygonal and of total complexity $O(n)$ then the weight of the separator improves to $O(\sqrt{n}\log n)$. Intersection graphs of geodesic disks inside a simple polygon admit a balanced clique-based separator of weight $O(n^{2/3}\log n)$. Visibility-restricted unit-disk graphs in a polygonal domain with $r$ reflex vertices admit a balanced clique-based separator of weight $O(\sqrt{n}+r\log(n/r))$, which is tight in the worst case. These results immediately imply sub-exponential algorithms for MAXIMUM INDEPENDENT SET (and, hence, VERTEX COVER), for FEEDBACK VERTEX SET, and for $q$-COLORING for constant $q$ in these graph classes.

We consider a general class of regression models with normally distributed covariates, and the associated nonconvex problem of fitting these models from data. We develop a general recipe for analyzing the convergence of iterative algorithms for this task from a random initialization. In particular, provided each iteration can be written as the solution to a convex optimization problem satisfying some natural conditions, we leverage Gaussian comparison theorems to derive a deterministic sequence that provides sharp upper and lower bounds on the error of the algorithm with sample-splitting. Crucially, this deterministic sequence accurately captures both the convergence rate of the algorithm and the eventual error floor in the finite-sample regime, and is distinct from the commonly used "population" sequence that results from taking the infinite-sample limit. We apply our general framework to derive several concrete consequences for parameter estimation in popular statistical models including phase retrieval and mixtures of regressions. Provided the sample size scales near-linearly in the dimension, we show sharp global convergence rates for both higher-order algorithms based on alternating updates and first-order algorithms based on subgradient descent. These corollaries, in turn, yield multiple consequences, including: (a) Proof that higher-order algorithms can converge significantly faster than their first-order counterparts (and sometimes super-linearly), even if the two share the same population update and (b) Intricacies in super-linear convergence behavior for higher-order algorithms, which can be nonstandard (e.g., with exponent 3/2) and sensitive to the noise level in the problem. We complement these results with extensive numerical experiments, which show excellent agreement with our theoretical predictions.
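
As a concrete instance of the first-order iterations covered by this framework, the sketch below runs subgradient descent for noiseless phase retrieval with Gaussian covariates and sample-splitting (a fresh batch per iteration) from a random initialization; the dimensions, batch size, and stepsize are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

# Minimal sketch: subgradient descent for noiseless phase retrieval with
# Gaussian covariates and sample-splitting, from a random initialization.
# Dimensions, batch size and stepsize are illustrative.
rng = np.random.default_rng(4)
d, batch, iters = 50, 400, 30
x_star = rng.standard_normal(d)
x_star /= np.linalg.norm(x_star)

x = rng.standard_normal(d)                       # random initialization
x /= np.linalg.norm(x)
errors = []
for _ in range(iters):
    A = rng.standard_normal((batch, d))          # fresh sample each iteration (sample-splitting)
    y = np.abs(A @ x_star)                       # phaseless observations
    r = np.abs(A @ x) - y
    g = (np.sign(r) * np.sign(A @ x)) @ A / batch   # subgradient of the l1-type loss
    x -= 0.5 * g                                 # constant stepsize, purely illustrative
    # the target is identified only up to sign, so track the closer of +/- x_star
    errors.append(min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star)))
```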

For a multivariate normal distribution, the sparsity of the covariance and precision matrices encodes complete information about independence and conditional independence properties. For general distributions, the covariance and precision matrices reveal correlations and so-called partial correlations between variables, but these do not, in general, have any correspondence with respect to independence properties. In this paper, we prove that, for a certain class of non-Gaussian distributions, these correspondences still hold, exactly for the covariance and approximately for the precision. The distributions -- sometimes referred to as "nonparanormal" -- are given by diagonal transformations of multivariate normal random variables. We provide several analytic and numerical examples illustrating these results.
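
The construction behind these distributions is simple to simulate: draw a multivariate normal vector with a sparse covariance and push each coordinate through its own monotone map. The sketch below does exactly that and checks empirically that zero entries of the latent covariance remain (near-)zero after transformation; the particular transforms are illustrative choices.

```python
import numpy as np

# Minimal sketch: a "nonparanormal" sample built by applying coordinate-wise
# monotone transformations to a multivariate normal vector, then checking that
# zero entries of the latent covariance persist after transformation.
rng = np.random.default_rng(5)
Sigma = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])              # variables 1,2 correlated; 3 independent
Z = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)

transforms = [np.tanh, np.exp, lambda z: z ** 3] # one monotone map per coordinate
X = np.column_stack([f(Z[:, j]) for j, f in enumerate(transforms)])
# Off-diagonal entries involving variable 3 stay near zero, since independence
# of the latent coordinates is preserved by the diagonal transformation.
print(np.round(np.corrcoef(X, rowvar=False), 3))
```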

Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data. In a nutshell, spectral methods refer to a collection of algorithms built upon the eigenvalues (resp. singular values) and eigenvectors (resp. singular vectors) of some properly designed matrices constructed from data. A diverse array of applications has been found in machine learning, data science, and signal processing. Due to their simplicity and effectiveness, spectral methods are not only used as a stand-alone estimator, but also frequently employed to initialize other more sophisticated algorithms to improve performance. While the study of spectral methods can be traced back to classical matrix perturbation theory and methods of moments, the past decade has witnessed tremendous theoretical advances in demystifying their efficacy through the lens of statistical modeling, with the aid of non-asymptotic random matrix theory. This monograph aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective, highlighting their algorithmic implications in diverse large-scale applications. In particular, our exposition gravitates around several central questions that span various applications: how to characterize the sample efficiency of spectral methods in reaching a target level of statistical accuracy, and how to assess their stability in the face of random noise, missing data, and adversarial corruptions? In addition to conventional $\ell_2$ perturbation analysis, we present a systematic $\ell_{\infty}$ and $\ell_{2,\infty}$ perturbation theory for eigenspace and singular subspaces, which has only recently become available owing to a powerful "leave-one-out" analysis framework.
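
A prototypical example of the recipe described above is the spectral estimator for low-rank matrix estimation from noisy, partially observed entries: zero-fill and rescale the observations, then keep the top singular components. The sketch below illustrates this; the matrix sizes, sampling rate, and noise level are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a spectral method: estimate a rank-r matrix from noisy,
# partially observed entries by truncating the SVD of the zero-filled,
# inverse-probability-rescaled observation matrix (a common spectral initializer).
rng = np.random.default_rng(6)
n, r, p_obs, sigma = 200, 3, 0.3, 0.1
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r ground truth
mask = rng.random((n, n)) < p_obs                                # observed entries
Y = np.where(mask, M + sigma * rng.standard_normal((n, n)), 0.0) / p_obs

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
M_hat = (U[:, :r] * s[:r]) @ Vt[:r]              # rank-r spectral estimate
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
```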

This work considers the problem of provably optimal reinforcement learning for episodic finite-horizon MDPs, i.e. how an agent learns to maximize its long-term reward in an uncertain environment. The main contribution is a novel algorithm, Variance-reduced Upper Confidence Q-learning (vUCQ), which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon time. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret for any constant $\epsilon$ is $O(SA)$, a number of samples that is far less than that required to learn any non-trivial estimate of the transition model (the transition model is specified by $O(S^2A)$ parameters). The importance of sub-linear algorithms is largely the motivation for algorithms such as $Q$-learning and other "model-free" approaches. The vUCQ algorithm also enjoys minimax optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. vUCQ is a successive refinement method in which the algorithm reduces the variance in the $Q$-value estimates and couples this estimation scheme with an upper-confidence-based algorithm. Technically, the coupling of these two techniques is what leads to the algorithm enjoying both the sub-linear regret property and the asymptotically optimal regret.
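
To illustrate the family of optimistic Q-learning algorithms that vUCQ refines, here is a minimal sketch of a UCB-style (Hoeffding-bonus) Q-learning loop on a toy episodic MDP; it deliberately omits the variance-reduction scheme that distinguishes vUCQ, and the toy MDP, bonus constant, and learning-rate schedule are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of optimistic Q-learning for an episodic finite-horizon MDP
# (a UCB/Hoeffding-style update; NOT the paper's vUCQ, which adds variance
# reduction). The toy MDP and constants below are illustrative.
rng = np.random.default_rng(7)
S, A, H, episodes, c = 5, 2, 4, 2000, 1.0
P = rng.dirichlet(np.ones(S), size=(S, A))           # P[s, a] is a distribution over next states
R = rng.random((S, A))                                # rewards in [0, 1]

Q = np.full((H, S, A), float(H))                      # optimistic initialization
N = np.zeros((H, S, A))                               # visit counts
for _ in range(episodes):
    s = 0
    for h in range(H):
        a = int(np.argmax(Q[h, s]))                   # act greedily w.r.t. optimistic Q
        s_next = rng.choice(S, p=P[s, a])
        N[h, s, a] += 1
        t = N[h, s, a]
        alpha = (H + 1) / (H + t)                     # a common learning-rate choice
        bonus = c * np.sqrt(H ** 3 * np.log(S * A * H * episodes) / t)
        v_next = 0.0 if h == H - 1 else min(np.max(Q[h + 1, s_next]), H)
        Q[h, s, a] += alpha * (R[s, a] + v_next + bonus - Q[h, s, a])
        s = s_next
```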
