
Principal component analysis (PCA) is one of the most frequently used statistical tools in almost all branches of data science. However, like many other statistical tools, it carries a risk of misuse or even abuse. In this paper, we highlight possible pitfalls in applying theoretical results for PCA that are derived under the assumption of independent data when the data are in fact time series. For the latter, we state and prove a central limit theorem for the eigenvalues and eigenvectors (loadings), give direct and bootstrap estimators of their asymptotic covariances, and assess their efficacy via simulation. Specifically, we pay attention to the proportion of variation, which determines the number of principal components (PCs), and to the loadings, which help interpret the meaning of the PCs. Our findings are that while the proportion of variation is quite robust to different dependence assumptions, inference on the PC loadings requires careful attention. We begin and conclude our investigation with an empirical example on portfolio management, in which the PC loadings play a prominent role; it serves as a paradigm for the correct use of PCA with time series data.
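As a concrete illustration of bootstrap inference for loadings under serial dependence, here is a minimal numpy sketch of a moving-block bootstrap for the PC loadings; the block length, number of replicates, and sign-alignment convention are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def pca_loadings(X):
    """Eigenvalues (descending) and loadings of the sample covariance of X (T x p)."""
    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def block_bootstrap_loadings(X, n_boot=500, block_len=20, seed=0):
    """Moving-block bootstrap replicates of the PC loadings for a serially dependent series."""
    rng = np.random.default_rng(seed)
    T, p = X.shape
    _, V0 = pca_loadings(X)
    n_blocks = int(np.ceil(T / block_len))
    reps = np.empty((n_boot, p, p))
    for b in range(n_boot):
        starts = rng.integers(0, T - block_len + 1, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:T]
        _, V = pca_loadings(X[idx])
        signs = np.sign(np.sum(V * V0, axis=0))   # eigenvectors are defined only up to sign
        signs[signs == 0] = 1.0
        reps[b] = V * signs
    return reps   # reps.std(axis=0) gives bootstrap standard errors for each loading
```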

Related content

In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, the "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen similarly among directions perpendicular to the first line. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.
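To make the "best-fitting line" description concrete, the following short numpy sketch (the toy point cloud is an assumption) recovers the principal components from the SVD of the centered data and checks that the first component minimizes the average squared distance from the points to a line through their mean.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])   # toy point cloud
Xc = X - X.mean(axis=0)                                     # center the data

# The principal components are the right singular vectors of the centered data matrix.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1, pc2 = Vt[0], Vt[1]        # pc2 is the best-fit direction orthogonal to pc1

def mean_sq_dist(Xc, d):
    """Average squared distance from the (centered) points to the line spanned by unit vector d."""
    proj = Xc @ d
    return np.mean(np.sum(Xc ** 2, axis=1) - proj ** 2)

# No random unit direction does better than the first principal component.
dirs = rng.normal(size=(1000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
assert mean_sq_dist(Xc, pc1) <= min(mean_sq_dist(Xc, d) for d in dirs)
```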

This article introduces a novel nonparametric methodology for Generalized Linear Models which combines the strengths of the binary regression and latent variable formulations for categorical data while overcoming their disadvantages. Requiring minimal assumptions, it extends and generalizes recently published parametric versions of the methodology. If the underlying data-generating process is asymmetric, it gives uniformly better prediction and inference performance than the parametric formulation. Furthermore, it introduces a new classification statistic, with which I show that the methodology has better overall model fit, inference, and classification performance than the parametric version, and that the difference in performance is statistically significant, especially if the data-generating process is asymmetric. In addition, the methodology can be used to perform model diagnostics for any model specification. This is a highly useful result, and it extends existing work on categorical model diagnostics broadly across the sciences. The mathematical results also highlight important new findings regarding the interplay of statistical significance and scientific significance. Finally, the methodology is applied to various real-world datasets to show that it may outperform widely used existing models, including Random Forests and Deep Neural Networks, with very few iterations.

A particularly challenging context for dimensionality reduction is multivariate circular data, i.e., data supported on a torus. Such data appear, e.g., in the analysis of various phenomena in ecology and astronomy, as well as in molecular structures. This paper introduces Scaled Torus Principal Component Analysis (ST-PCA), a novel approach to performing dimensionality reduction with toroidal data. ST-PCA finds a data-driven map from a torus to a sphere of the same dimension and a certain radius. The map is constructed with multidimensional scaling to minimize the discrepancy between pairwise geodesic distances in the two spaces. ST-PCA then resorts to principal nested spheres to obtain a nested sequence of subspheres that best fits the data, which can afterwards be inverted back to the torus. Numerical experiments illustrate how ST-PCA can be used to achieve meaningful dimensionality reduction on low-dimensional tori, particularly for the purpose of cluster separation, while two data applications in astronomy (on a three-dimensional torus) and molecular biology (on a seven-dimensional torus) show that ST-PCA outperforms existing methods on the investigated datasets.
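The first step of ST-PCA, the data-driven map built from pairwise geodesic distances, can be sketched as follows; the toy angular data are an assumption, and the sphere projection and principal-nested-spheres steps of the actual method are omitted.

```python
import numpy as np
from sklearn.manifold import MDS

def torus_geodesic_distances(theta):
    """Pairwise geodesic distances for angles theta (n x d) under the flat product metric on the torus."""
    diff = np.abs(theta[:, None, :] - theta[None, :, :])
    diff = np.minimum(diff, 2 * np.pi - diff)   # wrap around each circle
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=(100, 3))   # toy sample on a three-dimensional torus
D = torus_geodesic_distances(theta)

# Metric MDS on the toroidal geodesic distances yields a Euclidean configuration;
# ST-PCA additionally scales and projects it onto a sphere whose radius minimizes the
# discrepancy between the two sets of pairwise distances (omitted here), before
# applying principal nested spheres.
embedding = MDS(n_components=3, dissimilarity="precomputed").fit_transform(D)
```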

In this article, we develop and analyse a new spectral method to solve the semi-classical Schr\"odinger equation based on the Gaussian wave-packet transform (GWPT) and Hagedorn's semi-classical wave-packets (HWP). The GWPT equivalently recasts the highly oscillatory wave equation as a much less oscillatory one (the $w$ equation) coupled with a set of ordinary differential equations governing the dynamics of the so-called GWPT parameters. The Hamiltonian of the $w$ equation consists of a quadratic part and a small non-quadratic perturbation, which is of order $\mathcal{O}(\sqrt{\varepsilon})$, where $\varepsilon\ll 1$ is the rescaled Planck constant. By expanding the solution of the $w$ equation as a superposition of Hagedorn's wave-packets, we construct a spectral method in which the $\mathcal{O}(\sqrt{\varepsilon})$ perturbation is treated by a Galerkin approximation. This numerical implementation of the GWPT avoids imposing artificial boundary conditions and facilitates rigorous numerical analysis. For arbitrary dimensions, we establish how the error of solving the semi-classical Schr\"odinger equation with the GWPT is determined by the errors of solving the $w$ equation and the GWPT parameters. We prove that the scheme converges spectrally with respect to the number of Hagedorn's wave-packets in one dimension. Extensive numerical tests are provided to demonstrate the properties of the proposed method.
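For reference, the semi-classical Schr\"odinger equation referred to above is usually written in the rescaled form below; the exact scaling conventions used in the paper may differ slightly.

```latex
i\varepsilon\,\partial_t \psi^{\varepsilon}(x,t)
  = -\frac{\varepsilon^{2}}{2}\,\Delta \psi^{\varepsilon}(x,t) + V(x)\,\psi^{\varepsilon}(x,t),
  \qquad 0 < \varepsilon \ll 1 .
```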

Modeling and drawing inference on the joint associations between single nucleotide polymorphisms and a disease has sparked interest in genome-wide association studies. In the motivating Boston Lung Cancer Survival Cohort (BLCSC) data, the presence of a large number of single nucleotide polymorphisms of interest, though smaller than the sample size, challenges inference on their joint associations with the disease outcome. In similar settings, we find that neither the de-biased lasso approach (van de Geer et al. 2014), which assumes sparsity of the inverse information matrix, nor the standard maximum likelihood method yields confidence intervals with satisfactory coverage probabilities for generalized linear models. Under this "large $n$, diverging $p$" scenario, we propose an alternative de-biased lasso approach that directly inverts the Hessian matrix without imposing the matrix sparsity assumption, which further reduces bias compared with the original de-biased lasso and ensures valid confidence intervals with nominal coverage probabilities. We establish the asymptotic distributions of arbitrary linear combinations of the parameter estimates, which lays the theoretical groundwork for drawing inference. Simulations show that the proposed refined de-biasing method performs well in removing bias and yields honest confidence interval coverage. We use the proposed method to analyze the aforementioned BLCSC data, a large-scale hospital-based epidemiological cohort study investigating the joint effects of genetic variants on lung cancer risk.
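The bias correction described above can be sketched for logistic regression as follows; the penalty level, the absence of an intercept, and the plug-in standard errors are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def debiased_lasso_logistic(X, y, C=1.0):
    """Sketch of a de-biased lasso for logistic regression that inverts the full Hessian
    (no sparsity assumption on the inverse information matrix); y is a 0/1 vector."""
    n, p = X.shape
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=C, fit_intercept=False)
    beta = lasso.fit(X, y).coef_.ravel()

    mu = 1.0 / (1.0 + np.exp(-X @ beta))            # fitted probabilities
    W = mu * (1.0 - mu)
    H = (X * W[:, None]).T @ X / n                  # Hessian of the negative log-likelihood / n
    H_inv = np.linalg.inv(H)                        # direct inversion; assumes p < n

    beta_db = beta + H_inv @ (X.T @ (y - mu)) / n   # one-step bias correction
    se = np.sqrt(np.diag(H_inv) / n)                # plug-in asymptotic standard errors
    return beta_db, se                              # e.g. 95% CI: beta_db +/- 1.96 * se
```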

Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation is lacking in the literature. In this paper, we propose a maximum likelihood estimator (MLE) of the latent topics based on a specific integrated likelihood that is naturally connected to the concept of volume minimization in computational geometry. Theoretically, we introduce a new set of geometric conditions for topic model identifiability, which are weaker than conventional separability conditions that rely on the existence of anchor words or pure topic documents. We conduct a finite-sample error analysis for the proposed estimator and discuss how our results relate to existing ones. We conclude with empirical studies on both simulated and real datasets.

Research on quantum technology spans multiple disciplines: physics, computer science, engineering, and mathematics. The objective of this manuscript is to provide an accessible introduction to this emerging field for economists, centered on quantum computing and quantum money. We proceed in three steps. First, we discuss basic concepts in quantum computing and quantum communication, assuming knowledge of linear algebra and statistics but not of computer science or physics. This covers fundamental topics such as qubits, superposition, entanglement, quantum circuits, oracles, and the no-cloning theorem. Second, we provide an overview of quantum money, an early invention of the quantum communication literature that has recently been partially implemented in an experimental setting. One form of quantum money offers the privacy and anonymity of physical cash, the option to transact without the involvement of a third party, and the efficiency and convenience of a debit card payment; such features cannot be achieved in combination with any other form of money. Finally, we review all existing quantum speedups that have been identified for algorithms used to solve and estimate economic models. This includes function approximation, linear systems analysis, Monte Carlo simulation, matrix inversion, principal component analysis, linear regression, interpolation, numerical differentiation, and true random number generation. We also discuss the difficulty of achieving quantum speedups and comment on common misconceptions about what is achievable with quantum computing.
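As a small numerical illustration of superposition and entanglement (using plain numpy state vectors rather than any quantum-computing library), the following sketch builds a Bell state from a Hadamard gate and a CNOT gate.

```python
import numpy as np

# Computational basis states for a single qubit.
zero = np.array([1.0, 0.0])
one  = np.array([0.0, 1.0])

# Superposition: the Hadamard gate sends |0> to (|0> + |1>)/sqrt(2).
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
plus = H @ zero

# Entanglement: a CNOT applied to (|0> + |1>)/sqrt(2) tensor |0> yields the Bell state
# (|00> + |11>)/sqrt(2), which cannot be written as a tensor product of two single-qubit states.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
bell = CNOT @ np.kron(plus, zero)

# Measurement probabilities are squared amplitudes: |00> and |11> each occur with probability 1/2.
probs = bell ** 2
print(dict(zip(["00", "01", "10", "11"], probs.round(3))))
```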

We present a three-tier numerical framework based on manifold learning for the forecasting of high-dimensional time series. In the first step, we embed the time series into a reduced low-dimensional space using a nonlinear manifold learning algorithm such as Locally Linear Embedding or Diffusion Maps. In the second step, we construct reduced-order regression models on the manifold, in particular Multivariate Autoregressive (MVAR) and Gaussian Process Regression (GPR) models, to forecast the embedded dynamics. In the final step, we lift the embedded time series back to the original high-dimensional space using Radial Basis Function interpolation and Geometric Harmonics. For our illustrations, we test the forecasting performance of the proposed numerical scheme on four sets of time series: three synthetic stochastic ones resembling EEG signals, produced from linear and nonlinear stochastic models with different model orders, and one real-world dataset containing daily time series of 10 key foreign exchange rates (FOREX) spanning the period 03/09/2001-29/10/2020. The forecasting performance is assessed across combinations of the manifold learning, modelling, and lifting approaches. We also provide a comparison with the Principal Component Analysis algorithm, as well as with the naive random walk model and with MVAR and GPR models trained and applied directly in the high-dimensional space.
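A minimal sketch of the three tiers, using scikit-learn, statsmodels, and scipy stand-ins (the toy data, neighborhood sizes, and lag order are assumptions), might look as follows.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from statsmodels.tsa.api import VAR
from scipy.interpolate import RBFInterpolator

# Toy high-dimensional series (T x D); in the paper these are EEG-like signals or FOREX rates.
rng = np.random.default_rng(0)
T, D, d = 500, 50, 3
latent = np.cumsum(rng.normal(size=(T, d)), axis=0)
X = latent @ rng.normal(size=(d, D)) + 0.1 * rng.normal(size=(T, D))

# Tier 1: embed into a low-dimensional manifold coordinate system.
Z = LocallyLinearEmbedding(n_components=d, n_neighbors=20).fit_transform(X)

# Tier 2: fit a reduced-order MVAR model on the embedded coordinates and forecast.
var = VAR(Z).fit(maxlags=5)
Z_fc = var.forecast(Z[-var.k_ar:], steps=10)

# Tier 3: lift the forecasts back to the original space with RBF interpolation
# (Geometric Harmonics is the alternative lifting operator used in the paper).
lift = RBFInterpolator(Z, X, neighbors=50)
X_fc = lift(Z_fc)
```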

Within the vast body of statistical theory developed for binary classification, few meaningful results exist for imbalanced classification, in which data are dominated by samples from one of the two classes. Existing theory faces at least two main challenges. First, meaningful results must consider more complex performance measures than classification accuracy. To address this, we characterize a novel generalization of the Bayes-optimal classifier to any performance metric computed from the confusion matrix, and we use this to show how relative performance guarantees can be obtained in terms of the error of estimating the class probability function under uniform ($\mathcal{L}_\infty$) loss. Second, as we show, optimal classification performance depends on certain properties of class imbalance that have not previously been formalized. Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance. We analyze how Uniform Class Imbalance influences optimal classifier performance and show that it necessitates different classifier behavior than other types of class imbalance. We further illustrate these two contributions in the case of $k$-nearest neighbor classification, for which we develop novel guarantees. Together, these results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
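The plug-in idea behind the generalized Bayes-optimal classifier can be sketched as follows: estimate the class-probability function with $k$-nearest neighbors, then threshold it at the level that maximizes the target confusion-matrix metric. The toy imbalanced data and the choice of F1 as the metric are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

# Toy imbalanced data: roughly 5% positives.
rng = np.random.default_rng(0)
n = 5000
y = (rng.uniform(size=n) < 0.05).astype(int)
X = rng.normal(size=(n, 2)) + y[:, None] * 1.5
X_tr, X_val, y_tr, y_val = X[:3000], X[3000:], y[:3000], y[3000:]

# Estimate the class-probability function eta(x) with k-nearest neighbors ...
knn = KNeighborsClassifier(n_neighbors=50).fit(X_tr, y_tr)
eta_val = knn.predict_proba(X_val)[:, 1]

# ... then choose the threshold on eta that maximizes the target confusion-matrix metric.
# For such metrics the optimal rule thresholds eta(x) at a metric-dependent level, not at 1/2.
thresholds = np.linspace(0.01, 0.99, 99)
scores = [f1_score(y_val, (eta_val >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
```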

The piecewise constant Mumford-Shah (PCMS) model and the Rudin-Osher-Fatemi (ROF) model are two of the most famous variational models in image segmentation and image restoration, respectively. They have ubiquitous applications in image processing. In this paper, we explore the linkage between these two important models. We prove that for the two-phase segmentation problem, the optimal solution of the PCMS model can be obtained by thresholding the minimizer of the ROF model. This linkage remains valid for multiphase segmentation under mild assumptions. It thus opens a new segmentation paradigm: image segmentation can be done via image restoration plus thresholding. This new paradigm, which circumvents the innate non-convexity of the PCMS model, improves segmentation performance in both efficiency (much faster than state-of-the-art methods based on the PCMS model, particularly when the number of phases is large) and effectiveness (producing segmentation results of better quality), owing to the flexibility of the ROF model in handling degraded images, such as noisy images, blurry images, or images with information loss. As a by-product of the new paradigm, we derive a novel segmentation method, coined the thresholded-ROF (T-ROF) method, to illustrate the virtue of approaching image segmentation through image restoration techniques. Convergence of the T-ROF method under certain conditions is proved, and detailed experimental results and comparisons are presented.
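A minimal sketch of the "restoration plus thresholding" paradigm using scikit-image (the TV weight and the Otsu threshold are stand-ins for the paper's T-ROF threshold selection) might look like this.

```python
import numpy as np
from skimage import data
from skimage.restoration import denoise_tv_chambolle
from skimage.filters import threshold_otsu

# Two-phase segmentation via restoration plus thresholding:
# 1) solve an ROF-type total-variation denoising problem,
# 2) threshold the minimizer to obtain the segmentation.
image = data.camera() / 255.0
noisy = image + 0.1 * np.random.default_rng(0).normal(size=image.shape)

u = denoise_tv_chambolle(noisy, weight=0.2)   # ROF/TV minimizer (Chambolle's algorithm)
tau = threshold_otsu(u)                       # the paper derives the proper threshold;
segmentation = u > tau                        # Otsu is used here only as a stand-in
```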

We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing a near-optimal and computationally efficient trade-off between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that, in practice, the estimator indeed trades off variance against absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization on a number of classification problems.
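The paper's convex surrogate comes from distributionally robust optimization and empirical likelihood; the sketch below only illustrates the underlying mean-plus-variance objective that this surrogate approximates (the direct penalty is not convex in general), with a toy logistic-loss problem as an assumption.

```python
import numpy as np
from scipy.optimize import minimize

# Toy binary classification data.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=n))

def objective(w, C=1.0):
    """Mean logistic loss plus a penalty on its sampling variability."""
    losses = np.log1p(np.exp(-y * (X @ w)))          # per-sample logistic losses
    return losses.mean() + C * np.sqrt(losses.var() / n)

w_hat = minimize(objective, np.zeros(p)).x
```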
