Analyzing large samples of high-dimensional data under dependence is a challenging statistical problem, since long time series may have change points, most importantly in the mean and the marginal covariances, for which one needs valid tests. Inference for large covariance matrices is especially difficult due to noise accumulation, resulting in singular estimates and poor power of related tests. The singularity of the sample covariance matrix in high dimensions can be overcome by considering a linear combination with a regular, more structured target matrix. This approach is known as shrinkage, and the target matrix is typically of diagonal form. In this paper, we consider covariance shrinkage towards structured nonparametric estimators of bandable or Toeplitz type, aiming at improved estimation accuracy and statistical power of tests even under nonstationarity. We derive feasible Gaussian approximation results for bilinear projections of the shrinkage estimators that are valid under nonstationarity and dependence. In particular, these approximations enable us to formulate a statistical test for structural breaks in the marginal covariance structure of high-dimensional time series that imposes no restrictions on the dimension and is robust against nonstationarity of nuisance parameters. We show via simulations that shrinkage helps to increase the power of the proposed tests. Moreover, we suggest a data-driven choice of the shrinkage weights and assess its performance by means of a Monte Carlo study. The results indicate that the proposed shrinkage estimator is superior for non-Toeplitz covariance structures close to fractional Gaussian noise.
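
The paper develops its own targets and weight selection; purely to illustrate the shrinkage idea, the following is a minimal sketch, assuming a banded (bandable-type) target obtained by zeroing entries far from the diagonal and a user-supplied shrinkage weight (all names and defaults are ours):

```python
import numpy as np

def shrinkage_covariance(X, weight=0.5, bandwidth=2):
    """Shrink the sample covariance towards a banded (bandable-type) target.

    X         : (n, p) data matrix, rows are observations
    weight    : shrinkage weight in [0, 1]; 1 keeps only the target
    bandwidth : target zeroes all entries with |i - j| > bandwidth
    Illustrative only; the paper derives data-driven weights and also
    considers Toeplitz-type targets (averaging along diagonals).
    """
    p = X.shape[1]
    S = np.cov(X, rowvar=False)  # sample covariance, (p, p)
    # Banded target: keep a band around the diagonal, zero elsewhere.
    idx = np.arange(p)
    mask = np.abs(np.subtract.outer(idx, idx)) <= bandwidth
    target = S * mask
    return (1.0 - weight) * S + weight * target
```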

Related content

We study the large-sample properties of sparse M-estimators in the presence of pseudo-observations. Our framework covers a broad class of semiparametric copula models, for which the marginal distributions are unknown and replaced by their empirical counterparts. It is well known that this modification significantly alters the limiting laws compared to usual M-estimation. We establish the consistency and asymptotic normality of our sparse penalized M-estimator and prove the asymptotic oracle property with pseudo-observations, including the case of a diverging number of parameters. Our framework can handle copula-based loss functions that are potentially unbounded. Additionally, we state the weak limit of multivariate rank statistics for an arbitrary dimension and the weak convergence of empirical copula processes indexed by maps. We apply our inference method to Canonical Maximum Likelihood losses with Gaussian copulas, mixtures of copulas, or conditional copulas. The theoretical results are illustrated by two numerical experiments.
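
As a minimal sketch of the pseudo-observation step and of a penalized Canonical Maximum Likelihood loss for a bivariate Gaussian copula (the function names and the plain L1 penalty are ours; the paper's estimator and penalties are more general):

```python
import numpy as np
from scipy import stats

def pseudo_observations(X):
    """Rank-based pseudo-observations: empirical CDFs rescaled by n + 1."""
    n = X.shape[0]
    return stats.rankdata(X, axis=0) / (n + 1.0)

def penalized_gaussian_cml_loss(rho, U, lam):
    """L1-penalized CML loss for a bivariate Gaussian copula.

    rho : copula correlation in (-1, 1);  U : (n, 2) pseudo-observations.
    """
    z = stats.norm.ppf(U)                       # latent normal scores
    s11, s22 = np.mean(z ** 2, axis=0)
    s12 = np.mean(z[:, 0] * z[:, 1])
    # Average negative Gaussian-copula log-likelihood (up to constants).
    nll = 0.5 * np.log(1 - rho ** 2) \
        + (rho ** 2 * (s11 + s22) - 2 * rho * s12) / (2 * (1 - rho ** 2))
    return nll + lam * abs(rho)
```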

The use and analysis of massive data are challenging due to high storage and computational costs. Subsampling algorithms are a popular way to downsize the data volume and reduce the computational burden. Existing subsampling approaches focus on data with numerical covariates. Although big data with categorical covariates are frequently encountered in many disciplines, subsampling plans for such data have not been well established. In this paper, we propose a balanced subsampling approach for reducing data with categorical covariates. The selected subsample achieves a combinatorial balance among the values of the covariates and therefore enjoys three desired merits. First, a balanced subsample is nonsingular and thus allows the estimation of all parameters in ANOVA regression. Second, it provides optimal parameter estimation in the sense of minimizing the generalized variance of the estimated parameters. Third, the model trained on a balanced subsample provides robust predictions in the sense of minimizing the worst-case prediction error. We demonstrate the usefulness of balanced subsampling over existing data reduction methods in extensive simulation studies and a real-world application.
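
The paper's combinatorial balance criterion is stronger than simple marginal balance; as a rough illustration only, the following greedy sketch (ours, not the authors' algorithm) selects rows so that each covariate's level counts stay as even as possible:

```python
import numpy as np

def greedy_balanced_subsample(X, k):
    """Greedily select k rows of a categorical design, preferring rows whose
    levels are currently underrepresented (a crude proxy for balance).
    X: (n, d) integer array of categorical level codes."""
    n, d = X.shape
    counts = [dict() for _ in range(d)]   # per-covariate level counts
    chosen, remaining = [], set(range(n))
    for _ in range(min(k, n)):
        # Pick the row whose levels have been seen least often so far.
        best = min(remaining,
                   key=lambda i: sum(counts[j].get(X[i, j], 0) for j in range(d)))
        chosen.append(best)
        remaining.remove(best)
        for j in range(d):
            counts[j][X[best, j]] = counts[j].get(X[best, j], 0) + 1
    return np.array(chosen)
```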

Wireless sensor networks are among the most promising technologies of the current era because of their small size, low cost, and ease of deployment. With the increasing number of wireless sensors, the probability of missing data also rises. Such incomplete data can lead to disastrous consequences if used for decision-making. There is a rich literature dealing with this problem, but most approaches show performance degradation when a sizable amount of data is lost. Inspired by the emerging field of graph signal processing, this paper presents a new study of a Sobolev reconstruction algorithm in wireless sensor networks. Experimental comparisons on several publicly available datasets demonstrate that the algorithm surpasses multiple state-of-the-art techniques by margins of up to 54%. We further show that this algorithm consistently retrieves the missing data even under massive data loss.
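
In common graph-signal-processing notation, Sobolev-type reconstruction amounts to a penalized least-squares problem with penalty $x^\top (L + \epsilon I)^\beta x$; the sketch below assumes this formulation, the parameter names $\epsilon$, $\beta$, $\mu$ are ours, and the paper's exact algorithm may differ:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def sobolev_reconstruct(L, y, sampled, eps=0.1, beta=1.0, mu=1e-3):
    """Reconstruct a graph signal from partial samples.

    Solves  min_x ||M (x - y)||^2 + mu * x^T (L + eps*I)^beta x,
    where L is the graph Laplacian and M masks the sampled vertices.
    The first-order condition gives a single symmetric linear solve.
    """
    N = L.shape[0]
    M = np.diag(sampled.astype(float))          # sampling operator
    S = fractional_matrix_power(L + eps * np.eye(N), beta).real
    return np.linalg.solve(M + mu * S, M @ y)
```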

The Cahn-Hilliard Navier-Stokes (CHNS) system provides a computationally tractable model that can be used to effectively capture interfacial dynamics in two-phase fluid flows. In this work, we present a semi-implicit, projection-based finite element framework for solving the CHNS system. We use a projection-based semi-implicit time discretization for the Navier-Stokes equation and a fully-implicit time discretization for the Cahn-Hilliard equation. We use a conforming continuous Galerkin (cG) finite element method in space equipped with a residual-based variational multiscale (RBVMS) formulation. Pressure is decoupled using a projection step, which results in two linear positive semi-definite systems for velocity and pressure, instead of the saddle point system of a pressure-stabilized method. All the linear systems are solved using an efficient and scalable algebraic multigrid (AMG) method. We deploy this approach on a massively parallel numerical implementation using parallel octree-based adaptive meshes. The overall approach allows the use of relatively large time steps with much faster time-to-solve than similar fully-implicit methods. We present comprehensive numerical experiments showing detailed comparisons with results from the literature for canonical cases, including the single bubble rise and Rayleigh-Taylor instability.
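
Omitting the RBVMS stabilization and the Cahn-Hilliard coupling terms, the velocity-pressure decoupling follows the standard incremental projection pattern; a sketch in our notation, not the paper's exact scheme:

```latex
% 1) tentative velocity (semi-implicit momentum with old pressure),
% 2) pressure Poisson solve, 3) velocity correction.
\begin{align*}
\frac{\tilde{\mathbf{u}} - \mathbf{u}^n}{\Delta t}
  + (\mathbf{u}^n \cdot \nabla)\,\tilde{\mathbf{u}}
  &= -\nabla p^n + \nu \nabla^2 \tilde{\mathbf{u}} + \mathbf{f}^{n+1}, \\
\nabla^2 \phi &= \frac{1}{\Delta t}\,\nabla \cdot \tilde{\mathbf{u}},
  \qquad p^{n+1} = p^n + \phi, \\
\mathbf{u}^{n+1} &= \tilde{\mathbf{u}} - \Delta t\, \nabla \phi .
\end{align*}
```

The tentative-velocity solve and the pressure Poisson solve are precisely the two positive (semi-)definite linear systems referred to above, each amenable to AMG.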

A key challenge in building effective regression models for large and diverse populations is accounting for patient heterogeneity. An example of such heterogeneity is in health system risk modeling efforts, where different combinations of comorbidities fundamentally alter the relationship between covariates and health outcomes. Accounting for heterogeneity arising from combinations of factors can yield more accurate and interpretable regression models. Yet, in the presence of high-dimensional covariates, accounting for this type of heterogeneity can exacerbate estimation difficulties even with large sample sizes. To handle these issues, we propose a flexible and interpretable risk modeling approach based on semiparametric sufficient dimension reduction. The approach accounts for patient heterogeneity, borrows strength in estimation across related subpopulations to improve both estimation efficiency and interpretability, and can serve as a useful exploratory tool or as a powerful predictive model. In simulated examples, we show that our approach often improves estimation performance in the presence of heterogeneity and is quite robust to deviations from its key underlying assumptions. We apply our approach to an analysis of hospital admission risk for a large health system and demonstrate its predictive power when tested on further follow-up data.
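
The paper's semiparametric estimator is its own; purely to illustrate what sufficient dimension reduction does, here is a minimal sliced inverse regression (SIR) sketch, a generic SDR method and not the proposed approach:

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Sliced inverse regression: estimate a basis of the central subspace
    spanned by directions b such that y depends on X only through b'X."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Whiten the predictors.
    chol = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    W = np.linalg.inv(chol).T
    Z = Xc @ W
    # Slice observations by the response and collect slice means of Z.
    slices = np.array_split(np.argsort(y), n_slices)
    M = sum((len(s) / n) * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0))
            for s in slices)
    # Leading eigenvectors, mapped back to the original predictor scale.
    _, vecs = np.linalg.eigh(M)
    return W @ vecs[:, ::-1][:, :n_dirs]
```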

Inverse problems constrained by partial differential equations (PDEs) play a critical role in model development and calibration. In many applications, there are multiple uncertain parameters in a model which must be estimated. Although the Bayesian formulation is attractive for such problems, computational cost and high dimensionality frequently prohibit a thorough exploration of the parametric uncertainty. A common approach is to reduce the dimension by fixing some parameters (which we will call auxiliary parameters) to a best estimate and use techniques from PDE-constrained optimization to approximate properties of the Bayesian posterior distribution. For instance, the maximum a posteriori probability (MAP) point and the Laplace approximation of the posterior covariance can be computed. In this article, we propose using hyper-differential sensitivity analysis (HDSA) to assess the sensitivity of the MAP point to changes in the auxiliary parameters. We establish an interpretation of HDSA as correlations in the posterior distribution. Our proposed framework is demonstrated on the inversion of bedrock topography for the Greenland ice sheet with uncertainties arising from the basal friction coefficient and climate forcing (ice accumulation rate).
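
At its core, the sensitivity of the MAP point to the auxiliary parameters follows from the implicit function theorem; in our notation (not necessarily the paper's), with $J$ the negative log-posterior, $\theta$ the inverted parameter, and $\eta$ the auxiliary parameters:

```latex
% At the MAP point, \nabla_\theta J(\hat\theta(\eta), \eta) = 0;
% differentiating this optimality condition with respect to \eta yields
\begin{equation*}
\frac{d\hat\theta}{d\eta}
  = -\left( \nabla^2_{\theta\theta} J(\hat\theta, \eta) \right)^{-1}
     \nabla^2_{\theta\eta} J(\hat\theta, \eta),
\end{equation*}
% i.e., one Hessian solve per auxiliary direction; HDSA post-processes
% these sensitivities, which the paper links to posterior correlations.
```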

We consider the efficient estimation of total causal effects in the presence of unmeasured confounding using conditional instrumental sets. Specifically, we consider the two-stage least squares estimator in the setting of a linear structural equation model with correlated errors that is compatible with a known acyclic directed mixed graph. To set the stage for our results, we characterize the class of valid conditional instrumental sets that yield consistent two-stage least squares estimators for the target total effect and derive a new asymptotic variance formula for these estimators. Equipped with these results, we provide three graphical tools for selecting more efficient valid conditional instrumental sets. First, a graphical criterion that for certain pairs of valid conditional instrumental sets identifies which of the two corresponding estimators has the smaller asymptotic variance. Second, an algorithm that greedily adds covariates that reduce the asymptotic variance to a given valid conditional instrumental set. Third, a valid conditional instrumental set for which the corresponding estimator has the smallest asymptotic variance that can be ensured with a graphical criterion.
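
A minimal two-stage least squares sketch for a conditional instrumental set, with instruments Z and conditioning covariates W entering both stages (the names and the simple lstsq implementation are ours):

```python
import numpy as np

def tsls_effect(y, x, Z, W=None):
    """Two-stage least squares estimate of the total effect of x on y.

    Z : (n, k) instruments;  W : (n, m) conditioning covariates or None.
    Stage 1 regresses x on (Z, W, 1); stage 2 regresses y on (xhat, W, 1).
    """
    n = len(y)
    cond = [np.ones(n)] if W is None else [np.ones(n), W]
    F = np.column_stack([Z] + cond)                  # first-stage design
    xhat = F @ np.linalg.lstsq(F, x, rcond=None)[0]  # fitted treatment
    G = np.column_stack([xhat] + cond)               # second-stage design
    return np.linalg.lstsq(G, y, rcond=None)[0][0]
```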

Testing the equality of the covariance matrices of two high-dimensional samples is a fundamental inference problem in statistics. Several tests have been proposed, but they are either too liberal or too conservative when the required assumptions are not satisfied, which limits their applicability in real data analysis. To overcome this difficulty, a normal-reference test is proposed and studied in this paper. It is shown that under some regularity conditions and the null hypothesis, the proposed test statistic and a chi-square-type mixture have the same limiting distribution. It is therefore justified to approximate the null distribution of the proposed test statistic by that of the chi-square-type mixture. The distribution of the chi-square-type mixture can in turn be well approximated by a three-cumulant matched chi-square approximation, with its approximation parameters consistently estimated from the data. The asymptotic power of the proposed test under a local alternative is also established. Simulation studies and a real data example demonstrate that, in terms of size control, the proposed test substantially outperforms existing competitors.
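
The three-cumulant match has a standard closed form: approximate the statistic $T$ by $\beta_0 + \beta_1 \chi^2_d$ with the first three cumulants equated. A minimal sketch, assuming the estimated third cumulant is positive (the cumulant estimators themselves are test-specific and omitted):

```python
from scipy import stats

def three_cumulant_chi2_pvalue(t_obs, k1, k2, k3):
    """Approximate P(T > t_obs) by matching the first three cumulants of T
    to those of beta0 + beta1 * chi2_d.  Requires k2 > 0 and k3 > 0.

    Matching gives: beta1 = k3 / (4 k2),  d = 8 k2^3 / k3^2,
                    beta0 = k1 - 2 k2^2 / k3.
    """
    beta1 = k3 / (4.0 * k2)
    d = 8.0 * k2 ** 3 / k3 ** 2
    beta0 = k1 - 2.0 * k2 ** 2 / k3
    return stats.chi2.sf((t_obs - beta0) / beta1, df=d)
```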

This article proposes a graphical model that handles mixed-type, multi-group data. The motivation for such a model originates from real-world observational data, which often contain groups of samples obtained under heterogeneous conditions in space and time, potentially resulting in differences in network structure among groups. The i.i.d. assumption is therefore unrealistic, and fitting a single graphical model to all data results in a network that does not accurately represent the between-group differences. In addition, real-world observational data are typically of mixed discrete-and-continuous type, violating the Gaussian assumption typical of graphical models and preventing such models from adequately recovering the underlying graph structure. The proposed model accounts for these properties of the data by treating the observed data as transformed latent Gaussian data via the Gaussian copula, thereby retaining attractive properties of the Gaussian distribution, such as estimating the underlying graph structure via the inverse covariance matrix. The multi-group setting is addressed by jointly fitting a graphical model for each group and applying the fused group penalty to fuse similar graphs together. In an extensive simulation study, the proposed model is evaluated against alternative models and is better able to recover the true underlying graph structure for the different groups. Finally, the proposed model is applied to real production-ecological data on on-farm maize yield to showcase its added value in generating new hypotheses for production ecologists.
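
A minimal sketch of the Gaussian copula (nonparanormal) transform followed by a per-group graphical lasso; the fused group penalty that ties similar graphs together is the paper's contribution and is omitted here:

```python
import numpy as np
from scipy import stats
from sklearn.covariance import GraphicalLasso

def latent_gaussian_scores(X):
    """Gaussian copula transform: ranks -> uniforms -> normal scores."""
    U = stats.rankdata(X, axis=0) / (X.shape[0] + 1.0)
    return stats.norm.ppf(U)

def fit_group_graphs(groups, alpha=0.05):
    """Fit one sparse precision matrix per group; edges correspond to
    nonzero off-diagonal precision entries. No fusion across groups,
    unlike the proposed model."""
    return [GraphicalLasso(alpha=alpha).fit(latent_gaussian_scores(X)).precision_
            for X in groups]
```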

Graph structure learning aims to learn connectivity in a graph from data. It is particularly important for many computer vision tasks, since no explicit graph structure is available for images in most cases. A natural way to construct a graph among images is to treat each image as a node and assign pairwise image similarities as weights to the corresponding edges. It is well known that pairwise similarities between images are sensitive to noise in the feature representations, leading to unreliable graph structures. We address this problem from the viewpoint of statistical tests. By viewing the feature vector of each node as an independent sample, the decision of whether to create an edge between two nodes based on their similarity in feature representation can be thought of as a ${\it single}$ statistical test. To improve the robustness of the edge-creation decision, multiple samples are drawn and integrated by ${\it multiple}$ statistical tests to generate a more reliable similarity measure and, consequently, a more reliable graph structure. The corresponding elegant matrix form, named $\mathcal{B}\textbf{-Attention}$, is designed for efficiency. The effectiveness of multiple tests for graph structure learning is verified both theoretically and empirically on multiple clustering and ReID benchmark datasets. Source codes are available at //github.com/Thomas-wyh/B-Attention.
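
The actual $\mathcal{B}\textbf{-Attention}$ operator is defined in the paper; schematically, the multiple-tests idea can be sketched as drawing several feature views per node and aggregating the per-view similarity decisions (the cosine test, threshold, and names are ours):

```python
import numpy as np

def multi_test_edge_score(views_a, views_b, tau=0.5):
    """Edge score between two nodes from multiple feature views.

    views_a, views_b : (m, d) arrays of m sampled feature vectors per node.
    Each pair of views contributes one cosine-similarity 'test'; the score
    is the fraction of tests exceeding tau (schematic, not B-Attention).
    """
    a = views_a / np.linalg.norm(views_a, axis=1, keepdims=True)
    b = views_b / np.linalg.norm(views_b, axis=1, keepdims=True)
    sims = a @ b.T                   # all m x m pairwise view similarities
    return float((sims > tau).mean())
```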
