
Signal region detection is one of the challenging problems in modern statistics and has broad applications, especially in genetic studies. We propose a novel approach that effectively couples region detection with high-dimensional testing and is distinct from existing methods based on scan or knockoff statistics. The idea is to conduct binary segmentation with re-search and arrangement based on a sequence of dynamic tests, which increases detection accuracy and reduces computation. Theoretical and empirical studies demonstrate that our approach enjoys favorable theoretical guarantees under fewer restrictions and exhibits superior numerical performance with faster computation. Compared to scan-based methods, our procedure is capable of detecting shorter or longer regions with unbalanced signal strengths while allowing for more dependence structures. Relative to the knockoff framework, which only controls the false discovery rate, our approach attains higher detection accuracy while controlling the family-wise error rate. Applying the method to UK Biobank data to identify genetic regions related to breast cancer, we confirm previous findings and, in addition, identify a number of new regions that suggest strong association with breast cancer risk and deserve further investigation.
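To make the binary-segmentation idea concrete, here is a minimal sketch (not the authors' dynamic-test procedure): a segment whose standardized sum exceeds a threshold is split and refined recursively, and a significant segment with no significant sub-segment is reported as a detected region. The test statistic, threshold, and split rule are all simplified placeholders.

```python
import numpy as np

def zstat(x):
    # Standardized segment sum; under H0 (mean zero, unit variance) ~ N(0, 1).
    return abs(x.sum()) / np.sqrt(len(x))

def detect_regions(x, lo=0, hi=None, thresh=3.0):
    """Recursive binary segmentation sketch: keep refining a segment
    that passes the test; discard segments that do not."""
    if hi is None:
        hi = len(x)
    if hi <= lo:
        return []
    if zstat(x[lo:hi]) <= thresh:
        return []
    if hi - lo == 1:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    sub = detect_regions(x, lo, mid, thresh) + detect_regions(x, mid, hi, thresh)
    # If no sub-segment is individually significant, report the whole segment.
    return sub if sub else [(lo, hi)]
```

On a noise-free sequence with a single elevated block, the recursion narrows down to exactly the signal-carrying indices.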

Related content

Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a blockwise imputation procedure, and obtain its rate of convergence. Furthermore, building upon an innovative projected estimating equation technique that intrinsically achieves bias-correction of the initial estimator, we propose a nearly unbiased estimator for each individual regression coefficient, which is asymptotically normally distributed under mild conditions. Based on these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
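As a heavily simplified illustration of blockwise imputation (not the paper's estimating-equation construction), a missing block of covariates can be imputed by regressing it on the observed block, with the regression fitted on fully observed samples; the column layout below is a hypothetical convention for this sketch.

```python
import numpy as np

def blockwise_impute(X_full, X_partial):
    """Impute a missing covariate block by linear regression of the
    missing block on the observed block, fitted on complete samples.
    X_full: complete samples, columns ordered [observed | missing block].
    X_partial: samples observing only the first set of columns."""
    p_obs = X_partial.shape[1]
    A, B = X_full[:, :p_obs], X_full[:, p_obs:]
    coef, *_ = np.linalg.lstsq(A, B, rcond=None)  # least-squares fit
    return X_partial @ coef
```

When the missing block is an exact linear function of the observed block, the imputation recovers it exactly; in practice the fit is only approximate and downstream estimators must correct for the induced bias, as the paper's projected estimating equations do.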

In this paper, we derive the analytical behavior of the limiting spectral distribution of non-central covariance matrices of the "general information-plus-noise" type, as studied in [14]. Through the equation defining its Stieltjes transform, it is shown that the limiting distribution has a continuous derivative away from zero, with the derivative analytic wherever it is positive, and we establish a criterion determining its support. We also extend the result in [14] to allow for all possible limiting ratios of the number of rows to the number of columns of the underlying random matrix.
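For context, these are the standard definitions underlying such arguments (not specific to [14]): the Stieltjes transform of a distribution $F$ is
$$
m_F(z) = \int \frac{1}{x - z}\, dF(x), \qquad z \in \mathbb{C}^+,
$$
and the density of $F$, where it exists, is recovered through the inversion formula
$$
f(x) = \frac{1}{\pi} \lim_{\varepsilon \to 0^+} \operatorname{Im}\, m_F(x + i\varepsilon),
$$
so analytic properties of $m_F$ near the real axis translate directly into smoothness of the limiting spectral density and a description of its support.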

Linkage disequilibrium score regression (LDSC) has emerged as an essential tool for genetic and genomic analyses of complex traits, utilizing high-dimensional data derived from genome-wide association studies (GWAS). LDSC computes the linkage disequilibrium (LD) scores using an external reference panel, and integrates the LD scores with only summary data from the original GWAS. In this paper, we investigate LDSC within a fixed-effect data integration framework, underscoring its ability to merge multi-source GWAS data and reference panels. In particular, we take account of the genome-wide dependence among the high-dimensional GWAS summary statistics, along with the block-diagonal dependence pattern in estimated LD scores. Our analysis uncovers several key factors of both the original GWAS and reference panel datasets that determine the performance of LDSC. We show that it is relatively feasible for LDSC-based estimators to achieve asymptotic normality when applied to genome-wide genetic variants (e.g., in genetic variance and covariance estimation), whereas it becomes considerably challenging when we focus on a much smaller subset of genetic variants (e.g., in partitioned heritability analysis). Moreover, by modeling the disparities in LD patterns across different populations, we unveil that LDSC can be expanded to conduct cross-ancestry analyses using data from distinct global populations (such as European and Asian). We validate our theoretical findings through extensive numerical evaluations using real genetic data from the UK Biobank study.
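At its core, LDSC regresses GWAS chi-square statistics on LD scores under the approximate moment relation $E[\chi^2_j] \approx N h^2 \ell_j / M + b_0$, where $N$ is the GWAS sample size, $M$ the number of variants, $\ell_j$ the LD score, and the intercept $b_0$ absorbs confounding. A minimal unweighted sketch (real LDSC uses heteroscedasticity weights and jackknife standard errors):

```python
import numpy as np

def ldsc_fit(chisq, ld_scores, N, M):
    """Ordinary least-squares LD score regression:
    E[chisq_j] ~ (N * h2 / M) * ld_scores_j + intercept.
    Returns (h2_hat, intercept_hat)."""
    X = np.column_stack([N * ld_scores / M, np.ones_like(ld_scores)])
    beta, *_ = np.linalg.lstsq(X, chisq, rcond=None)
    return beta[0], beta[1]
```

On noise-free summary statistics generated from the model itself, the regression recovers the heritability and intercept exactly.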

To determine any pattern in an ordered binary sequence of wins and losses of a player over a period of time, the Runs Test may show results that contradict the intuition conveyed by scatter plots of win proportions over time. We design a test suited to this purpose by computing the gaps between consecutive wins and then using exact binomial tests, along with non-parametric tests such as Kendall's tau and the Siegel-Tukey test for the scale problem, to determine heteroscedastic patterns and the direction of the occurrence of wins. Further modifications suggested by Jan Vegelius (1982) are applied in the Siegel-Tukey test to adjust for tied ranks.
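The two basic ingredients can be sketched in a few lines (illustrative only; the paper's actual tests include Kendall's tau and the modified Siegel-Tukey test): extracting the gaps between consecutive wins, and an exact two-sided binomial test on the win count.

```python
from math import comb

def win_gaps(seq):
    """Gaps (number of losses) between consecutive wins in a 0/1 sequence."""
    wins = [i for i, r in enumerate(seq) if r == 1]
    return [b - a - 1 for a, b in zip(wins, wins[1:])]

def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes no more likely than the observed one (small tolerance
    guards against floating-point ties)."""
    pk = comb(n, k) * p**k * (1 - p)**(n - k)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1)
               if comb(n, i) * p**i * (1 - p)**(n - i) <= pk + 1e-12)
```

For example, the sequence W L L W L W W yields gaps [2, 1, 0]; trend or scale tests on such gap sequences reveal whether wins cluster over time.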

Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e. covariates. However, current methods often struggle with selecting tuning parameters and analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which leverage the network connections and covariates with a unique weight to each node based on the node's degree. Spectral clustering on network-adjusted covariates yields an exact recovery of community labels under certain conditions, which is tuning-free and computationally efficient. We present novel theoretical results about the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of mis-specification and sparse communities with bounded degrees. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, and it shows our method is optimal up to a constant factor. Our method outperforms existing approaches in simulations and a LastFM app user network, and provides interpretable community structures in a statistics publication citation network where $30\%$ of nodes are isolated.
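As a rough illustration of the idea (not the paper's exact construction), one might blend each node's own covariates with an aggregate of its neighbors' covariates, with a node-specific weight driven by degree so that low-degree nodes rely more on their covariates; the particular weight formula below is a hypothetical choice.

```python
import numpy as np

def network_adjusted_covariates(A, X, alpha=None):
    """Combine the network signal (A @ X, neighbors' covariates) with
    each node's own covariates X, using a degree-based weight per node.
    A: (n, n) adjacency matrix; X: (n, p) covariate matrix."""
    deg = A.sum(axis=1)
    if alpha is None:
        alpha = deg / (deg + 1.0)  # hypothetical degree-based weight
    return alpha[:, None] * (A @ X) + (1 - alpha)[:, None] * X
```

Spectral clustering (top-$K$ singular vectors followed by k-means) would then be run on the resulting matrix in place of the adjacency matrix alone.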

While there exist several inferential methods for analyzing functional data in factorial designs, there is a lack of statistical tests that are valid (i) in general designs, (ii) under non-restrictive assumptions on the data generating process, and (iii) allow for coherent post-hoc analyses. In particular, most existing methods assume Gaussianity or equal covariance functions across groups (homoscedasticity) and are only applicable for specific study designs that do not allow for evaluation of interactions. Moreover, all available strategies are only designed for testing global hypotheses and do not directly allow a more in-depth analysis of multiple local hypotheses. To address the first two problems (i)-(ii), we propose flexible integral-type test statistics that are applicable in general factorial designs under minimal assumptions on the data generating process. In particular, we neither postulate homoscedasticity nor Gaussianity. To approximate the statistics' null distribution, we adopt a resampling approach and validate it methodologically. Finally, we use our flexible testing framework to (iii) infer several local null hypotheses simultaneously. To allow for powerful data analysis, we thereby take the complex dependencies of the different local test statistics into account. In extensive simulations we confirm that the new methods are flexibly applicable. Two illustrative data analyses complete our study. The new testing procedures are implemented in the R package multiFANOVA, which will be available on CRAN soon.
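A bare-bones version of an integral-type statistic with a resampled null distribution might look as follows (a generic permutation sketch for two groups; the paper's procedure covers general factorial designs and uses a different resampling scheme):

```python
import numpy as np

def integral_stat(g1, g2):
    """Integral-type statistic: integrated squared difference of the
    two group mean functions, approximated on the evaluation grid.
    g1, g2: (n_curves, n_gridpoints) arrays of discretized curves."""
    return np.mean((g1.mean(axis=0) - g2.mean(axis=0)) ** 2)

def permutation_pvalue(g1, g2, n_perm=500, seed=0):
    """Approximate the null distribution by randomly reassigning the
    pooled curves to groups and recomputing the statistic."""
    rng = np.random.default_rng(seed)
    obs = integral_stat(g1, g2)
    pooled = np.vstack([g1, g2])
    n1 = len(g1)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if integral_stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)
```

No Gaussianity or homoscedasticity is assumed anywhere in this construction, which is the point of resampling-based calibration.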

Maximum Inner Product Search (MIPS) is a ubiquitous task in machine learning applications such as recommendation systems. Given a query vector and $n$ atom vectors in $d$-dimensional space, the goal of MIPS is to find the atom that has the highest inner product with the query vector. Existing MIPS algorithms scale at least as $O(\sqrt{d})$, which becomes computationally prohibitive in high-dimensional settings. In this work, we present BanditMIPS, a novel randomized MIPS algorithm whose complexity is independent of $d$. BanditMIPS estimates the inner product for each atom by subsampling coordinates and adaptively evaluates more coordinates for more promising atoms. The specific adaptive sampling strategy is motivated by multi-armed bandits. We provide theoretical guarantees that BanditMIPS returns the correct answer with high probability, while improving the complexity in $d$ from $O(\sqrt{d})$ to $O(1)$. We also perform experiments on four synthetic and real-world datasets and demonstrate that BanditMIPS outperforms prior state-of-the-art algorithms. For example, in the MovieLens dataset ($n$=4,000, $d$=6,000), BanditMIPS is 20$\times$ faster than the next best algorithm while returning the same answer. BanditMIPS requires no preprocessing of the data and includes a hyperparameter that practitioners may use to trade off accuracy and runtime. We also propose a variant of our algorithm, named BanditMIPS-$\alpha$, which achieves further speedups by employing non-uniform sampling across coordinates. Finally, we demonstrate how known preprocessing techniques can be used to further accelerate BanditMIPS, and discuss applications to Matching Pursuit and Fourier analysis.
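The bandit view can be sketched as successive elimination: each atom is an arm, sampled coordinates give a running estimate of its inner product, and arms whose upper confidence bound falls below the leader's lower bound are dropped. This is an illustrative sketch, not the authors' exact algorithm; in particular the confidence-width constant below is a hypothetical choice.

```python
import numpy as np

def bandit_mips(query, atoms, batch=32, delta=0.01, seed=0):
    """Successive-elimination sketch of adaptive MIPS.
    query: (d,) vector; atoms: (n, d) matrix. Returns the index of the
    atom estimated to have the largest inner product with query."""
    rng = np.random.default_rng(seed)
    n, d = atoms.shape
    alive = np.arange(n)
    sums = np.zeros(n)
    used = 0
    order = rng.permutation(d)  # sample coordinates without replacement
    while used < d and len(alive) > 1:
        coords = order[used:used + batch]
        sums[alive] += atoms[np.ix_(alive, coords)] @ query[coords]
        used += len(coords)
        est = sums[alive] * (d / used)              # scaled running estimate
        width = np.sqrt(2 * np.log(1 / delta) / used) * d  # hypothetical CI width
        best = est.max()
        alive = alive[est + width >= best - width]  # eliminate hopeless atoms
    if len(alive) > 1:  # budget exhausted: break ties with exact scores
        exact = atoms[alive] @ query
        alive = alive[[np.argmax(exact)]]
    return alive[0]
```

The `batch` and `delta` parameters play the role of the accuracy/runtime knob mentioned above: tighter confidence levels sample more coordinates before eliminating atoms.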

There has been growing interest in using the QUIC transport protocol for the Internet of Things (IoT). In lossy and high latency networks, QUIC outperforms TCP and TLS. Since IoT greatly differs from traditional networks in terms of architecture and resources, IoT-specific parameter tuning has proven to be of significance. While RFC 9006 offers a guideline for tuning TCP within IoT, we have not found an equivalent for QUIC. This paper is, to our knowledge, the first to contribute empirically based insights towards tuning QUIC for IoT. We improved our pure HTTP/3 publish-subscribe architecture and rigorously benchmarked it against an alternative: MQTT-over-QUIC. To investigate the impact of transport-layer parameters, we ran both applications on Raspberry Pi Zero hardware. Eight metrics were collected while emulating different network conditions and message payloads. We enumerate the points we experimentally identified (notably, relating to authentication, MAX\_STREAM messages, and timers) and elaborate on how they can be tuned to improve resource consumption and performance. Our application offered lower latency than MQTT-over-QUIC with slightly higher resource consumption, making it preferable for reliable time-sensitive dissemination of information.

Separating signals from an additive mixture may be an unnecessarily hard problem when one is only interested in specific properties of a given signal. In this work, we tackle simpler "statistical component separation" problems that focus on recovering a predefined set of statistical descriptors of a target signal from a noisy mixture. Assuming access to samples of the noise process, we investigate a method devised to match the statistics of the solution candidate corrupted by noise samples with those of the observed mixture. We first analyze the behavior of this method using simple examples with analytically tractable calculations. Then, we apply it in an image denoising context employing 1) wavelet-based descriptors and 2) ConvNet-based descriptors on astrophysics and ImageNet data. In the case of 1), we show that our method better recovers the descriptors of the target data than a standard denoising method in most situations. Additionally, despite not being constructed for this purpose, it performs surprisingly well in terms of peak signal-to-noise ratio on full signal reconstruction. In comparison, representation 2) appears less suitable for image denoising. Finally, we extend this method by introducing a diffusive stepwise algorithm, which gives a new perspective on the initial method and leads to promising results for image denoising under specific circumstances.
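The matching idea can be sketched with toy descriptors (mean and variance here, standing in for the wavelet- or ConvNet-based descriptors above): starting from a candidate $x$, gradient descent drives the average descriptors of $x$ plus noise samples toward those of the observed mixture $y$.

```python
import numpy as np

def phi(u):
    """Toy statistical descriptors: mean and variance."""
    return np.array([u.mean(), u.var()])

def phi_jacobian(u):
    """Rows: gradient of the mean and of the variance w.r.t. u."""
    N = len(u)
    return np.vstack([np.full(N, 1.0 / N),
                      2.0 * (u - u.mean()) / N])

def separate(y, noise_samples, steps=300, lr=0.5):
    """Gradient descent on x so that the descriptors of x + noise,
    averaged over noise samples, match those of the mixture y."""
    x = np.zeros_like(y)
    target = phi(y)
    for _ in range(steps):
        grad = np.zeros_like(x)
        for n in noise_samples:
            resid = phi(x + n) - target
            grad += phi_jacobian(x + n).T @ resid  # chain rule on the loss
        x -= lr * grad / len(noise_samples)
    return x
```

With richer descriptors the same loop recovers correspondingly richer statistics of the target; the diffusive stepwise extension replaces the single descent with a sequence of such problems at decreasing noise levels.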

Time series anomaly detection has applications in a wide range of research fields, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey provides a structured and comprehensive overview of state-of-the-art deep learning models for time series anomaly detection. It presents a taxonomy based on the factors that divide anomaly detection models into different categories. Aside from describing the basic anomaly detection technique for each category, the advantages and limitations are also discussed. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. It finally summarises open issues in research and challenges faced when adopting deep anomaly detection models.
