We introduce a general non-parametric independence test between right-censored survival times and covariates, which may be multivariate. Our test statistic has a dual interpretation: first, as the supremum of a potentially infinite collection of weight-indexed log-rank tests, with weight functions belonging to a reproducing kernel Hilbert space (RKHS); and second, as the norm of the difference of embeddings of certain finite measures into the RKHS, analogous to the Hilbert-Schmidt Independence Criterion (HSIC) test statistic. We study the asymptotic properties of the test, finding sufficient conditions to ensure that our test correctly rejects the null hypothesis under any alternative. The test statistic can be computed straightforwardly, and the rejection threshold is obtained via an asymptotically consistent wild bootstrap procedure. Extensive investigations on both simulated and real data suggest that our testing procedure generally performs better than competing approaches in detecting complex non-linear dependence.
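Below is a minimal sketch of the second interpretation, assuming a generic HSIC-style statistic with a Rademacher wild bootstrap on fully observed data; the censoring adjustments and log-rank weighting that define the actual test are omitted, and the bandwidth choice is arbitrary.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    # Pairwise Gaussian kernel matrix for points stored as rows of x.
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic_wild_bootstrap(K, L, n_boot=500, seed=0):
    # Biased HSIC estimate trace(KHLH)/n^2, with a Rademacher wild
    # bootstrap to emulate the null distribution of the statistic.
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc, Lc = H @ K @ H, H @ L @ H
    stat = np.sum(Kc * Lc) / n ** 2
    boots = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=n)
        boots[b] = np.sum((w[:, None] * Kc * w[None, :]) * Lc) / n ** 2
    return stat, float(np.mean(boots >= stat))   # statistic, p-value
```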
A common problem in data analysis is the separation of signal and background. We revisit and generalise the so-called $sWeights$ method, which allows one to calculate an empirical estimate of the signal density of a control variable using a fit of a mixed signal-and-background model to a discriminating variable. We show that $sWeights$ are a special case of a larger class of Custom Orthogonal Weight functions (COWs), which can be applied to a more general class of problems, in which the discriminating and control variables are not necessarily independent, while still achieving close to optimal performance. We also investigate the properties of parameters estimated from fits of statistical models to $sWeights$ and provide closed-form expressions for the asymptotic covariance matrix of the fitted parameters. To illustrate our findings, we discuss several practical applications of these techniques.
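As a point of reference, here is a minimal sketch of the classical two-component $sWeights$ construction, of which COWs are the generalisation; it assumes the signal and background pdfs have already been fitted and evaluated per event, and that the discriminating and control variables are independent, which is exactly the restriction COWs lift.

```python
import numpy as np

def sweights(fs, fb, n_s, n_b):
    # fs, fb : signal/background pdf values at each event's discriminating
    #          variable; n_s, n_b : fitted yields.
    denom = n_s * fs + n_b * fb                       # total fitted density
    F = np.stack([fs, fb], axis=1)                    # per-event component pdfs
    Winv = (F[:, :, None] * F[:, None, :]
            / denom[:, None, None] ** 2).sum(axis=0)  # inverse covariance matrix
    V = np.linalg.inv(Winv)
    # Signal sWeight per event; histograms of a control variable weighted by
    # these estimate the signal-only distribution of that variable.
    return (V[0, 0] * fs + V[0, 1] * fb) / denom
```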
The familywise error rate (FWER) has been a cornerstone of simultaneous inference for decades, and the classical Bonferroni method is one of the most prominent frequentist approaches for controlling it. The present article studies the behavior of the FWER of the Bonferroni procedure in a multiple testing problem. We establish non-asymptotic upper bounds on the FWER of the Bonferroni method under both the equicorrelated and general normal setups.
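A quick Monte Carlo illustration of the quantity being bounded, assuming the equicorrelated setup is represented by a one-factor Gaussian model (my parametrisation, not necessarily the paper's), is sketched below.

```python
import numpy as np
from scipy.stats import norm

def bonferroni_fwer_mc(m=20, rho=0.5, alpha=0.05, n_sim=20000, seed=0):
    # Monte Carlo FWER of the Bonferroni procedure under the global null for
    # m equicorrelated N(0,1) statistics, rho >= 0, via a shared factor:
    # X_i = sqrt(rho) * Z + sqrt(1 - rho) * e_i.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sim, 1))
    e = rng.standard_normal((n_sim, m))
    x = np.sqrt(rho) * z + np.sqrt(1.0 - rho) * e
    pvals = 2.0 * norm.sf(np.abs(x))              # two-sided p-values
    return float(np.mean((pvals < alpha / m).any(axis=1)))
```

Positive equicorrelation makes rejections co-occur, so the estimated FWER typically falls well below the nominal level, which is the kind of slack that sharper upper bounds aim to quantify.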
Cluster analysis aims at partitioning data into groups or clusters. In applications, it is common to deal with problems where the number of clusters is unknown. Bayesian mixture models employed in such applications usually specify a flexible prior that takes the uncertainty with respect to the number of clusters into account. However, a major empirical challenge in the use of these models is the characterisation of the induced prior on the partitions. This work introduces an approach for computing descriptive statistics of the prior on the partitions for three selected Bayesian mixture models developed in the areas of Bayesian finite mixtures and Bayesian nonparametrics. The proposed methodology involves computationally efficient enumeration of the prior on the number of clusters in-sample (termed ``data clusters'') and determination of the first two prior moments of symmetric additive statistics characterising the partitions. The accompanying reference implementation is made available in the R package 'fipp'. Finally, we illustrate the proposed methodology through comparisons and discuss the implications for prior elicitation in applications.
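For intuition, the simplest instance of such a computation, assuming a Dirichlet process prior as a stand-in for the Bayesian nonparametric case (the 'fipp' package itself is in R and covers three richer models), is sketched below: under the Chinese restaurant process the number of data clusters $K_n$ is a sum of independent Bernoulli variables, so its first two prior moments are available in closed form.

```python
def dp_data_cluster_moments(n, alpha):
    # Under a Dirichlet process with concentration alpha, observation i
    # starts a new data cluster with probability alpha / (alpha + i - 1),
    # independently across i, so K_n is a sum of independent Bernoullis.
    ps = [alpha / (alpha + i) for i in range(n)]   # i = 0, ..., n-1
    mean = sum(ps)                                 # E[K_n]
    var = sum(p * (1.0 - p) for p in ps)           # Var[K_n]
    return mean, var
```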
We propose nonparametric open-end sequential testing procedures that can detect all types of changes in the contemporary distribution function of multivariate observations. Their asymptotic properties are theoretically investigated under stationarity and under alternatives to stationarity. Monte Carlo experiments reveal their good finite-sample behavior in the case of continuous univariate observations. A short data example concludes the work.
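A toy version of such a monitoring loop, assuming a univariate stream, a KS-type distance between empirical CDFs, and a constant threshold (real open-end procedures use a carefully calibrated threshold function; this is illustrative only), might look as follows.

```python
import numpy as np

def monitor(learning, stream, threshold=3.0):
    # Compare the ecdf of the post-learning observations with the ecdf of
    # the learning sample after each new arrival; stop at the first alarm.
    learning = np.sort(np.asarray(learning, dtype=float))
    m = len(learning)
    F0 = np.arange(1, m + 1) / m          # learning-sample ecdf on its own grid
    seen = []
    for t, x in enumerate(stream, start=1):
        seen.append(x)
        F1 = np.searchsorted(np.sort(seen), learning, side="right") / t
        stat = np.sqrt(m * t / (m + t)) * np.max(np.abs(F1 - F0))
        if stat > threshold:
            return t                       # alarm: change declared at time t
    return None                            # open-end: may run indefinitely
```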
In this article, we aim to provide a general and complete understanding of semi-supervised (SS) causal inference for treatment effects. Specifically, we consider two such estimands: (a) the average treatment effect and (b) the quantile treatment effect, as prototype cases, in an SS setting, characterized by two available data sets: (i) a labeled data set of size $n$, providing observations for a response and a set of high-dimensional covariates, as well as a binary treatment indicator; and (ii) an unlabeled data set of size $N$, much larger than $n$, but without the response observed. Using these two data sets, we develop a family of SS estimators that are guaranteed to be: (1) more robust and (2) more efficient than their supervised counterparts based on the labeled data set only. Beyond the 'standard' double robustness results (in terms of consistency) that can be achieved by supervised methods as well, we further establish $\sqrt{n}$-consistency and asymptotic normality of our SS estimators whenever the propensity score in the model is correctly specified, without requiring specific forms of the nuisance functions involved. Such an improvement of robustness arises from the use of the massive unlabeled data, so it is generally not attainable in a purely supervised setting. In addition, our estimators are shown to be semiparametrically efficient as long as all the nuisance functions are correctly specified. Moreover, as an illustration of the nuisance estimators, we consider inverse-probability-weighting type kernel smoothing estimators involving unknown covariate transformation mechanisms, and establish novel results on their uniform convergence rates in high-dimensional scenarios, which should be of independent interest. Numerical results on both simulated and real data validate the advantage of our methods over their supervised counterparts with respect to both robustness and efficiency.
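A typical supervised baseline here is the standard augmented IPW (doubly robust) estimator of the average treatment effect; a minimal sketch, with the nuisance estimates taken as given, is below. The SS estimators refit such nuisances with the help of the unlabeled data, which this generic sketch does not capture.

```python
import numpy as np

def aipw_ate(y, a, pi_hat, mu1_hat, mu0_hat):
    # y       : observed outcomes               a       : binary treatment
    # pi_hat  : estimated propensity scores     mu?_hat : outcome regressions
    # Consistent if either the propensity model or the outcome model is
    # correctly specified (the usual double robustness).
    t1 = mu1_hat + a * (y - mu1_hat) / pi_hat
    t0 = mu0_hat + (1 - a) * (y - mu0_hat) / (1.0 - pi_hat)
    return float(np.mean(t1 - t0))
```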
The recursive and hierarchical structure of full rooted trees is applicable to representing statistical models in various areas, such as data compression, image processing, and machine learning. In most of these cases, however, the full rooted tree is not a random variable, so model selection to avoid overfitting becomes problematic. One way to solve this problem is to assume a prior distribution on the full rooted trees, which enables optimal model selection based on Bayes decision theory. For example, by assigning a low prior probability to a complex model, the maximum a posteriori estimator prevents the selection of the complex one. Furthermore, we can average over all the models, weighted by their posterior probabilities. In this paper, we propose a probability distribution on a set of full rooted trees. Its parametric representation is suitable for calculating properties of the distribution, such as the mode, expectation, and posterior distribution, via recursive functions. Although such distributions have been proposed in previous studies, they are applicable only to specific applications. We therefore extract their mathematically essential components and derive new generalized methods to calculate the expectation, posterior distribution, and related quantities.
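As an example of the kind of recursive calculation such a parametric representation enables, consider the common prior (used, e.g., in context tree weighting; my parametrisation, not necessarily the paper's) in which each node above a maximum depth becomes internal independently with probability $g$; the expected number of leaves then satisfies a one-line recursion.

```python
def expected_leaves(g, depth, branching=2):
    # E[L(d)] = (1 - g) + g * branching * E[L(d - 1)], with a forced leaf
    # at depth 0: each node is a leaf w.p. 1 - g, else spawns `branching`
    # independent subtrees one level shallower.
    if depth == 0:
        return 1.0
    return (1.0 - g) + g * branching * expected_leaves(g, depth - 1, branching)
```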
We revisit the outlier hypothesis testing framework of Li \emph{et al.} (TIT 2014) and derive fundamental limits for the optimal test. In outlier hypothesis testing, one is given multiple observed sequences, most of which are generated i.i.d. from a nominal distribution. The task is to discern the set of outlying sequences, which are generated according to anomalous distributions. The nominal and anomalous distributions are \emph{unknown}. We consider the case of multiple outliers in which the number of outliers is unknown and each outlier may follow a different anomalous distribution. In this setting, we study the tradeoff among the probabilities of misclassification error, false alarm, and false reject. Specifically, we propose a threshold-based test that ensures exponential decay of the misclassification error and false alarm probabilities. We study two constraints on the false reject probability: that it be a non-vanishing constant, and that it decay exponentially. For both cases, we characterize bounds on the false reject probability, as a function of the threshold, for each tuple of nominal and anomalous distributions. Finally, we demonstrate the asymptotic optimality of our test under the generalized Neyman-Pearson criterion.
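A simplified variant of a threshold-based test, assuming discrete observations and using the KL divergence of each sequence's empirical distribution from the pool of the remaining sequences as a proxy for the unknown nominal distribution (the paper's test and its reject option are more refined), is sketched below.

```python
import numpy as np

def empirical_pmf(seq, alphabet_size):
    counts = np.bincount(seq, minlength=alphabet_size).astype(float)
    return counts / counts.sum()

def flag_outliers(sequences, alphabet_size, threshold):
    # Flag sequence i if D(p_i || mean of the other empirical pmfs) exceeds
    # the threshold; under the null all scores vanish as lengths grow.
    eps = 1e-12
    pmfs = [empirical_pmf(s, alphabet_size) for s in sequences]
    flagged = []
    for i, p in enumerate(pmfs):
        q = np.mean([r for j, r in enumerate(pmfs) if j != i], axis=0)
        score = float(np.sum((p + eps) * np.log((p + eps) / (q + eps))))
        if score > threshold:
            flagged.append(i)
    return flagged
```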
In modern data analysis, nonparametric measures of discrepancy between random variables are particularly important. The subject is well studied in the frequentist literature, while development in the Bayesian setting is limited, with applications often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modelling the difference between kernel mean embeddings in the reproducing kernel Hilbert space, utilising the framework established by Flaxman et al. (2016). The use of kernel methods enables its application to random variables in generic domains beyond multivariate Euclidean spaces. The proposed procedure results in a posterior inference scheme that allows automatic selection of the kernel parameters relevant to the problem at hand. In a series of synthetic experiments and two real-data experiments (testing network heterogeneity from high-dimensional data, and comparing six-membered monocyclic ring conformations), we illustrate the advantages of our approach.
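The frequentist quantity underlying the procedure is the squared RKHS norm of the difference between the two kernel mean embeddings, i.e. the squared MMD; a minimal sketch of its standard unbiased estimate with a Gaussian kernel is below (the Bayesian modelling of the embedding difference and the kernel-parameter inference are not shown).

```python
import numpy as np

def mmd2_unbiased(x, y, bandwidth=1.0):
    # x: (m, d) sample, y: (n, d) sample; Gaussian kernel, fixed bandwidth.
    def k(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    m, n = len(x), len(y)
    Kxx, Kyy, Kxy = k(x, x), k(y, y), k(x, y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())
```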
The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy, max-information, compression arguments, and more. The situation is far less well understood without the independence assumption. We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. First, we show that, in some cases, differential privacy guarantees generalization even when there are dependencies within the sample, which we quantify using a notion we call Gibbs-dependence. We complement this result with a tight negative example. Second, we show that the connection between transcript compression and adaptive data analysis can be extended to the non-i.i.d. setting.
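For context, the standard route from differential privacy to adaptive generalization in the i.i.d. setting is to answer each adaptively chosen statistical query through a noisy channel; a toy Gaussian-mechanism sketch is below. Whether such guarantees survive in-sample dependence is precisely what the Gibbs-dependence notion is meant to quantify.

```python
import numpy as np

def private_query(data, query_fn, sigma=0.05, seed=None):
    # Answer the empirical mean of query_fn (assumed bounded in [0, 1])
    # with Gaussian noise; the analyst may choose the next query_fn after
    # seeing this answer. Illustrative only.
    rng = np.random.default_rng(seed)
    answer = float(np.mean([query_fn(z) for z in data]))
    return answer + rng.normal(0.0, sigma)
```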
Given a random sample of size $n$ from a $p$-dimensional random vector, where both $n$ and $p$ are large, we are interested in testing whether the $p$ components of the random vector are mutually independent. This is the so-called complete independence test. In the multivariate normal case, it is equivalent to testing whether the correlation matrix is an identity matrix. In this paper, we propose a one-sided empirical likelihood method for the complete independence test for multivariate normal data, based on squared sample correlation coefficients. The limiting distribution of our one-sided empirical likelihood test statistic is proved to be $Z^2I(Z>0)$ as both $n$ and $p$ tend to infinity, where $Z$ is a standard normal random variable. To improve the power of the empirical likelihood test statistic, we also introduce a rescaled empirical likelihood test statistic. We carry out an extensive simulation study to compare the performance of the rescaled empirical likelihood method with two other statistics based on the sum of squared sample correlation coefficients.
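The common ingredient of the compared statistics, the sum of squared off-diagonal sample correlation coefficients, is trivial to compute; a small sketch is below (the empirical likelihood construction and its $Z^2I(Z>0)$ calibration are not reproduced here).

```python
import numpy as np

def sum_squared_correlations(X):
    # X: (n, p) data matrix; returns the sum of r_ij^2 over pairs i < j.
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    return float(((R ** 2).sum() - p) / 2.0)  # drop diagonal, halve double count
```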