It is argued that all model-based approaches to the selection of covariates in linear regression have failed. This applies both to frequentist approaches based on P-values and to Bayesian approaches, although for different reasons. In the first part of the paper, 13 model-based procedures are compared to the model-free Gaussian covariate procedure in terms of the covariates selected and the time required. The comparison is based on four data sets and two simulations. There is nothing special about these data sets, which are often used as examples in the literature. All the model-based procedures failed. In the second part of the paper, it is argued that the cause of this failure is the very use of a model. If the model involves all the available covariates, standard P-values can be used, and their use in this situation is quite straightforward. As soon as the model specifies only some unknown subset of the covariates, the problem being to identify this subset, the situation changes radically: there are many P-values, they are dependent, and most of them are invalid. The Bayesian paradigm also assumes a correct model; although it poses no conceptual problems with a large number of covariates, it carries a considerable overhead, causing computational and allocation problems even for moderately sized data sets. The Gaussian covariate procedure is based on P-values defined as the probability that a random Gaussian covariate is better than the covariate being considered. These P-values are exact and valid whatever the situation. The allocation requirements and the algorithmic complexity are both linear in the size of the data, making the procedure capable of handling large data sets. It outperforms all the other procedures in every respect.
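The Gaussian covariate P-value described above (the probability that a pure-noise covariate is better than the candidate) can be approximated by simulation. The sketch below is a Monte Carlo stand-in using squared correlation with the response as the goodness criterion; the paper itself derives this probability in closed form, so the simulation is purely illustrative.

```python
import numpy as np

def gaussian_covariate_pvalue(x, y, n_sim=2000, seed=0):
    """Monte Carlo sketch of the Gaussian covariate P-value: the
    probability that a random N(0,1) covariate correlates with y at
    least as strongly as x does. Illustrative only; the exact P-value
    has a closed form in the paper."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)

    def r2(v):
        # squared correlation between a covariate v and the response y
        v = v - v.mean()
        yc = y - y.mean()
        return (v @ yc) ** 2 / ((v @ v) * (yc @ yc))

    obs = r2(np.asarray(x, float))
    noise = rng.standard_normal((n_sim, len(y)))
    return float(np.mean([r2(g) >= obs for g in noise]))
```

A covariate strongly related to the response yields a P-value near zero, since almost no random Gaussian covariate matches its correlation.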
In the famous least sum of trimmed squares (LTS) of residuals estimator (Rousseeuw (1984)), residuals are first squared and then trimmed. In this article, we first trim the residuals, using a depth trimming scheme, and then square those that remain. The estimator that minimizes the sum of squares of the trimmed residuals is called the LST estimator. It turns out that LST is also a robust alternative to the classic least sum of squares (LS) of residuals estimator. Indeed, it has a very high finite-sample breakdown point and can resist, asymptotically, up to 50% contamination without breakdown, in sharp contrast to the 0% of the LS estimator. The population version of LST is Fisher consistent, and the sample version is strongly consistent and root-n consistent under some conditions. Three approximate algorithms for computing LST are proposed and tested on synthetic and real data examples. These experiments indicate that two of the algorithms can compute the LST estimator very fast and with relatively smaller variances than the famous LTS estimator. All the evidence suggests that LST deserves to be a robust alternative to the LS estimator and is feasible in practice for large data sets (with possible contamination and outliers) in high dimensions.
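The trim-then-square idea can be illustrated with a concentration-style iteration: drop the observations with the largest residuals, then refit by least squares on the rest. This is a toy sketch only; it uses residual magnitude as a stand-in for the paper's depth trimming scheme, and the paper's three approximate algorithms differ in detail.

```python
import numpy as np

def lst_sketch(X, y, keep_frac=0.8, n_iter=20):
    """Toy trim-then-square iteration: trim the largest residuals,
    then minimize the sum of squares of the remaining residuals.
    NOTE: residual-magnitude trimming stands in for the paper's
    depth trimming scheme; this is a sketch, not the LST algorithm."""
    n = len(y)
    h = int(keep_frac * n)                          # points retained
    Xc = np.column_stack([np.ones(n), X])           # add intercept column
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]    # plain LS start
    for _ in range(n_iter):
        r = y - Xc @ beta
        keep = np.argsort(np.abs(r))[:h]            # trim largest residuals
        beta = np.linalg.lstsq(Xc[keep], y[keep], rcond=None)[0]
    return beta
```

Even with 10% gross outliers, the trimmed refit recovers a line close to the uncontaminated fit, while plain LS is pulled away.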
Geographically weighted regression (GWR) models handle geographical dependence through spatially varying coefficients and have been widely used in applied science, but their general Bayesian extension is unclear because it involves a weighted log-likelihood which does not imply a probability distribution on the data. We present a Bayesian GWR model and show that its essence is dealing with partial misspecification of the model. Current modularized Bayesian inference models accommodate partial misspecification from a single component of the model. We extend these models to handle partial misspecification in more than one component of the model, as required for our Bayesian GWR model. Information from the various spatial locations is manipulated via a geographically weighted kernel, and the optimal manipulation is chosen according to a Kullback-Leibler (KL) divergence. We justify the model via an information risk minimization approach and show the consistency of the proposed estimator in terms of a geographically weighted KL divergence.
We systematically discuss two versions of confidence regions: those based on p-values and those based on e-values, a recent alternative to p-values. Both versions can be applied to multiple hypothesis testing, and in this paper we are interested in procedures that control the number of false discoveries under arbitrary dependence between the base p- or e-values. We introduce a procedure that is based on e-values and show, using simulated and real-world datasets, that it is efficient both computationally and statistically. Comparison with the corresponding standard procedure based on p-values is not straightforward, but there are indications that the new one performs significantly better in some situations.
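Two standard facts about e-values underlie their appeal under arbitrary dependence, and both are one-liners: an e-value converts to a valid p-value via Markov's inequality, and the arithmetic mean of e-values is again an e-value regardless of how they depend on each other. A minimal sketch:

```python
def e_to_p(e):
    """Markov's inequality: p = min(1, 1/e) is a valid p-value for
    any e-value (a nonnegative statistic with mean <= 1 under the null)."""
    return min(1.0, 1.0 / e)

def combine_e(evalues):
    """The arithmetic mean of e-values is itself an e-value, and this
    holds under arbitrary dependence between the inputs."""
    return sum(evalues) / len(evalues)
```

For example, an e-value of 20 converts to a p-value of 0.05, and merging e-values of 4 and 1 yields the e-value 2.5.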
This paper studies the problem of statistical inference for genetic relatedness between binary traits based on individual-level genome-wide association data. Specifically, under the high-dimensional logistic regression model, we define parameters characterizing the cross-trait genetic correlation, the genetic covariance, and the trait-specific genetic variance. A novel weighted debiasing method is developed for the logistic Lasso estimator, and computationally efficient debiased estimators are proposed. The rates of convergence for these estimators are studied and their asymptotic normality is established under mild conditions. Moreover, we construct confidence intervals and statistical tests for these parameters, and provide theoretical justifications for the methods, including the coverage probability and expected length of the confidence intervals, as well as the size and power of the proposed tests. Numerical studies are conducted under both model-generated data and simulated genetic data to show the superiority of the proposed methods and their applicability to the analysis of real genetic data. Finally, by analyzing a real data set on autoimmune diseases, we demonstrate the ability to obtain novel insights about the shared genetic architecture between ten pediatric autoimmune diseases.
There has been significant attention given to developing data-driven methods for tailoring patient care based on individual patient characteristics. Dynamic treatment regimes formalize this through a sequence of decision rules that map patient information to a suggested treatment. The data for estimating and evaluating treatment regimes are ideally gathered through the use of Sequential Multiple Assignment Randomized Trials (SMARTs), though longitudinal observational studies are commonly used due to the potentially prohibitive costs of conducting a SMART. These studies are typically sized for simple comparisons of fixed treatment sequences or, in the case of observational studies, a priori sample size calculations are often not performed. We develop sample size procedures for the estimation of dynamic treatment regimes from observational studies. Our approach uses pilot data to ensure a study will have sufficient power for comparing the value of the optimal regime, i.e., the expected outcome if all patients in the population were treated by following the optimal regime, with a known comparison mean. Our approach also ensures the value of the estimated optimal treatment regime is within an a priori set range of the value of the true optimal regime with high probability. We examine the performance of the proposed procedure with a simulation study and use it to size a study for reducing depressive symptoms using data from electronic health records.
Data-driven methods for personalizing treatment assignment have garnered much attention from clinicians and researchers. Dynamic treatment regimes formalize this through a sequence of decision rules that map individual patient characteristics to a recommended treatment. Observational studies are commonly used for estimating dynamic treatment regimes due to the potentially prohibitive costs of conducting sequential multiple assignment randomized trials. However, estimating a dynamic treatment regime from observational data can lead to bias in the estimated regime due to unmeasured confounding. Sensitivity analyses are useful for assessing how robust the conclusions of the study are to a potential unmeasured confounder. A Monte Carlo sensitivity analysis is a probabilistic approach that involves positing and sampling from distributions for the parameters governing the bias. We propose a method for performing a Monte Carlo sensitivity analysis of the bias due to unmeasured confounding in the estimation of dynamic treatment regimes. We demonstrate the performance of the proposed procedure with a simulation study and apply it to an observational study examining the tailored use of antidepressants for reducing symptoms of depression, using data from Kaiser Permanente Washington (KPWA).
Scoring rules aggregate individual rankings by assigning points to each position in each ranking such that the total sum of points provides the overall ranking of the alternatives. They are widely used in sports competitions consisting of multiple contests. We study the tradeoff between two risks in this setting: (1) the threat of an early clinch, when the title has been secured before the last contest(s) of the competition take place; (2) the danger of winning the competition without finishing first in any contest. In particular, four historical points scoring systems of the Formula One World Championship are compared with the family of geometric scoring rules, recently proposed by an axiomatic approach. The schemes used in practice are found to be competitive with respect to these goals, and the current rule seems to be a reasonable compromise close to the Pareto frontier. Our results shed more light on the evolution of the Formula One points scoring systems and contribute to the issue of choosing the set of point values.
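The aggregation step described above (position-based points summed over contests) is straightforward to compute. The sketch below tallies season totals from a list of per-race finishing orders under an arbitrary point vector; the current Formula One top-ten values are real, while the geometric vector shown is a hypothetical illustration of a geometric rule with ratio 2, not the family's exact parameterization from the paper.

```python
def season_totals(results, points):
    """Sum position-based points over all contests.
    results: list of races, each a list of driver ids ordered by
             finishing position (winner first).
    points:  points[j] = points awarded for finishing position j
             (0-indexed); positions beyond the list score zero."""
    totals = {}
    for race in results:
        for pos, driver in enumerate(race):
            pts = points[pos] if pos < len(points) else 0
            totals[driver] = totals.get(driver, 0) + pts
    return totals

# Current F1 top-10 scheme (25-18-15-...), and a hypothetical
# geometric vector with ratio 2 for comparison.
f1_current = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]
geometric2 = [2 ** (9 - j) for j in range(10)]  # 512, 256, ..., 1
```

Running both vectors over the same season of results shows how steeper (more geometric) schemes reward contest wins relative to consistent placing.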
Deep generative modelling is a class of techniques that train deep neural networks to model the distribution of training samples. Research has fragmented into various interconnected approaches, each of which makes trade-offs in run-time, diversity, and architectural restrictions. In particular, this compendium covers energy-based models, variational autoencoders, generative adversarial networks, autoregressive models, and normalizing flows, in addition to numerous hybrid approaches. These techniques are drawn together under a single cohesive framework, compared and contrasted to explain the premises behind each, while current state-of-the-art advances and implementations are reviewed.
We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and to find the groups in the data. In contrast to most existing clustering algorithms, our method does not require any hyper-parameters or distance thresholds, nor does the number of clusters need to be specified. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has very low computational overhead, is easily scalable, and is applicable to large practical problems. Evaluation on well-known datasets from different domains, ranging between 1077 and 8.1 million samples, shows substantial performance gains compared to existing clustering techniques.
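The first-neighbor proposition above can be illustrated in a few lines: link every sample to its nearest neighbor and take the connected components of the resulting graph as clusters. This is a minimal sketch of the chain idea only; the full hierarchical agglomerative method builds further merge levels on top of it, and a scalable implementation would use approximate neighbor search rather than the brute-force distances shown here.

```python
import numpy as np

def first_neighbor_clusters(X):
    """Sketch of first-neighbor chain clustering: connect each sample
    to its nearest neighbor, then label connected components of the
    1-NN graph. No thresholds or cluster count required."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # brute force
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                                 # first neighbor

    # union-find over the edges (i, nn[i])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]                 # path halving
            i = parent[i]
        return i

    for i in range(n):
        parent[find(i)] = find(nn[i])

    roots = [find(i) for i in range(n)]
    label_of = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [label_of[r] for r in roots]
```

On two well-separated groups of points, every sample's first neighbor lies in its own group, so the 1-NN graph splits into exactly two components.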
During recent years, active learning has evolved into a popular paradigm for utilizing user feedback to improve the accuracy of learning algorithms. Active learning works by selecting the most informative sample among the unlabeled data and querying the label of that point from the user. Many different methods, such as uncertainty sampling and minimum risk sampling, have been utilized to select the most informative sample in active learning. Although many active learning algorithms have been proposed so far, most of them work with binary or multi-class classification problems and therefore cannot be applied to problems in which only samples from one class, together with a set of unlabeled data, are available. Such problems arise in many real-world situations and are known as the problem of learning from positive and unlabeled data. In this paper we propose an active learning algorithm that can work when only samples of one class as well as a set of unlabeled data are available. Our method works by separately estimating the probability density of the positive and unlabeled points and then computing the expected value of informativeness, which removes a hyper-parameter and gives a better measure of informativeness. Experiments and empirical analysis show promising results compared to other similar methods.