
Suppose we are interested in the effect of a treatment in a clinical trial. The efficiency of inference may be limited by a small sample size, but external control data are often available from historical studies. Motivated by an application to Helicobacter pylori infection, we show how to borrow strength from such data to improve the efficiency of inference in the clinical trial. Under an exchangeability assumption about the potential outcome mean, we show that the semiparametric efficiency bound for estimating the average treatment effect can be reduced by incorporating both the clinical trial data and the external controls. We then derive a doubly robust and locally efficient estimator. The improvement in efficiency is especially prominent when the external control dataset has a large sample size and small variability. Our method allows for a relaxed overlap assumption, and we illustrate this with the case where the clinical trial contains only a treated group. We also develop doubly robust and locally efficient approaches that extrapolate the causal effect in the clinical trial to the external population and to the overall population. Our results also offer meaningful implications for trial design and data collection. We evaluate the finite-sample performance of the proposed estimators via simulation. In the Helicobacter pylori infection application, our approach shows that the combination treatment has potential efficacy advantages over the triple therapy.
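
The paper's estimator additionally borrows external controls under the exchangeability assumption; as a hedged illustration of the doubly robust (AIPW) form it builds on within the trial alone, here is a minimal Python sketch (the nuisance models and variable names are illustrative assumptions, not the paper's specification):

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    def aipw_ate(X, A, Y):
        # Doubly robust (AIPW) estimate of the average treatment effect within
        # the trial only; the paper's estimator further incorporates external
        # control data, which is not shown here.
        ps = LogisticRegression(max_iter=1000).fit(X, A).predict_proba(X)[:, 1]
        mu1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)
        mu0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)
        psi = (mu1 - mu0
               + A * (Y - mu1) / ps
               - (1 - A) * (Y - mu0) / (1 - ps))
        return psi.mean()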

Related content

One way to control for nonresponse bias is to multiply the design weights of respondents by the inverse of estimated response probabilities to compensate for the nonrespondents. Maximum likelihood and calibration are two approaches that can be used to obtain the estimated response probabilities. This paper develops the asymptotic properties of the resulting estimator when calibration is applied. A logistic regression model is postulated for the response probabilities, and the data are assumed to be missing at random. The author shows that estimators using response probabilities estimated via calibration are asymptotically equivalent to unbiased estimators, and that estimating the response probabilities via calibration yields a gain in efficiency compared with the estimator that uses the true response probabilities.
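
As an illustrative sketch of the calibration idea, assuming a logistic response model and taking the design-weighted covariate totals of the full sample as the calibration benchmark (the exact calibration constraints in the paper may differ), the calibration equations can be solved numerically:

    import numpy as np
    from scipy.optimize import fsolve

    def expit(z):
        return 1.0 / (1.0 + np.exp(-z))

    def calibrated_response_probs(X, r, d):
        # X: (n, p) covariates including an intercept column
        # r: (n,) response indicator (1 = respondent), d: (n,) design weights
        # Solve sum_i r_i d_i x_i / p(x_i; beta) = sum_i d_i x_i for beta,
        # where p(x; beta) = expit(x' beta) is the response probability.
        target = (d[:, None] * X).sum(axis=0)

        def calibration_equations(beta):
            p = expit(X @ beta)
            return ((r * d / p)[:, None] * X).sum(axis=0) - target

        beta_hat = fsolve(calibration_equations, np.zeros(X.shape[1]))
        return expit(X @ beta_hat)

    # Respondents then receive the adjusted weights d_i / p_hat_i.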

Causal inference grows increasingly complex as the number of confounders increases. Given treatments $X$, confounders $Z$ and outcomes $Y$, we develop a non-parametric method to test the \textit{do-null} hypothesis $H_0:\; p(y|\text{\it do}(X=x))=p(y)$ against the general alternative. Building on the Hilbert-Schmidt Independence Criterion (HSIC) for marginal independence testing, we propose backdoor-HSIC (bd-HSIC) and demonstrate that it is calibrated and has power for both binary and continuous treatments under a large number of confounders. Additionally, we establish convergence properties of the estimators of the covariance operators used in bd-HSIC. We investigate the advantages and disadvantages of bd-HSIC compared with parametric tests, as well as the importance of do-null testing in contrast to marginal or conditional independence testing. A complete implementation can be found at \hyperlink{//github.com/MrHuff/kgformula}{\texttt{//github.com/MrHuff/kgformula}}.
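
For orientation, a minimal (biased, V-statistic) HSIC estimator with Gaussian kernels looks as follows; bd-HSIC additionally reweights the sample to adjust for the confounders $Z$ through the backdoor formula, which this plain sketch does not do:

    import numpy as np

    def rbf_kernel(A, sigma=1.0):
        # Gaussian kernel matrix for a sample A of shape (n, p).
        sq = np.sum(A ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
        # Biased (V-statistic) HSIC between samples X (n, p) and Y (n, q).
        n = X.shape[0]
        K, L = rbf_kernel(X, sigma_x), rbf_kernel(Y, sigma_y)
        H = np.eye(n) - np.ones((n, n)) / n
        return np.trace(K @ H @ L @ H) / n ** 2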

In 2019, the World Health Organization identified dengue as one of the top ten global health threats. For the control of dengue, the Applying Wolbachia to Eliminate Dengue (AWED) study group conducted a cluster-randomized trial in Yogyakarta, Indonesia, using a novel design called the cluster-randomized test-negative design (CR-TND). This design can yield valid statistical inference from data collected by a passive surveillance system and is therefore more cost-efficient than traditional cluster-randomized trials. We investigate the statistical assumptions and properties of the CR-TND under a randomization inference framework, which is known to be robust and efficient for small-sample problems. We find that, when the differential healthcare-seeking behavior between the intervention and control arms varies across clusters (in contrast to the setting of Dufault and Jewell, 2020, where it is constant across clusters), current analysis methods for the CR-TND can be biased and have inflated type I error. We propose the log-contrast estimator, which can eliminate such bias and improve precision by adjusting for covariates. Furthermore, we extend our methods to handle partial intervention compliance and a stepped-wedge design, both of which appear frequently in cluster-randomized trials. Finally, we demonstrate our results through simulation studies and a re-analysis of the AWED study.
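
One plausible reading of a cluster-level log-contrast, shown purely as a hedged sketch (the paper's exact estimand, covariate adjustment, and variance estimation are not reproduced here):

    import numpy as np

    def log_contrast(pos, neg, arm):
        # pos, neg: test-positive / test-negative counts per cluster (assumed > 0)
        # arm: 1 for intervention clusters, 0 for control clusters
        # Difference in mean log(pos/neg) between arms; the exact estimator in
        # the paper may differ, e.g., through covariate adjustment.
        ratio = np.log(pos) - np.log(neg)
        return ratio[arm == 1].mean() - ratio[arm == 0].mean()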

Prognosis of relapse occurrence in individuals with relapsing-remitting multiple sclerosis (RRMS), the most common subtype of multiple sclerosis (MS), could support individualized decisions and disease management and could help select patients efficiently for future randomized clinical trials. Only three prognostic models for this outcome have been published previously, all with important methodological shortcomings. We present the development, internal validation, and evaluation of the potential clinical benefit of a prognostic model for relapses in individuals with RRMS using real-world data. We followed seven steps to develop and validate the prognostic model, and we evaluated its potential clinical benefit using decision curve analysis. We selected eight baseline prognostic factors: age, sex, prior MS treatment, months since last relapse, disease duration, number of prior relapses, expanded disability status scale (EDSS), and gadolinium-enhanced lesions. We also developed a web application in which the personalized probability of relapse within two years is calculated automatically. The optimism-corrected c-statistic was 0.65 and the optimism-corrected calibration slope was 0.92. The model appears to be clinically useful for threshold probabilities of relapse between 15% and 30%. The prognostic model we developed offers several advantages over previously published prognostic models for RRMS. Importantly, we assessed the potential clinical benefit to better quantify the clinical impact of the model. Our web application, once externally validated, could be used by patients and doctors to calculate the individualized probability of relapse within two years and to inform the management of their disease.
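
Decision curve analysis summarizes clinical usefulness through the net benefit at each threshold probability; a minimal sketch of that standard computation (the model, data, and thresholds used in the paper are not reproduced):

    import numpy as np

    def net_benefit(y, p_hat, thresholds):
        # y: (n,) binary outcome (relapse within two years)
        # p_hat: (n,) predicted probabilities from the prognostic model
        # thresholds: iterable of threshold probabilities p_t
        # Net benefit at p_t: TP/n - FP/n * p_t / (1 - p_t).
        n = len(y)
        out = []
        for pt in thresholds:
            treat = p_hat >= pt
            tp = np.sum(treat & (y == 1))
            fp = np.sum(treat & (y == 0))
            out.append(tp / n - fp / n * pt / (1 - pt))
        return np.array(out)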

We develop a new Bayesian modelling framework for the class of higher-order, variable-memory Markov chains, and introduce an associated collection of methodological tools for exact inference with discrete time series. We show that a version of the context tree weighting algorithm can compute the prior predictive likelihood exactly (averaged over both models and parameters), and two related algorithms are introduced, which identify the a posteriori most likely models and compute their exact posterior probabilities. All three algorithms are deterministic and have linear-time complexity. A family of variable-dimension Markov chain Monte Carlo samplers is also provided, facilitating further exploration of the posterior. The performance of the proposed methods in model selection, Markov order estimation and prediction is illustrated through simulation experiments and real-world applications with data from finance, genetics, neuroscience, and animal communication. The associated algorithms are implemented in the R package BCT.
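
As a hedged illustration of the core recursion, here is a minimal context tree weighting sketch for binary sequences with Krichevsky-Trofimov estimators at the nodes; the paper's algorithms (and the BCT package) handle general discrete alphabets, posterior model identification, and exact posterior probabilities, none of which are shown here:

    import numpy as np
    from collections import defaultdict

    def kt_log_prob(n0, n1):
        # Krichevsky-Trofimov marginal likelihood of n0 zeros and n1 ones.
        logp, a, b = 0.0, 0, 0
        for _ in range(n0):
            logp += np.log((a + 0.5) / (a + b + 1.0))
            a += 1
        for _ in range(n1):
            logp += np.log((b + 0.5) / (a + b + 1.0))
            b += 1
        return logp

    def ctw_log_prob(x, depth=3):
        # Exact log prior-predictive probability of x[depth:] given x[:depth],
        # averaged over all context trees of maximal depth `depth`.
        counts = defaultdict(lambda: [0, 0])   # context tuple -> [n0, n1]
        for t in range(depth, len(x)):
            ctx = tuple(x[t - depth:t])        # oldest ... newest symbol
            for d in range(depth + 1):
                counts[ctx[d:]][x[t]] += 1

        def node(ctx):
            lp_kt = kt_log_prob(*counts[ctx])
            if len(ctx) == depth:
                return lp_kt
            lp_ch = node((0,) + ctx) + node((1,) + ctx)
            m = max(lp_kt, lp_ch)              # log(0.5 e^a + 0.5 e^b), stably
            return np.log(0.5) + m + np.log(np.exp(lp_kt - m) + np.exp(lp_ch - m))

        return node(())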

Causal inference has become increasingly reliant on observational studies with rich covariate information. To build tractable causal procedures, such as doubly robust estimators, it is imperative to first extract important features from high- or even ultra-high-dimensional data. In this paper, we propose causal ball screening for confounder selection from modern ultra-high-dimensional data sets. Unlike the familiar task of variable selection for prediction modeling, our confounder selection procedure aims to control for confounding while improving efficiency of the resulting causal effect estimate. Previous empirical and theoretical studies suggest excluding causes of the treatment that are not confounders. Motivated by these results, our goal is to keep all predictors of the outcome in both the propensity score and outcome regression models. A distinctive feature of our proposal is that we use an outcome-model-free procedure for propensity score model selection, thereby maintaining double robustness in the resulting causal effect estimator. Our theoretical analyses show that the proposed procedure enjoys a number of properties, including model selection consistency and pointwise normality. Synthetic and real data analyses show that our proposal compares favorably with existing methods in a range of realistic settings. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
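
As a crude stand-in for the screening step only (the paper's ball-covariance-type statistic is replaced here by absolute marginal correlation with the outcome, purely for illustration), the selected covariates would then enter both the propensity score and outcome regression models of a doubly robust estimator:

    import numpy as np

    def screen_outcome_predictors(X, Y, keep=50):
        # Rank covariates by a marginal association with the outcome and keep
        # the top `keep` indices; absolute Pearson correlation is an
        # illustrative surrogate for the screening statistic used in the paper.
        score = np.array([abs(np.corrcoef(X[:, j], Y)[0, 1])
                          for j in range(X.shape[1])])
        return np.argsort(score)[::-1][:keep]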

We present methods for causally interpretable meta-analyses that combine information from multiple randomized trials to estimate potential (counterfactual) outcome means and average treatment effects in a target population. We consider identifiability conditions, derive implications of the conditions for the law of the observed data, and obtain identification results for transporting causal inferences from a collection of independent randomized trials to a new target population in which experimental data may not be available. We propose an estimator for the potential (counterfactual) outcome mean in the target population under each treatment studied in the trials. The estimator uses covariate, treatment, and outcome data from the collection of trials, but only covariate data from the target population sample. We show that it is doubly robust, in the sense that it is consistent and asymptotically normal when at least one of the models it relies on is correctly specified. We study the finite sample properties of the estimator in simulation studies and demonstrate its implementation using data from a multi-center randomized trial.
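
One common doubly robust transport form, shown as a hedged sketch (nuisance models, pooling across multiple trials, and the exact weighting in the paper may differ): the outcome regression is averaged over the target-sample covariates and augmented by a weighted residual term from the trial data.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    def transported_potential_mean(X_trial, A, Y, X_target, a=1):
        # Estimate E[Y^a] in the target population using covariate, treatment,
        # and outcome data from the trials, but only covariates from the target.
        n_target = len(X_target)
        X_all = np.vstack([X_trial, X_target])
        S = np.concatenate([np.ones(len(X_trial)), np.zeros(n_target)])  # 1 = trial
        # participation model P(S = trial | X) -> odds of target membership
        pi = LogisticRegression(max_iter=1000).fit(X_all, S).predict_proba(X_trial)[:, 1]
        w = (1 - pi) / pi
        # treatment model within the trials
        e = LogisticRegression(max_iter=1000).fit(X_trial, A).predict_proba(X_trial)[:, 1]
        e_a = e if a == 1 else 1 - e
        # outcome regression among trial participants with A = a
        m = LinearRegression().fit(X_trial[A == a], Y[A == a])
        aug = w * (A == a) * (Y - m.predict(X_trial)) / e_a
        return (m.predict(X_target).sum() + aug.sum()) / n_target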

We introduce a methodology for robust Bayesian estimation with robust divergences (e.g., the density power divergence or $\gamma$-divergence), indexed by a single tuning parameter. It is well known that the posterior density induced by a robust divergence gives estimators that are highly robust to outliers if the tuning parameter is chosen appropriately. In a Bayesian framework, one way to find the optimal tuning parameter would be to use the evidence (marginal likelihood). However, we illustrate numerically that the evidence induced by the density power divergence does not work for selecting the optimal tuning parameter, since the robust divergence does not correspond to a statistical model. To overcome this problem, we treat the exponential of the robust divergence as an unnormalized statistical model and estimate the tuning parameter by minimizing the Hyvärinen score. We also provide adaptive computational methods based on sequential Monte Carlo (SMC) samplers, which enable us to obtain the optimal tuning parameter and samples from the posterior distributions simultaneously. We demonstrate the empirical performance of the proposed method through simulations and an application to real data.
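
As a hedged illustration for a Gaussian location model (constants in the loss vary across conventions, and the Hyvärinen-score tuning step is not shown), the density power divergence quasi-posterior can be written as:

    import numpy as np

    def dpd_loss(y, mu, sigma, alpha):
        # Density power divergence loss for a N(mu, sigma^2) model, per
        # observation:  integral f^(1+alpha) dz  -  (1 + 1/alpha) f(y)^alpha,
        # where the Gaussian integral equals (2*pi*sigma^2)^(-alpha/2) / sqrt(1+alpha).
        f = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)
        integral = (2 * np.pi * sigma ** 2) ** (-alpha / 2) / np.sqrt(1 + alpha)
        return integral - (1 + 1 / alpha) * f ** alpha

    def log_quasi_posterior(mu, y, sigma=1.0, alpha=0.5, prior_sd=10.0):
        # Unnormalized log "posterior": normal prior on mu times exp(-sum of losses);
        # alpha is the single tuning parameter to be selected.
        return -0.5 * (mu / prior_sd) ** 2 - dpd_loss(y, mu, sigma, alpha).sum()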

Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm that is drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the varying hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., causing a device to become unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in the existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol while taking heterogeneity into consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to a 9.2% drop in accuracy, a 2.32x increase in training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two likely causes of the performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers should account for heterogeneity during evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate its impacts.
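
As a toy, hedged sketch of how device unavailability enters a FedAvg-style round (a linear model with squared loss and a fixed availability probability stand in for the real heterogeneity traces and platform studied in the paper):

    import numpy as np

    def fedavg_round(global_w, local_data, p_available=0.7, lr=0.1, rng=None):
        # One FedAvg-style round in which each device participates with
        # probability p_available -- a toy stand-in for hardware/state heterogeneity.
        # local_data: list of (X_k, y_k) per device; linear model, squared loss.
        rng = rng or np.random.default_rng(0)
        updates, sizes = [], []
        for X, y in local_data:
            if rng.random() > p_available:       # device offline this round
                continue
            grad = 2.0 * X.T @ (X @ global_w - y) / len(y)
            updates.append(global_w - lr * grad)
            sizes.append(len(y))
        if not updates:                          # no device reported back
            return global_w
        return np.average(updates, axis=0, weights=sizes)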

Clustering and classification critically rely on distance metrics that provide meaningful comparisons between data points. We present mixed-integer optimization approaches for finding optimal distance metrics that generalize the Mahalanobis metric extensively studied in the literature. Additionally, we generalize and improve upon leading methods by removing the reliance on pre-designated "target neighbors," "triplets," and "similarity pairs." Another salient feature of our method is that it enables active learning: after an optimal metric is computed, it recommends precise regions to sample in order to improve classification performance. This targeted acquisition can significantly reduce the computational burden by ensuring that the training data are complete, representative, and economical. We demonstrate the classification and computational performance of the algorithms through several simple and intuitive examples, followed by results on real image and medical datasets.
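
For orientation, the object being optimized is a generalized Mahalanobis metric $M = LL^{\top}$; a minimal sketch of how such a learned metric plugs into nearest-neighbor classification (the mixed-integer optimization that produces $L$ is not shown, and the function names are illustrative):

    import numpy as np

    def learned_distance(x1, x2, L):
        # Distance induced by M = L @ L.T (positive semidefinite by construction):
        # d(x1, x2) = sqrt((x1 - x2)' M (x1 - x2)) = ||L.T @ (x1 - x2)||.
        diff = L.T @ (x1 - x2)
        return np.sqrt(diff @ diff)

    def nearest_neighbor_predict(X_train, y_train, x, L):
        # 1-nearest-neighbor classification under the learned metric.
        d = np.array([learned_distance(xi, x, L) for xi in X_train])
        return y_train[np.argmin(d)]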
