
We propose a doubly robust approach to characterizing treatment effect heterogeneity in observational studies. We utilize posterior distributions for both the propensity score and outcome regression models to provide valid inference on the conditional average treatment effect even when high-dimensional or nonparametric models are used. We show that our approach leads to conservative inference in finite samples or under model misspecification, and provides a consistent variance estimator when both models are correctly specified. In simulations, we illustrate the utility of these results in difficult settings such as high-dimensional covariate spaces or highly flexible models for the propensity score and outcome regression. Lastly, we analyze environmental exposure data from NHANES to identify how the effects of these exposures vary by subject-level characteristics.
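
As a point of reference for the kind of estimator described above, the sketch below constructs the standard doubly robust (AIPW) pseudo-outcome for the conditional average treatment effect from generic propensity score and outcome regression fits. It is a frequentist plug-in illustration on simulated data, not the paper's posterior-based procedure, and the model choices (logistic regression, gradient boosting) are assumptions made for the example.

```python
# Doubly robust (AIPW) pseudo-outcome for the CATE on simulated data.
# Frequentist plug-in illustration; cross-fitting and the paper's Bayesian
# posterior machinery are omitted. Model choices are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
e_true = 1 / (1 + np.exp(-X[:, 0]))              # true propensity score
A = rng.binomial(1, e_true)                      # treatment indicator
tau_true = 1 + X[:, 1]                           # heterogeneous treatment effect
Y = X[:, 0] + A * tau_true + rng.normal(size=n)

# Nuisance fits: propensity score and outcome regressions under each arm.
e_hat = np.clip(LogisticRegression().fit(X, A).predict_proba(X)[:, 1], 0.01, 0.99)
mu1 = GradientBoostingRegressor().fit(X[A == 1], Y[A == 1]).predict(X)
mu0 = GradientBoostingRegressor().fit(X[A == 0], Y[A == 0]).predict(X)

# AIPW pseudo-outcome: consistent for the CATE if either nuisance model is correct.
pseudo = (mu1 - mu0
          + A * (Y - mu1) / e_hat
          - (1 - A) * (Y - mu0) / (1 - e_hat))

# Regressing the pseudo-outcome on covariates gives a CATE estimate.
cate_hat = GradientBoostingRegressor().fit(X, pseudo).predict(X)
print("correlation with true CATE:", np.corrcoef(cate_hat, tau_true)[0, 1])
```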

Related content

Spurred on by recent successes in causal inference competitions, Bayesian nonparametric (and high-dimensional) methods have recently seen increased attention in the causal inference literature. In this paper, we present a comprehensive overview of Bayesian nonparametric applications to causal inference. Our aims are to (i) introduce the fundamental Bayesian nonparametric toolkit; (ii) discuss how to determine which tool is most appropriate for a given problem; and (iii) show how to avoid common pitfalls in applying Bayesian nonparametric methods in high-dimensional settings. Unlike standard fixed-dimensional parametric problems, where outcome modeling alone can sometimes be effective, we argue that most of the time it is necessary to model both the selection and outcome processes.

Governments and public health authorities use seroprevalence studies to guide their responses to the COVID-19 pandemic. These seroprevalence surveys estimate the proportion of persons within a given population who have detectable antibodies to SARS-CoV-2. However, serologic assays are prone to misclassification error due to false positives and negatives, and non-probability sampling methods may induce selection bias. In this paper, we consider nonparametric and parametric prevalence estimators that address both challenges by leveraging validation data and assuming equal probabilities of sample inclusion within covariate-defined strata. Both estimators are shown to be consistent and asymptotically normal, and consistent variance estimators are derived. Simulation studies are presented comparing the finite sample performance of the estimators over a range of assay characteristics and sampling scenarios. The methods are used to estimate SARS-CoV-2 seroprevalence in asymptomatic individuals in Belgium and North Carolina.
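
The snippet below sketches one standard construction of the kind described: a Rogan-Gladen misclassification correction applied within covariate-defined strata, followed by standardization to the population stratum distribution. The sensitivity, specificity, positivity rates, and stratum sizes are made-up illustration values, and the paper's actual estimators may differ in detail.

```python
# Misclassification-corrected, stratum-standardized prevalence estimate
# (Rogan-Gladen correction plus direct standardization). All numbers below
# are illustrative assumptions, not values from the paper.
import numpy as np

sensitivity, specificity = 0.93, 0.98          # taken from validation data (assumed)

# Per-stratum apparent (test-positive) proportions and population stratum sizes.
apparent = np.array([0.04, 0.07, 0.10])        # observed positivity by stratum
n_pop = np.array([50_000, 30_000, 20_000])     # census counts per stratum

# Rogan-Gladen correction within each stratum, truncated to [0, 1].
corrected = (apparent + specificity - 1) / (sensitivity + specificity - 1)
corrected = np.clip(corrected, 0.0, 1.0)

# Standardize to the population stratum distribution.
weights = n_pop / n_pop.sum()
prevalence = np.sum(weights * corrected)
print(f"standardized corrected prevalence: {prevalence:.3f}")
```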

Modern data are increasingly both high-dimensional and heteroscedastic. This paper considers the challenge of estimating underlying principal components from high-dimensional data with noise that is heteroscedastic across samples, i.e., some samples are noisier than others. Such heteroscedasticity naturally arises, e.g., when combining data from diverse sources or sensors. A natural way to account for this heteroscedasticity is to give noisier blocks of samples less weight in PCA by using the leading eigenvectors of a weighted sample covariance matrix. We consider the problem of choosing weights to optimally recover the underlying components. In general, one cannot know these optimal weights since they depend on the underlying components we seek to estimate. However, we show that under some natural statistical assumptions the optimal weights converge to a simple function of the signal and noise variances for high-dimensional data. Surprisingly, the optimal weights are not the inverse noise variance weights commonly used in practice. We demonstrate the theoretical results through numerical simulations and comparisons with existing weighting schemes. Finally, we briefly discuss how estimated signal and noise variances can be used when the true variances are unknown, and we illustrate the optimal weights on real data from astronomy.
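
A minimal sketch of the weighted PCA construction the abstract refers to, using a weighted sample covariance matrix and its leading eigenvectors. The inverse-noise-variance weights below are only a placeholder, since the paper's point is that such weights are generally not optimal; the data-generating setup is an assumption for the example.

```python
# Weighted PCA via the leading eigenvectors of a weighted sample covariance.
# The weights here are inverse noise variances, used only as a placeholder;
# the paper argues these are generally not the optimal weights.
import numpy as np

rng = np.random.default_rng(1)
d, k = 50, 2
U = np.linalg.qr(rng.normal(size=(d, k)))[0]          # true components (orthonormal)

# Two blocks of samples with different noise levels (heteroscedastic across samples).
noise_var = np.array([0.1, 2.0])
blocks = []
for sv in noise_var:
    scores = rng.normal(size=(200, k)) * 3.0
    blocks.append(scores @ U.T + rng.normal(scale=np.sqrt(sv), size=(200, d)))
X = np.vstack(blocks)                                 # data are mean-zero by construction
w = np.repeat(1.0 / noise_var, 200)                   # placeholder per-sample weights

# Weighted sample covariance and its leading eigenvectors.
C = (X.T * w) @ X / w.sum()
eigvals, eigvecs = np.linalg.eigh(C)
U_hat = eigvecs[:, -k:]                               # top-k estimated components

# Alignment with the true subspace (1.0 = perfect recovery).
print("subspace alignment:", np.linalg.norm(U.T @ U_hat, ord="fro") / np.sqrt(k))
```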

Fairness aware data mining (FADM) aims to prevent algorithms from discriminating against protected groups. The literature has come to an impasse as to what constitutes explainable variability as opposed to discrimination. This distinction hinges on a rigorous understanding of the role of proxy variables, i.e., those variables which are associated with both the protected feature and the outcome of interest. We demonstrate that fairness is achieved by ensuring impartiality with respect to sensitive characteristics, and we provide a framework for impartiality by accounting for different perspectives on the data generating process. In particular, fairness can only be precisely defined in a full-data scenario in which all covariates are observed. We then analyze how these models may be conservatively estimated via regression in partial-data settings. Decomposing the regression estimates provides insights into previously unexplored distinctions between explainable variability and discrimination that illuminate the use of proxy variables in fairness aware data mining.

Traditional recommender systems aim to estimate a user's rating of an item based on observed ratings from the population. As with all observational studies, hidden confounders, which are factors that affect both item exposures and user ratings, lead to a systematic bias in the estimation. Consequently, a new trend in recommender system research is to negate the influence of confounders from a causal perspective. Observing that confounders in recommendations are usually shared among items and are therefore multi-cause confounders, we model the recommendation as a multi-cause multi-outcome (MCMO) inference problem. Specifically, to remedy confounding bias, we estimate user-specific latent variables that render the item exposures independent Bernoulli trials. The generative distribution is parameterized by a DNN with a factorized logistic likelihood, and the intractable posteriors are estimated by variational inference. Controlling these factors as substitute confounders can, under mild assumptions, eliminate the bias incurred by multi-cause confounders. Furthermore, we show that MCMO modeling may lead to high variance due to scarce observations associated with the high-dimensional causal space. Fortunately, we theoretically demonstrate that introducing user features as pre-treatment variables can substantially improve sample efficiency and alleviate overfitting. Empirical studies on simulated and real-world datasets show that the proposed deep causal recommender is more robust to unobserved confounders than state-of-the-art causal recommenders. Code and datasets are released at //github.com/yaochenzhu/deep-deconf.
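
A rough sketch of the exposure model described above, assuming a small variational autoencoder with a factorized Bernoulli (logistic) likelihood over item exposures; the architecture sizes, optimizer settings, and toy data are illustrative assumptions rather than the released implementation (see the linked repository for the authors' code).

```python
# Variational exposure model sketch: user-specific latents with a factorized
# Bernoulli likelihood, fit by maximizing a standard ELBO. All sizes and
# training details are assumptions for illustration.
import torch
import torch.nn as nn

class ExposureVAE(nn.Module):
    def __init__(self, n_items, latent_dim=32, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_items, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_items))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.decoder(z), mu, logvar                        # logits, q(z|x) params

def neg_elbo(logits, x, mu, logvar):
    # Factorized Bernoulli (logistic) likelihood plus KL(q(z|x) || N(0, I)).
    recon = nn.functional.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Toy usage: a random binary user-item exposure matrix.
exposures = (torch.rand(256, 100) < 0.1).float()
model = ExposureVAE(n_items=100)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    logits, mu, logvar = model(exposures)
    loss = neg_elbo(logits, exposures, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()
# The posterior means `mu` play the role of substitute confounders downstream.
```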

The paper describes the use of Bayesian regression for building time series models and for stacking different predictive models for time series. Bayesian regression for time series modeling with a nonlinear trend was analyzed. This approach makes it possible to estimate the uncertainty of time series predictions and to calculate value-at-risk characteristics. A hierarchical model for time series using Bayesian regression has also been considered. In this approach, one set of parameters is shared across all data samples, while other parameters can differ across groups of data samples. This allows the model to be used when only short histories are available for particular time series, e.g. for new stores or new products in a sales prediction problem. In the study of stacking predictive models, ARIMA, neural network, random forest, and extra trees models were used for prediction on the first level of the model ensemble. On the second level, the validation-set predictions of these models were stacked using Bayesian regression. This approach yields distributions for the regression coefficients of these models, which makes it possible to estimate the uncertainty contributed by each model to the stacking result. The information about these distributions allows us to select an optimal set of stacking models, taking domain knowledge into account. The probabilistic approach to stacking predictive models enables risk assessment for the predictions, which is important in decision-making.
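
A small sketch of the second-level stacking step, fitting a Bayesian regression to the first-level models' validation-set predictions. scikit-learn's BayesianRidge stands in for the fuller Bayesian regression described in the paper, and the base-model predictions below are synthetic placeholders.

```python
# Second-level stacking via Bayesian regression on validation-set predictions.
# BayesianRidge is a stand-in; the base-model predictions are synthetic.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(2)
n_val = 150
y_val = np.sin(np.linspace(0, 6, n_val)) + rng.normal(scale=0.2, size=n_val)

# Columns = validation predictions of the first-level models
# (e.g. ARIMA, neural network, random forest, extra trees).
base_preds = np.column_stack([
    y_val + rng.normal(scale=s, size=n_val) for s in (0.1, 0.3, 0.5, 0.7)])

stacker = BayesianRidge().fit(base_preds, y_val)
print("stacking coefficients:", stacker.coef_)

# Predictive mean and standard deviation quantify the uncertainty of the
# stacked forecast, which can feed into value-at-risk style assessments.
mean, std = stacker.predict(base_preds, return_std=True)
print("average predictive std:", std.mean())
```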

Longitudinal and survival sub-models are the two building blocks for the joint modelling of longitudinal and time-to-event data. Extensive research indicates that analysing these two processes separately can result in biased estimates because of the association between them. Assuming conditional independence between the biomarker measurements and the event-time process, given latent classes or random effects, is a common way of characterising the association between the two sub-models while accounting for heterogeneity in the population. However, this assumption is difficult to validate because the latent variables are unobservable. We therefore propose a Gaussian copula joint model with random effects to accommodate scenarios in which the conditional independence assumption is questionable. The conventional joint model, which assumes conditional independence, is a special case of the proposed model obtained when the association parameter of the Gaussian copula shrinks to zero. Simulation studies and a real data application are carried out to evaluate the performance of the proposed model. In addition, personalised dynamic predictions of survival probabilities are obtained from the proposed model and compared with predictions obtained under the conventional joint model.
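
To make the copula construction concrete, the sketch below evaluates the joint log-density implied by a bivariate Gaussian copula linking a Gaussian longitudinal margin and an exponential event-time margin. The choice of margins, the parameter values, and the omission of random effects and censoring are simplifying assumptions; setting the association parameter rho to zero recovers the factorized (conditional-independence) density.

```python
# Bivariate Gaussian copula joint log-density with a Gaussian longitudinal margin
# and an exponential event-time margin. Margins and parameters are illustrative
# assumptions; rho = 0 gives the independence (conventional joint model) case.
import numpy as np
from scipy.stats import norm, expon, multivariate_normal

def copula_log_density(y, t, mu, sigma, rate, rho):
    # Marginal log-densities.
    log_fy = norm.logpdf(y, loc=mu, scale=sigma)
    log_ft = expon.logpdf(t, scale=1.0 / rate)
    # Normal scores of the marginal CDF values.
    u = norm.ppf(norm.cdf(y, loc=mu, scale=sigma))
    v = norm.ppf(expon.cdf(t, scale=1.0 / rate))
    # Gaussian copula log-density c(F_Y(y), F_T(t); rho).
    cov = np.array([[1.0, rho], [rho, 1.0]])
    log_c = (multivariate_normal.logpdf(np.column_stack([u, v]), cov=cov)
             - norm.logpdf(u) - norm.logpdf(v))
    return log_fy + log_ft + log_c

y = np.array([1.2, -0.3]); t = np.array([2.5, 0.8])
print(copula_log_density(y, t, mu=0.0, sigma=1.0, rate=0.5, rho=0.4))
print(copula_log_density(y, t, mu=0.0, sigma=1.0, rate=0.5, rho=0.0))  # independence
```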

Estimation of heterogeneous treatment effects is an active area of research in causal inference. Most of the existing methods, however, focus on estimating the conditional average treatment effects of a single, binary treatment given a set of pre-treatment covariates. In this paper, we propose a method to estimate the heterogeneous causal effects of high-dimensional treatments, which poses unique challenges in terms of estimation and interpretation. The proposed approach is based on a Bayesian mixture of regularized regressions to identify groups of units who exhibit similar patterns of treatment effects. By directly modeling cluster membership with covariates, the proposed methodology allows one to explore the unit characteristics that are associated with different patterns of treatment effects. Our motivating application is conjoint analysis, which is a popular survey experiment in social science and marketing research and is based on a high-dimensional factorial design. We apply the proposed methodology to the conjoint data, where survey respondents are asked to select one of two immigrant profiles with randomly selected attributes. We find that a group of respondents with a relatively high degree of prejudice appears to discriminate against immigrants from non-European countries like Iraq. An open-source software package is available for implementing the proposed methodology.
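
The sketch below is a point-estimate caricature of this kind of model: an EM loop over a mixture of ridge regressions in which cluster membership probabilities depend on respondent covariates through a multinomial logistic model. Dimensions, priors, and the simulated data are illustrative assumptions, and the paper's actual Bayesian formulation differs.

```python
# EM sketch of a mixture of regularized (ridge) regressions with covariate-dependent
# cluster membership. A simplified, non-Bayesian caricature for illustration only.
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression

rng = np.random.default_rng(3)
n, p_treat, p_cov, K = 600, 10, 3, 2
T = rng.binomial(1, 0.5, size=(n, p_treat)).astype(float)    # binary treatment attributes
X = rng.normal(size=(n, p_cov))                               # respondent covariates
true_z = (X[:, 0] > 0).astype(int)                            # membership driven by X
betas_true = rng.normal(size=(K, p_treat))
y = np.einsum("ij,ij->i", T, betas_true[true_z]) + rng.normal(scale=0.5, size=n)

resp = rng.dirichlet(np.ones(K), size=n)                      # initial responsibilities
sigma2 = np.ones(K)

for _ in range(30):
    # M-step: weighted ridge regression per cluster, logistic model for membership.
    models = [Ridge(alpha=1.0).fit(T, y, sample_weight=resp[:, k]) for k in range(K)]
    for k in range(K):
        r = y - models[k].predict(T)
        sigma2[k] = np.average(r ** 2, weights=resp[:, k])
    # Membership model fit on responsibility-weighted replicated data.
    Xrep, zrep = np.tile(X, (K, 1)), np.repeat(np.arange(K), n)
    memb = LogisticRegression(max_iter=500).fit(Xrep, zrep, sample_weight=resp.T.reshape(-1))
    # E-step: update responsibilities from membership probabilities and likelihoods.
    log_prior = np.log(memb.predict_proba(X))
    log_lik = np.column_stack([
        -0.5 * ((y - m.predict(T)) ** 2 / s + np.log(2 * np.pi * s))
        for m, s in zip(models, sigma2)])
    log_post = log_prior + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)
    resp = np.exp(log_post)
    resp /= resp.sum(axis=1, keepdims=True)

print("recovered cluster sizes:", resp.sum(axis=0))
```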

Behavioral science researchers have shown strong interest in disaggregating within-person relations from between-person differences (stable traits) using longitudinal data. In this paper, we propose a method of within-person variability score-based causal inference for estimating joint effects of time-varying continuous treatments by effectively controlling for stable traits. After explaining the assumed data-generating process and providing formal definitions of stable trait factors, within-person variability scores, and joint effects of time-varying treatments at the within-person level, we introduce the proposed method, which consists of a two-step analysis. Within-person variability scores for each person, which are disaggregated from the stable traits of that person, are first calculated using weights based on a best linear correlation preserving predictor through structural equation modeling (SEM). Causal parameters are then estimated via a potential outcome approach, either marginal structural models (MSMs) or structural nested mean models (SNMMs), using the calculated within-person variability scores. Unlike approaches that rely entirely on SEM, the present method does not assume linearity for observed time-varying confounders at the within-person level. We emphasize the use of SNMMs with G-estimation because they are doubly robust to misspecification of how observed time-varying confounders are functionally related to treatments/predictors and outcomes at the within-person level. Through simulation, we show that the proposed method can recover causal parameters well and that causal estimates might be severely biased if one does not properly account for stable traits. An empirical application using data on sleep habits and mental health status from the Tokyo Teen Cohort study is also provided.
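
As a toy illustration of the point about stable traits, the snippet below compares a naive pooled regression with a within-person (person-mean-centered) analysis on simulated data with a stable trait confounder. Person-mean centering is a drastic simplification of the SEM-based within-person variability scores, and the example omits time-varying confounders, MSMs, and G-estimation.

```python
# Toy contrast: pooled regression ignoring a stable trait confounder vs. a
# within-person (person-mean-centered) analysis. A crude stand-in for the
# variability-score method; data and effect sizes are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n, T = 1000, 4
trait = rng.normal(size=n)                                  # stable trait (confounder)
treat = trait[:, None] + rng.normal(size=(n, T))            # time-varying treatment
y = 0.5 * treat + 1.5 * trait[:, None] + rng.normal(size=(n, T))   # true effect = 0.5

# Naive pooled estimate (long format), ignoring the stable trait: biased upward.
naive = LinearRegression().fit(treat.reshape(-1, 1), y.reshape(-1))
print("naive estimate:", naive.coef_[0])

# Within-person estimate: center treatment and outcome by person means first.
treat_w = (treat - treat.mean(axis=1, keepdims=True)).reshape(-1, 1)
y_w = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
within = LinearRegression().fit(treat_w, y_w)
print("within-person estimate:", within.coef_[0])
```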

Implicit probabilistic models are naturally defined in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.
