亚洲欧洲日产国-久久久免费精品视频

The linear regression model is widely used in the biomedical and social sciences as well as in policy and business research to adjust for covariates and estimate the average effects of treatments. Behind every causal inference endeavor there is at least a notion of a randomized experiment. However, in routine regression analyses in observational studies, it is unclear how well the adjustments made by regression approximate key features of randomization experiments, such as covariate balance, study representativeness, sample boundedness, and unweighted sampling. In this paper, we provide software to empirically address this question. In the new lmw package for R, we compute the implied linear model weights for average treatment effects and provide diagnostics for them. The weights are obtained as part of the design stage of the study; that is, without using outcome information. The implementation is general and applicable, for instance, in settings with instrumental variables and multi-valued treatments; in essence, in any situation where the linear model is the vehicle for adjustment and estimation of average treatment effects with discrete-valued interventions.

相關內容

線性的

關注 1

CASES · 優化器 · Better · 方陣 · 近似 ·

2023 年 5 月 5 日

What Not to Test (for Cyber-Physical Systems)

Xiao Ling,Tim Menzies

from arxiv, 17 pages, 5 figures, 7 tables. Accepted by TSE

For simulation-based systems, finding a set of test cases with the least cost by exploring multiple goals is a complex task. Domain-specific optimization goals (e.g. maximize output variance) are useful for guiding the rapid selection of test cases via mutation. But evaluating the selected test cases via mutation (that can distinguish the current program from the mutated systems) is a different goal to domain-specific optimizations. While the optimization goals can be used to guide the mutation analysis, that guidance should be viewed as a weak indicator since it can hurt the mutation effectiveness goals by focusing too much on the optimization goals. Based on the above, this paper proposes DoLesS (Domination with Least Squares Approximation) that selects the minimal and effective test cases by averaging over a coarse-grained grid of the information gained from multiple optimizations goals. DoLesS applies an inverted least squares approximation approach to find a minimal set of tests that can distinguish better from worse parts of the optimization goals. When tested on multiple simulation-based systems, DoLesS performs as well or even better as the prior state-of-the-art, while running 80-360 times faster on average (seconds instead of hours).

線性回歸 · 線性的 · 稀疏 · 劃分 · 估計/估計量 ·

2023 年 5 月 5 日

Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm

Alexander C. McLain,Anja Zgodic,Howard Bondell

Bayesian variable selection methods are powerful techniques for fitting and inferring on sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. In this paper, we proposed a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates of hyperparameters. Efficient maximum a posteriori (MAP) estimation is completed through a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm. The PX-ECM results in a robust computationally efficient coordinate-wise optimization, which adjusts for the impact of other predictor variables. The completion of the E-step uses an approach motivated by the popular two-groups approach to multiple testing. The result is a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm applied to sparse high-dimensional linear regression, which can be completed using one-at-a-time or all-at-once type optimization. We compare the empirical properties of PROBE to comparable approaches with numerous simulation studies and an analysis of cancer cell lines drug response study. The proposed approach is implemented in the R package probe.

估計/估計量 · 邊緣化 · 試驗 · 可辨認的 · Extensibility ·

2023 年 5 月 4 日

Covariate-assisted bounds on causal effects with instrumental variables

Alexander W. Levis,Matteo Bonvini,Zhenghao Zeng,Luke Keele,Edward H. Kennedy

from arxiv, 40 pages, 2 figures

When an exposure of interest is confounded by unmeasured factors, an instrumental variable (IV) can be used to identify and estimate certain causal contrasts. Identification of the marginal average treatment effect (ATE) from IVs relies on strong untestable structural assumptions. When one is unwilling to assert such structure, IVs can nonetheless be used to construct bounds on the ATE. Famously, Balke and Pearl (1997) proved tight bounds on the ATE for a binary outcome, in a randomized trial with noncompliance and no covariate information. We demonstrate how these bounds remain useful in observational settings with baseline confounders of the IV, as well as randomized trials with measured baseline covariates. The resulting bounds on the ATE are non-smooth functionals, and thus standard nonparametric efficiency theory is not immediately applicable. To remedy this, we propose (1) under a novel margin condition, influence function-based estimators of the bounds that can attain parametric convergence rates when the nuisance functions are modeled flexibly, and (2) estimators of smooth approximations of these bounds. We propose extensions to continuous outcomes, explore finite sample properties in simulations, and illustrate the proposed estimators in a randomized experiment studying the effects of vaccination encouragement on flu-related hospital visits.

估計/估計量 · Machine Learning · Weight · Learning · MoDELS ·

2023 年 5 月 3 日

Efficient estimation of weighted cumulative treatment effects by double/debiased machine learning

Shenbo Xu,Bang Zheng,Bowen Su,Stan Finkelstein,Roy Welsch,Kenney Ng,Ioanna Tzoulaki,Zach Shahn

In empirical studies with time-to-event outcomes, investigators often leverage observational data to conduct causal inference on the effect of exposure when randomized controlled trial data is unavailable. Model misspecification and lack of overlap are common issues in observational studies, and they often lead to inconsistent and inefficient estimators of the average treatment effect. Estimators targeting overlap weighted effects have been proposed to address the challenge of poor overlap, and methods enabling flexible machine learning for nuisance models address model misspecification. However, the approaches that allow machine learning for nuisance models have not been extended to the setting of weighted average treatment effects for time-to-event outcomes when there is poor overlap. In this work, we propose a class of one-step cross-fitted double/debiased machine learning estimators for the weighted cumulative causal effect as a function of restriction time. We prove that the proposed estimators are consistent, asymptotically linear, and reach semiparametric efficiency bounds under regularity conditions. Our simulations show that the proposed estimators using nonparametric machine learning nuisance models perform as well as established methods that require correctly-specified parametric nuisance models, illustrating that our estimators mitigate the need for oracle parametric nuisance models. We apply the proposed methods to real-world observational data from a UK primary care database to compare the effects of anti-diabetic drugs on cancer clinical outcomes.

推斷 · MoDELS · Learning · 估計/估計量 · 圖 ·

2022 年 10 月 15 日

Active Bayesian Causal Inference

Christian Toth,Lars Lorch,Christian Knoll,Andreas Krause,Franz Pernkopf,Robert Peharz,Julius von Kügelgen

from arxiv, NeurIPS 2022 camera-ready version. RP & JvK are shared last authors. 10 pages + Bibliography + Appendix (34 pages total)

Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, such a two-stage approach is uneconomical, especially in terms of actively collected interventional data, since the causal query of interest may not require a fully-specified causal model. From a Bayesian perspective, it is also unnatural, since a causal query (e.g., the causal graph or some causal effect) can be viewed as a latent quantity subject to posterior inference -- other unobserved quantities that are not of direct interest (e.g., the full causal model) ought to be marginalized out in this process and contribute to our epistemic uncertainty. In this work, we propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning, which jointly infers a posterior over causal models and queries of interest. In our approach to ABCI, we focus on the class of causally-sufficient, nonlinear additive noise models, which we model using Gaussian processes. We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, and update our beliefs to choose the next experiment. Through simulations, we demonstrate that our approach is more data-efficient than several baselines that only focus on learning the full causal graph. This allows us to accurately learn downstream causal queries from fewer samples while providing well-calibrated uncertainty estimates for the quantities of interest.

MoDELS · 優化器 · Analysis · 推斷 · 估計/估計量 ·

2022 年 9 月 19 日

A Survey of Deep Causal Model

Zongyu Li,Zhenfeng Zhu

The concept of causality plays an important role in human cognition . In the past few decades, causal inference has been well developed in many fields, such as computer science, medicine, economics, and education. With the advancement of deep learning techniques, it has been increasingly used in causal inference against counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective optimization functions to estimate counterfactual data unbiasedly based on the different optimization methods. This paper focuses on the survey of the deep causal models, and its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we incorporate a comprehensive overview of deep causal models from both temporal development and method classification perspectives; 3) we assist a detailed and comprehensive classification and analysis of relevant datasets and source code.

估計/估計量 · 圖 · 學成 · 連續優化 · 有向非循環圖 ·

2021 年 11 月 3 日

Multi-task Learning of Order-Consistent Causal Graphs

Xinshi Chen,Haoran Sun,Caleb Ellington,Eric Xing,Le Song

from arxiv, 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.

Processing（編程語言） · 推斷 · NLP · Computational Linguistics · 估計/估計量 ·

2021 年 9 月 2 日

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Amir Feder,Katherine A. Keith,Emaad Manzoor,Reid Pryzant,Dhanya Sridhar,Zach Wood-Doughty,Jacob Eisenstein,Justin Grimmer,Roi Reichart,Margaret E. Roberts,Brandon M. Stewart,Victor Veitch,Diyi Yang

A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.

估計/估計量 · 估計誤差 · MoDELS · 學成 · 無偏 ·

2020 年 12 月 17 日

The Causal Learning of Retail Delinquency

Yiyan Huang,Cheuk Hang Leung,Xing Yan,Qi Wu,Nanbo Peng,Dongdong Wang,Zhixiang Huang

from arxiv, This paper was accepted and will be published in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

推斷 · 估計/估計量 · 統計量 · Machine Learning · 學成 ·

2020 年 2 月 5 日

A Survey on Causal Inference

Liuyi Yao,Zhixuan Chu,Sheng Li,Yaliang Li,Jing Gao,Aidong Zhang

Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods.