Causal inference from observational data can be viewed as a missing data problem arising from a hypothetical population-scale randomized trial matched to the observational study. This links a target trial protocol with a corresponding generative predictive model for inference, providing a complete framework for transparent communication of causal assumptions and statistical uncertainty on treatment effects, without the need for counterfactuals. The intuitive foundation for the work is that a whole population randomized trial would provide answers to any observable causal question with certainty. Thus, our fundamental problem of causal inference is the missingness of the hypothetical target trial data, which we solve through repeated imputation from a generative predictive model conditioned on the observational data. Causal assumptions map to intuitive conditions on the transportability of predictive models across populations and conditions. We demonstrate our approach on a real data application to studying the effects of maternal smoking on birthweights using extensions of Bayesian additive regression trees and inverse probability weighting.
There is growing interest in developing causal inference methods for multi-valued treatments with a focus on pairwise average treatment effects. Here we focus on a clinically important, yet less-studied estimand: causal drug-drug interactions (DDIs), which quantifies the degree to which the causal effect of drug A is altered by the presence versus the absence of drug B. Confounding adjustment when studying the effects of DDIs can be accomplished via inverse probability of treatment weighting (IPTW), a standard approach originally developed for binary treatments and later generalized to multi-valued treatments. However, this approach generally results in biased results when the propensity score model is misspecified. Motivated by the need for more robust techniques, we propose two empirical likelihood-based weighting approaches that allow for specifying a set of propensity score models, with the second method balancing user-specified covariates directly, by incorporating additional, nonparametric constraints. The resulting estimators from both methods are consistent when the postulated set of propensity score models contains a correct one; this property has been termed multiple robustness. We then evaluate their finite sample performance through simulation. The results demonstrate that the proposed estimators outperform the standard IPTW method in terms of both robustness and efficiency. Finally, we apply the proposed methods to evaluate the impact of renin-angiotensin system inhibitors (RAS-I) on the comparative nephrotoxicity of nonsteroidal anti-inflammatory drugs (NSAID) and opioids, using data derived from electronic medical records from a large multi-hospital health system.
Large-scale genome-wide association studies (GWAS) have offered an exciting opportunity to discover putative causal genes or risk factors associated with diseases by using SNPs as instrumental variables (IVs). However, conventional approaches assume linear causal relations partly for simplicity and partly for the only availability of GWAS summary data. In this work, we propose a novel model {for transcriptome-wide association studies (TWAS)} to incorporate nonlinear relationships across IVs, an exposure, and an outcome, which is robust against violations of the valid IV assumptions and permits the use of GWAS summary data. We decouple the estimation of a marginal causal effect and a nonlinear transformation, where the former is estimated via sliced inverse regression and a sparse instrumental variable regression, and the latter is estimated by a ratio-adjusted inverse regression. On this ground, we propose an inferential procedure. An application of the proposed method to the ADNI gene expression data and the IGAP GWAS summary data identifies 18 causal genes associated with Alzheimer's disease, including APOE and TOMM40, in addition to 7 other genes missed by two-stage least squares considering only linear relationships. Our findings suggest that nonlinear modeling is required to unleash the power of IV regression for identifying potentially nonlinear gene-trait associations. Accompanying this paper is our Python library nl-causal(//github.com/nl-causal/nonlinear-causal) that implements the proposed method.
Investigators are increasingly using novel methods for extending (generalizing or transporting) causal inferences from a trial to a target population. In many generalizability and transportability analyses, the trial and the observational data from the target population are separately sampled, following a non-nested trial design. In practical implementations of this design, non-randomized individuals from the target population are often identified by conditioning on the use of a particular treatment, while individuals who used other candidate treatments for the same indication or individuals who did not use any treatment are excluded. In this paper, we argue that conditioning on treatment in the target population changes the estimand of generalizability and transportability analyses and potentially introduces serious bias in the estimation of causal estimands in the target population or the subset of the target population using a specific treatment. Furthermore, we argue that the naive application of marginalization-based or weighting-based standardization methods does not produce estimates of any reasonable causal estimand. We use causal graphs and counterfactual arguments to characterize the identification problems induced by conditioning on treatment in the target population and illustrate the problems using simulated data. We conclude by considering the implications of our findings for applied work.
This work considers Gaussian process interpolation with a periodized version of the Mat{\'e}rn covariance function (Stein, 1999, Section 6.7) with Fourier coefficients $\phi$($\alpha$^2 + j^2)^(--$\nu$--1/2). Convergence rates are studied for the joint maximum likelihood estimation of $\nu$ and $\phi$ when the data is sampled according to the model. The mean integrated squared error is also analyzed with fixed and estimated parameters, showing that maximum likelihood estimation yields asymptotically the same error as if the ground truth was known. Finally, the case where the observed function is a ''deterministic'' element of a continuous Sobolev space is also considered, suggesting that bounding assumptions on some parameters can lead to different estimates.
Bayesian Neural Networks with Latent Variables (BNN+LVs) capture predictive uncertainty by explicitly modeling model uncertainty (via priors on network weights) and environmental stochasticity (via a latent input noise variable). In this work, we first show that BNN+LV suffers from a serious form of non-identifiability: explanatory power can be transferred between the model parameters and latent variables while fitting the data equally well. We demonstrate that as a result, in the limit of infinite data, the posterior mode over the network weights and latent variables is asymptotically biased away from the ground-truth. Due to this asymptotic bias, traditional inference methods may in practice yield parameters that generalize poorly and misestimate uncertainty. Next, we develop a novel inference procedure that explicitly mitigates the effects of likelihood non-identifiability during training and yields high-quality predictions as well as uncertainty estimates. We demonstrate that our inference method improves upon benchmark methods across a range of synthetic and real data-sets.
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
We consider a potential outcomes model in which interference may be present between any two units but the extent of interference diminishes with spatial distance. The causal estimand is the global average treatment effect, which compares outcomes under the counterfactuals that all or no units are treated. We study a class of designs in which space is partitioned into clusters that are randomized into treatment and control. For each design, we estimate the treatment effect using a Horvitz-Thompson estimator that compares the average outcomes of units with all or no neighbors treated, where the neighborhood radius is of the same order as the cluster size dictated by the design. We derive the estimator's rate of convergence as a function of the design and degree of interference and use this to obtain estimator-design pairs that achieve near-optimal rates of convergence under relatively minimal assumptions on interference. We prove that the estimators are asymptotically normal and provide a variance estimator. For practical implementation of the designs, we suggest partitioning space using clustering algorithms.
Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.
This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods.