We consider the problem of identifying intermediate variables (or mediators) that regulate the effect of a treatment on a response variable. While there has been significant research on this classical topic, little work has been done when the set of potential mediators is high-dimensional (HD). A further complication arises when these mediators are interrelated (with unknown dependencies). In particular, we assume that the causal structure of the treatment, the confounders, the potential mediators, and the response is a (possibly unknown) directed acyclic graph (DAG). HD DAG models have previously been used for the estimation of causal effects from observational data. In particular, methods called IDA and joint-IDA have been developed for estimating the effects of single and multiple simultaneous interventions, respectively. In this paper, we propose an IDA-type method called MIDA for estimating so-called individual mediation effects from HD observational data. Although IDA and joint-IDA estimators have been shown to be consistent in certain sparse HD settings, their asymptotic properties, such as convergence in distribution, and the associated inferential tools have remained unknown in such settings. We prove HD consistency of MIDA for linear structural equation models with sub-Gaussian errors. More importantly, we derive distributional convergence results for MIDA in similar HD settings, which are applicable to IDA and joint-IDA estimators as well. To our knowledge, these are the first such distributional convergence results facilitating inference for IDA-type estimators. They are built on our novel theoretical results regarding uniform bounds for linear regression estimators over varying subsets of HD covariates, which may be of independent interest. Finally, we empirically validate our asymptotic theory for MIDA and demonstrate its usefulness via simulations and a real data application.
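To fix ideas (a sketch in our own notation, not taken from the abstract): in a linear SEM compatible with the DAG, the individual mediation effect of the treatment $X$ on the response $Y$ through a particular mediator $M_j$ can be expressed as a product of path effects, roughly
\[ \delta_j \;=\; \theta_{X \to M_j}\,\theta_{M_j \to Y}, \]
where $\theta_{X \to M_j}$ is the causal effect of $X$ on $M_j$ and $\theta_{M_j \to Y}$ is the effect of $M_j$ on $Y$ with $X$ held fixed. MIDA targets such quantities when the DAG is known only up to its Markov equivalence class, so, as with IDA, the output is a set of possible values rather than a single number.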
Clinical studies are often complicated by truncation by death, which renders the outcomes of non-survivors undefined. Statistical analysis based only on observed survivors may lead to biased results because the characteristics of survivors may differ greatly between treatment groups. Under the principal stratification framework, a meaningful causal parameter, the survivor average causal effect, can be defined within the always-survivor group. This causal parameter may not be identifiable in observational studies where the treatment assignment and the survival or outcome process are confounded by unmeasured features. In this paper, we propose a new method to deal with unmeasured confounding when the outcome is truncated by death. First, a new method is proposed to identify the heterogeneous conditional survivor average causal effect based on a substitutional variable under monotonicity. Second, under additional assumptions, the survivor average causal effect on the whole population is identified. Furthermore, we consider estimation and inference for the conditional survivor average causal effect based on parametric and nonparametric methods. The proposed method can be used to assess post-marketing drug safety or efficacy using real-world data.
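For reference, writing $Y(a)$ and $S(a)$ for the potential outcome and potential survival status under treatment level $a$ (notation not spelled out in the abstract), the survivor average causal effect is the contrast
\[ \mathrm{SACE} \;=\; E\{Y(1) - Y(0) \mid S(1) = S(0) = 1\}, \]
i.e., the average treatment effect among the always-survivors, for whom the outcome is well defined under both treatment levels.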
This article describes an R package, bqror, that estimates Bayesian quantile regression for the ordinal models introduced in \citet{Rahman-2016}. The paper classifies ordinal models into two types and offers two computationally efficient, yet simple, MCMC algorithms for estimating ordinal quantile regression. The generic ordinal model with more than 3 outcomes (labeled the $OR_{I}$ model) is estimated by a combination of Gibbs sampling and the Metropolis-Hastings algorithm, whereas an ordinal model with exactly 3 outcomes (labeled the $OR_{II}$ model) is estimated using Gibbs sampling only. In line with the Bayesian literature, we suggest using the marginal likelihood for comparing alternative quantile regression models and explain how to compute it. The models and their estimation procedures are illustrated via multiple simulation studies and implemented in the two applications presented in \citet{Rahman-2016}. The article also describes several other functions contained within the bqror package, which are necessary for estimation, inference, and assessing model fit.
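As a brief sketch of the underlying model (the standard Bayesian ordinal quantile regression formulation; the notation below is ours), the ordinal response $y_i$ is generated by a latent variable with an asymmetric Laplace error at quantile level $p$:
\[ z_i = x_i'\beta_p + \epsilon_i, \qquad \epsilon_i \sim \mathrm{AL}(0, 1, p), \qquad y_i = j \iff \gamma_{j-1} < z_i \le \gamma_j, \]
with ordered cut-points $\gamma_0 < \gamma_1 < \dots < \gamma_J$. The MCMC schemes mentioned above sample $\beta_p$, the cut-points, and the latent $z_i$'s; in the $OR_{II}$ case enough cut-points are fixed that Gibbs sampling alone suffices.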
Sibling fixed effects (FE) models are useful for estimating causal treatment effects while offsetting unobserved sibling-invariant confounding. However, treatment estimates are biased if an individual's outcome affects their sibling's outcome. We propose a robustness test for assessing the presence of outcome-to-outcome interference in linear two-sibling FE models. We regress a gain-score (the difference between siblings' continuous outcomes) on both siblings' treatments and on a pre-treatment observed FE. Under certain restrictions, the observed FE's partial regression coefficient signals the presence of outcome-to-outcome interference. Monte Carlo simulations demonstrated the robustness test under several models. We found that an observed FE signaled outcome-to-outcome spillover if it was directly associated with a sibling-invariant confounder of treatments and outcomes, directly associated with a sibling's treatment, or directly and equally associated with both siblings' outcomes. However, the robustness test collapsed if the observed FE was directly but differentially associated with siblings' outcomes or if outcomes affected siblings' treatments.
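In equation form (our notation, with families indexed by $i$, siblings by 1 and 2, and $W_i$ denoting the pre-treatment observed FE), the test fits
\[ Y_{i1} - Y_{i2} \;=\; \delta_0 + \delta_1 T_{i1} + \delta_2 T_{i2} + \delta_3 W_i + e_i \]
and examines $\delta_3$: under the restrictions described above, a nonzero partial coefficient on $W_i$ is taken as evidence of outcome-to-outcome interference between siblings.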
Many frameworks exist to infer cause-and-effect relations in complex nonlinear systems, but a complete theory is lacking. A new framework is presented that is fully nonlinear, provides a complete information-theoretic disentanglement of causal processes, allows for nonlinear interactions between causes, identifies the causal strength of missing or unknown processes, and can analyze systems that cannot be represented by directed acyclic graphs. The basic building blocks are information-theoretic measures such as (conditional) mutual information and a new concept called certainty, which monotonically increases with the information available about the target process. The framework is presented in detail and compared with other existing frameworks, and the treatment of confounders is discussed. While there are systems with structures that the framework cannot disentangle, it is argued that any causal framework based on integrated quantities will miss potentially important information contained in the underlying probability density functions. The framework is tested on several highly simplified stochastic processes to demonstrate how blocking and gateways are handled, and on the chaotic Lorenz 1963 system. We show that the framework not only provides information on the local dynamics but also reveals information on the larger-scale structure of the underlying attractor. Furthermore, by applying it to real observations related to the El Niño-Southern Oscillation system, we demonstrate its power and advantage over other methodologies.
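For orientation, the conditional mutual information used as a building block is the standard quantity
\[ I(X; Y \mid Z) \;=\; E\!\left[\log \frac{p(X, Y \mid Z)}{p(X \mid Z)\,p(Y \mid Z)}\right], \]
which is zero exactly when $X$ and $Y$ are conditionally independent given $Z$; the certainty measure introduced in the paper builds on such quantities, but its precise definition is not reproduced here.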
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated toward understanding how these methods behave in high-dimensional settings, where the dimension $d_n$ and $n$ both increase to infinity together at some prescribed relative rate. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d_n/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption about $d_n$. We introduce a new, generic approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal ones. We exemplify our technique for a handful of classical problems, including one-sample mean and covariance testing. Our tests are shown to have minimax rate-optimal power against appropriate local alternatives, and, without explicitly targeting the high-dimensional setting, their power is optimal up to a $\sqrt{2}$ factor. A hidden advantage is that our proofs are simple and transparent. We end by describing several fruitful open directions.
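To give a flavor of the construction for the one-sample mean testing example (a sketch in our notation, assuming the sample is split into halves $\mathcal{D}_1$ and $\mathcal{D}_2$): compute $\hat\mu_2 = |\mathcal{D}_2|^{-1}\sum_{j \in \mathcal{D}_2} X_j$, form the projected scores $h_i = X_i^\top \hat\mu_2$ for $i \in \mathcal{D}_1$, and self-normalize,
\[ T \;=\; \frac{\sqrt{|\mathcal{D}_1|}\,\bar h}{\hat\sigma_h}, \]
where $\bar h$ and $\hat\sigma_h$ are the sample mean and standard deviation of the $h_i$. Under the null, $T$ has a standard Gaussian limit regardless of how $d_n$ grows, which is the sense in which the calibration is dimension-agnostic.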
In experiments that study social phenomena, such as peer influence or herd immunity, the treatment of one unit may influence the outcomes of others. Such "interference between units" invalidates traditional approaches to causal inference, so additional assumptions are typically required to model the underlying social mechanism. We propose an approach that requires no such assumptions, allowing for interference that is both unmodeled and strong, with confidence intervals found using only the randomization of treatment. Additionally, the approach allows for the use of regression, matching, or weighting, as best fits the application at hand. Inference is done by bounding the distribution of the estimation error over all possible values of the unknown counterfactual, using an integer program. Examples are shown using a vaccine trial and two experiments investigating social influence.
A common concern when trying to draw causal inferences from observational data is that the measured covariates are insufficiently rich to account for all sources of confounding. In practice, many of the covariates may only be proxies of the latent confounding mechanism. Recent work has shown that in certain settings where the standard 'no unmeasured confounding' assumption fails, proxy variables can be leveraged to identify causal effects. Results currently exist for the total causal effect of an intervention, but little consideration has been given to learning about the direct or indirect pathways of the effect through a mediator variable. In this work, we describe three separate proximal identification results for natural direct and indirect effects in the presence of unmeasured confounding. We then develop a semiparametric framework for inference on natural (in)direct effects, which leads us to locally efficient, multiply robust estimators.
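For context, writing $Y(a, m)$ for the potential outcome under treatment $a$ and mediator value $m$, and $M(a)$ for the potential mediator (notation not given in the abstract), the natural direct and indirect effects are the usual contrasts
\[ \mathrm{NDE} = E\{Y(1, M(0)) - Y(0, M(0))\}, \qquad \mathrm{NIE} = E\{Y(1, M(1)) - Y(1, M(0))\}, \]
whose sum is the average total effect $E\{Y(1, M(1)) - Y(0, M(0))\}$; the proximal results identify these contrasts when the confounders are observed only through proxies.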
In this work, we propose a scalable Bayesian procedure for learning the local dependence structure in a high-dimensional model where the variables possess a natural ordering. The ordering of variables can be indexed by time, by the vicinity of spatial locations, and so on, with the natural assumption that variables far apart tend to have weak correlations. Applications of such models abound in a variety of fields such as finance, genome association analysis, and spatial modeling. We adopt a flexible framework under which each variable is dependent on its neighbors or predecessors, and the neighborhood size can vary for each variable. It is of great interest to reveal this local dependence structure by estimating the covariance or precision matrix while yielding a consistent estimate of the varying neighborhood size for each variable. The existing literature on banded covariance matrix estimation, which assumes a fixed bandwidth, cannot be adapted to this general setup. We employ the modified Cholesky decomposition for the precision matrix and design a flexible prior for this model through appropriate priors on the neighborhood sizes and Cholesky factors. The posterior contraction rates of the Cholesky factor are derived and shown to be nearly or exactly minimax optimal, and our procedure leads to consistent estimates of the neighborhood size for all the variables. Another appealing feature of our procedure is its scalability to models with large numbers of variables, owing to efficient posterior inference that does not resort to MCMC algorithms. Numerical comparisons are carried out with competitive methods, and applications are considered for some real datasets.
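A brief sketch of the decomposition (standard modified-Cholesky notation, ours rather than the paper's): with the variables ordered as $X_1, \dots, X_p$, each variable is regressed on its $k_j$ immediate predecessors,
\[ X_j \;=\; \sum_{l = j - k_j}^{\,j-1} a_{jl} X_l + \epsilon_j, \qquad \epsilon_j \sim N(0, d_j), \]
so that the precision matrix takes the form $\Omega = (I - A)^\top D^{-1} (I - A)$, where $A$ is strictly lower triangular with $j$th-row bandwidth $k_j$ and $D = \mathrm{diag}(d_1, \dots, d_p)$. Priors on the $k_j$ and on the nonzero $a_{jl}$ then induce the varying local neighborhood structure described above.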
In evaluating social programs, it is important to measure treatment effects within a market economy, where interference arises because individuals buy and sell various goods at the prevailing market price. We introduce a stochastic model of potential outcomes in market equilibrium, where the market price acts as an exposure mapping. We prove that average direct and indirect treatment effects converge to interpretable mean-field treatment effects, and we provide estimators for these effects through a unit-level randomized experiment augmented with randomization in prices. We also provide a central limit theorem for the estimators that depends on the sensitivity of outcomes to prices. For a variant where treatments are continuous, we show that the sum of the direct and indirect effects converges to the total effect of a marginal policy change. We illustrate the coverage and consistency properties of the estimators in simulations of different interventions in a two-sided market.
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.