We propose kernel ridge regression estimators for mediation analysis and dynamic treatment effects over short horizons. We allow treatments, covariates, and mediators to be discrete or continuous, and low-, high-, or infinite-dimensional. We propose estimators of means, increments, and distributions of counterfactual outcomes with closed-form solutions in terms of kernel matrix operations. For the continuous treatment case, we prove uniform consistency with finite sample rates. For the discrete treatment case, we prove root-n consistency, Gaussian approximation, and semiparametric efficiency. We conduct simulations, then estimate mediated and dynamic treatment effects of the US Job Corps program for disadvantaged youth.
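The closed-form flavour of such estimators can be illustrated with a minimal kernel ridge regression sketch: regress the outcome on (treatment, covariates) in closed form, then average predictions over the observed covariates to obtain a plug-in counterfactual mean. This is a generic illustration under assumed conditions (RBF kernel, synthetic data, fixed regularisation), not the authors' exact estimator:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Gaussian (RBF) kernel matrix between row sets A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))            # covariate
D = rng.normal(size=(n, 1)) + X        # continuous treatment, confounded by X
Y = (np.sin(D) + X + 0.1 * rng.normal(size=(n, 1))).ravel()

Z = np.hstack([D, X])                  # regressors: (treatment, covariate)
lam = 1e-3
K = rbf(Z, Z)
# closed-form kernel ridge coefficients: (K + n*lam*I)^{-1} y
alpha = np.linalg.solve(K + n * lam * np.eye(n), Y)

def counterfactual_mean(d):
    # plug-in estimate of E[Y(d)]: average the fitted regression over
    # the empirical covariate distribution, holding treatment at d
    Zd = np.hstack([np.full((n, 1), d), X])
    return float(np.mean(rbf(Zd, Z) @ alpha))

# estimated dose-response at d = 0.5
print(counterfactual_mean(0.5))
```

Everything reduces to kernel matrix operations: one linear solve for the fit, one kernel-vector product per counterfactual evaluation.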
Spurred by recent successes in causal inference competitions, Bayesian nonparametric (and high-dimensional) methods have seen increased attention in the causal inference literature. In this paper, we present a comprehensive overview of Bayesian nonparametric applications to causal inference. Our aims are to (i) introduce the fundamental Bayesian nonparametric toolkit; (ii) discuss how to determine which tool is most appropriate for a given problem; and (iii) show how to avoid common pitfalls in applying Bayesian nonparametric methods in high-dimensional settings. Unlike standard fixed-dimensional parametric problems, where outcome modeling alone can sometimes be effective, we argue that it is usually necessary to model both the selection and outcome processes.
Bayesian inference for nonlinear diffusions, observed at discrete times, is a challenging task that has prompted the development of a number of algorithms, mainly within the computational statistics community. We propose a new direction, and accompanying methodology, borrowing ideas from statistical physics and computational chemistry, for inferring the posterior distribution of latent diffusion paths and model parameters, given observations of the process. Joint configurations of the underlying process noise and of parameters, mapping onto diffusion paths consistent with observations, form an implicitly defined manifold. By making use of a constrained Hamiltonian Monte Carlo algorithm on this embedded manifold, we are able to perform computationally efficient inference for a class of discretely observed diffusion models. Critically, in contrast with other approaches proposed in the literature, our methodology is highly automated: it requires minimal user intervention and applies uniformly across a range of settings, including elliptic or hypo-elliptic systems; observations with or without noise; and linear or non-linear observation operators. Exploiting Markovianity, we propose a variant of the method with complexity that scales linearly in the resolution of path discretisation and the number of observation times. Python code reproducing the results is available at //doi.org/10.5281/zenodo.5796148
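The "implicitly defined manifold" can be made concrete with a toy Euler-Maruyama discretisation: a deterministic map from standard-normal noise increments (and a parameter) to a diffusion path, together with an observation constraint whose zero set is the manifold that a constrained sampler would explore. The SDE, step size, and noiseless terminal observation below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def path_from_noise(x0, theta, w, dt):
    # Euler-Maruyama map from standard-normal increments w to a path of
    # the illustrative SDE  dX = theta * (1 - X) dt + 0.5 dW
    x = np.empty(w.size + 1)
    x[0] = x0
    for k, wk in enumerate(w):
        x[k + 1] = x[k] + theta * (1.0 - x[k]) * dt + 0.5 * np.sqrt(dt) * wk
    return x

def constraint(theta, w, x0, y_T, dt):
    # zero exactly when (theta, w) maps to a path hitting the (noiseless)
    # terminal observation y_T; the joint configurations satisfying this
    # form the embedded manifold on which constrained HMC would operate
    return path_from_noise(x0, theta, w, dt)[-1] - y_T
```

A constrained Hamiltonian Monte Carlo sampler then moves in the joint (theta, w) space while projecting its dynamics onto the level set `constraint(...) == 0`, so every accepted configuration corresponds to a path consistent with the data.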
In this contribution we present initial findings on the problem of modeling fuzzy rating responses in a traditional probabilistic context. In particular, we study a probabilistic tree model with the aim of representing the stage-wise mechanisms of direct fuzzy rating scales. A Multinomial model coupled with a mixture of Binomial distributions is adopted to model the parameters of LR-type fuzzy responses, whereas a binary decision tree is used for the stage-wise rating mechanism. Parameter estimation is performed via a marginal maximum likelihood approach, and the characteristics of the proposed model are evaluated by means of an application to a real dataset.
An important problem in causal inference is to break down the total effect of a treatment on an outcome into different causal pathways and to quantify the causal effect in each pathway. For instance, in causal fairness, the total effect of being a male employee (i.e., treatment) comprises its direct effect on annual income (i.e., outcome) and the indirect effect via the employee's occupation (i.e., mediator). Causal mediation analysis (CMA) is a formal statistical framework commonly used to reveal such underlying causal mechanisms. One major challenge of CMA in observational studies is handling confounders, variables that induce spurious causal relationships among treatment, mediator, and outcome. Conventional methods assume sequential ignorability, which implies that all confounders can be measured; this assumption is often unverifiable in practice. This work aims to circumvent the stringent sequential ignorability assumption and consider hidden confounders. Drawing upon proxy strategies and recent advances in deep learning, we propose to simultaneously uncover the latent variables that characterize hidden confounders and estimate the causal effects. Empirical evaluations using both synthetic and semi-synthetic datasets validate the effectiveness of the proposed method. We further show the potential of our approach for causal fairness analysis.
Transport engineers employ various interventions to enhance traffic-network performance. A recent emphasis on cycling as a sustainable travel mode aims to reduce traffic congestion. Quantifying the impacts of Cycle Superhighways is complicated by the non-random assignment of the intervention over the transport network and the heavy-tailed distribution of traffic flow. Treatment effects on asymmetric and heavy-tailed distributions are better reflected at the extreme tails than at averages or intermediate quantiles. In such situations, standard methods for estimating quantile treatment effects at the extremes can provide misleading inference due to the high variability of the estimates. In this work, we propose a novel method which incorporates a heavy-tailed component in the outcome distribution to estimate the extreme tails and simultaneously employs quantile regression to model the bulk of the distribution, utilising a state-of-the-art technique. Simulation results show the superiority of the proposed method over existing estimators for quantile causal effects at the extremes in the case of heavy-tailed distributions. The analysis of London transport data utilising the proposed method indicates that traffic flow increased substantially after the Cycle Superhighways came into operation. The findings can assist government agencies in effective decision making to avoid high-consequence events and improve network performance.
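A standard way to combine a bulk model with a heavy-tailed extreme component is the peaks-over-threshold construction: model exceedances over a high threshold with a generalised Pareto distribution and extrapolate extreme quantiles from it. The sketch below uses synthetic heavy-tailed "flow" data and an empirical threshold; the paper's estimator additionally models the bulk via quantile regression, which is omitted here:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
flow = rng.pareto(3.0, size=5000) + 1.0    # synthetic heavy-tailed traffic flow

u = np.quantile(flow, 0.95)                # threshold separating bulk and tail
exc = flow[flow > u] - u                   # exceedances over the threshold
xi, _, sigma = genpareto.fit(exc, floc=0.0)  # GPD shape and scale, loc fixed at 0

def pot_quantile(p):
    # peaks-over-threshold estimate of the p-th quantile (for p above 0.95):
    # q_p = u + (sigma/xi) * ((zeta / (1 - p))**xi - 1), zeta = P(X > u)
    zeta = exc.size / flow.size
    return u + sigma / xi * ((zeta / (1 - p)) ** xi - 1)

print(pot_quantile(0.999))
```

Because the extrapolation uses the fitted tail index rather than raw sample quantiles, the extreme-quantile estimate is far less variable than its purely empirical counterpart.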
Estimating the effects of interventions on patient outcomes is one of the key aspects of personalized medicine. Their inference is often challenged by the fact that the training data comprise only the outcome for the administered treatment, and not for alternative treatments (the so-called counterfactual outcomes). Several methods have been suggested for this scenario based on observational data, i.e., data where the intervention was not applied randomly, for both continuous and binary outcome variables. However, patient outcomes are often recorded as time-to-event data, comprising right-censored event times if an event does not occur within the observation period. Despite their enormous importance, time-to-event data are rarely used for treatment optimization. We suggest an approach named BITES (Balanced Individual Treatment Effect for Survival data), which combines a treatment-specific semi-parametric Cox loss with a treatment-balanced deep neural network; i.e., we regularize differences between treated and non-treated patients using Integral Probability Metrics (IPM). We show in simulation studies that this approach outperforms the state of the art. Further, we demonstrate in an application to a cohort of breast cancer patients that hormone treatment can be optimized based on six routine parameters. We successfully validated this finding in an independent cohort. BITES is provided as an easy-to-use Python implementation.
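The composite objective can be sketched as a sum of treatment-specific negative Cox partial log-likelihoods plus an IPM penalty on the imbalance between treated and control representations. Below, a linear-kernel MMD stands in for the IPM, ties are ignored, and all function names and toy inputs are assumptions for illustration, not the BITES code:

```python
import numpy as np

def neg_cox_loglik(risk, time, event):
    # Breslow-type negative partial log-likelihood (no ties handling):
    # sorting by descending time makes cumulative sums equal to risk sets
    order = np.argsort(-time)
    r, e = risk[order], event[order]
    log_risk_set = np.log(np.cumsum(np.exp(r)))
    return -np.sum((r - log_risk_set)[e == 1])

def mmd_linear(a, b):
    # squared linear-kernel MMD between two sets of representations,
    # a simple member of the Integral Probability Metric family
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

def bites_style_loss(risk, time, event, phi, treated, gamma=1.0):
    # treatment-specific Cox losses + IPM balance penalty on the
    # shared representation phi (gamma trades off fit vs. balance)
    cox = sum(neg_cox_loglik(risk[treated == t], time[treated == t],
                             event[treated == t]) for t in (0, 1))
    return cox + gamma * mmd_linear(phi[treated == 1], phi[treated == 0])
```

In the actual method both `risk` and `phi` would be outputs of a deep network trained by minimizing this kind of objective; here they are plain arrays so the loss structure is visible in isolation.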
Estimation of heterogeneous treatment effects is an active area of research in causal inference. Most of the existing methods, however, focus on estimating the conditional average treatment effects of a single, binary treatment given a set of pre-treatment covariates. In this paper, we propose a method to estimate the heterogeneous causal effects of high-dimensional treatments, which poses unique challenges in terms of estimation and interpretation. The proposed approach is based on a Bayesian mixture of regularized regressions to identify groups of units who exhibit similar patterns of treatment effects. By directly modeling cluster membership with covariates, the proposed methodology allows one to explore the unit characteristics that are associated with different patterns of treatment effects. Our motivating application is conjoint analysis, which is a popular survey experiment in social science and marketing research and is based on a high-dimensional factorial design. We apply the proposed methodology to the conjoint data, where survey respondents are asked to select one of two immigrant profiles with randomly selected attributes. We find that a group of respondents with a relatively high degree of prejudice appears to discriminate against immigrants from non-European countries like Iraq. An open-source software package is available for implementing the proposed methodology.
In this paper we discuss a reduced basis method for linear evolution PDEs based on the application of the Laplace transform. The main advantage of this approach is that, unlike time-stepping methods such as Runge-Kutta integrators, the Laplace transform allows one to compute the solution directly at a given time instant; this is done by approximating the contour integral associated with the inverse Laplace transform by a suitable quadrature formula. In terms of the reduced basis methodology, this yields a significant improvement in the reduction phase (for instance, one based on the classical proper orthogonal decomposition (POD)), since the number of vectors to which the decomposition is applied is drastically reduced: it no longer contains all the intermediate solutions generated along an integration grid by a time-stepping method. We show the effectiveness of the method on some illustrative parabolic PDEs arising from finance, and we also provide evidence that the proposed method, when applied to a simple advection equation, does not suffer from the slow decay of singular values that affects methods based on time integration of the Cauchy problem arising from space discretization.
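For a semi-discretized system u' = Au, u(0) = u0, the Laplace transform gives U(z) = (zI - A)^{-1} u0, and u(t) is recovered by quadrature of the Bromwich integral. The sketch below uses midpoint quadrature along a modified Talbot contour (the numerical contour parameters are a published optimised choice); it is a generic illustration of direct-in-time evaluation via resolvent solves, not the paper's reduced basis method:

```python
import numpy as np

def laplace_inversion(A, u0, t, N=32):
    # u(t) = exp(t A) u0 via midpoint quadrature of the inverse Laplace
    # transform along the modified Talbot contour
    #   z(s) = (N/t) * (0.5017 s cot(0.6407 s) - 0.6122 + 0.2645 i s)
    n = A.shape[0]
    u = np.zeros(n)
    for k in range(N):
        s = -np.pi + (k + 0.5) * 2.0 * np.pi / N        # midpoint nodes
        z = N / t * (0.5017 * s / np.tan(0.6407 * s) - 0.6122 + 0.2645j * s)
        dz = N / t * (0.5017 / np.tan(0.6407 * s)
                      - 0.5017 * 0.6407 * s / np.sin(0.6407 * s) ** 2
                      + 0.2645j)
        # each node costs one resolvent solve (z I - A)^{-1} u0
        v = np.linalg.solve(z * np.eye(n) - A, u0.astype(complex))
        u += (np.exp(z * t) * dz * v).imag / N
    return u
```

The key point for model reduction is visible here: only the N resolvent solves at the quadrature nodes are needed for a target time t, with no intermediate time-grid solutions to collect.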
Behavioral science researchers have shown strong interest in disaggregating within-person relations from between-person differences (stable traits) using longitudinal data. In this paper, we propose a method of within-person variability score-based causal inference for estimating joint effects of time-varying continuous treatments by effectively controlling for stable traits. After explaining the assumed data-generating process and providing formal definitions of stable trait factors, within-person variability scores, and joint effects of time-varying treatments at the within-person level, we introduce the proposed method, which consists of a two-step analysis. Within-person variability scores for each person, which are disaggregated from stable traits of that person, are first calculated using weights based on a best linear correlation preserving predictor through structural equation modeling (SEM). Causal parameters are then estimated via a potential outcome approach, either marginal structural models (MSMs) or structural nested mean models (SNMMs), using the calculated within-person variability scores. Unlike approaches that rely entirely on SEM, the present method does not assume linearity for observed time-varying confounders at the within-person level. We emphasize the use of SNMMs with G-estimation because of its property of being doubly robust to model misspecification in how observed time-varying confounders are functionally related to treatments/predictors and outcomes at the within-person level. Through simulation, we show that the proposed method can recover causal parameters well and that causal estimates can be severely biased if one does not properly account for stable traits. An empirical application using data regarding sleep habits and mental health status from the Tokyo Teen Cohort study is also provided.
This PhD thesis contains several contributions to the field of statistical causal modeling. Statistical causal models are statistical models embedded with causal assumptions that allow for inference and reasoning about the behavior of stochastic systems affected by external manipulation (interventions). This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust (out-of-distribution generalizing) prediction methods. We present novel and consistent linear and non-linear causal effect estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization. Our proposed estimators show, in certain settings, mean squared error improvements compared to both canonical and state-of-the-art estimators. We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. This connection leads us to prove that general K-class estimators possess distributional robustness properties. Furthermore, we propose a general framework for distributional robustness with respect to intervention-induced distributions. In this framework, we derive sufficient conditions for the identifiability of distributionally robust prediction methods and present impossibility results that show the necessity of several of these conditions. We present a new structure learning method applicable in additive noise models with directed trees as causal graphs. We prove consistency in a vanishing identifiability setup and provide a method for testing substructure hypotheses with asymptotic family-wise error control that remains valid post-selection. Finally, we present heuristic ideas for learning summary graphs of nonlinear time-series models.