Often in public health, we are interested in the treatment effect of an intervention on a population that is systematically different from the experimental population in which the intervention was originally evaluated. When treatment effect heterogeneity is present in a randomized controlled trial, generalizing the treatment effect from this experimental population to a target population of interest is a complex problem; it requires characterizing both the treatment effect heterogeneity and the baseline covariate mismatch between the two populations. Despite the importance of this problem, the literature on variable selection in this context is limited. In this paper, we present a Group LASSO-based approach to variable selection in the context of treatment effect generalization, with an application to generalizing the treatment effect of very low nicotine content cigarettes to the overall U.S. smoking population.
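As a minimal illustration of the penalty underlying such an approach (not the paper's full estimator, which is tailored to the generalization setting), here is a proximal-gradient sketch of the linear group lasso in Python; the toy data and group structure are invented for the example:

```python
import numpy as np

def group_lasso_prox_grad(X, y, groups, lam, n_iter=500):
    """Proximal gradient descent for the linear group lasso:
    (1/2n)||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L, L = Lipschitz constant of the smooth part
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for g in np.unique(groups):       # blockwise soft-thresholding
            idx = groups == g
            thr = step * lam * np.sqrt(idx.sum())
            norm = np.linalg.norm(z[idx])
            z[idx] = 0.0 if norm <= thr else (1 - thr / norm) * z[idx]
        beta = z
    return beta

# Toy example: nine covariates in three groups; only group 0 is truly active.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 9))
groups = np.repeat([0, 1, 2], 3)
y = X[:, :3] @ np.array([1.5, -2.0, 1.0]) + rng.standard_normal(200)
print(np.round(group_lasso_prox_grad(X, y, groups, lam=0.1), 2))
```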
Major depressive disorder (MDD) is a heterogeneous condition; multiple underlying neurobiological substrates could be associated with treatment response variability. Understanding the sources of this variability and predicting outcomes have proven elusive. Machine learning has shown promise in predicting treatment response in MDD, but one limitation has been the lack of clinical interpretability of machine learning models. We analyzed data from six clinical trials of pharmacological treatment for depression (total n = 5438) using the Differential Prototypes Neural Network (DPNN), a neural network model that learns patient prototypes which can be used to derive treatment-relevant patient clusters while learning to generate probabilities for differential treatment response. A model classifying remission and outputting individual remission probabilities for five first-line monotherapies and three combination treatments was trained using clinical and demographic data. Model validity and clinical utility were measured based on area under the curve (AUC) and expected improvement in sample remission rate with model-guided treatment, respectively. Post-hoc analyses yielded clusters (subgroups) based on patient prototypes learned during training. Prototypes were evaluated for interpretability by assessing differences in feature distributions and treatment-specific outcomes. A 3-prototype model achieved an AUC of 0.66 and an expected absolute improvement in population remission rate relative to the sample remission rate. We identified three treatment-relevant patient clusters that were clinically interpretable. It is possible to produce novel treatment-relevant patient profiles using machine learning models; doing so may improve precision medicine for depression. Note: This model is not currently the subject of any active clinical trials and is not intended for clinical use.
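The DPNN's exact architecture is not given in this summary. Purely as a hypothetical sketch of the general idea (an encoder, learned prototypes, and per-treatment remission probabilities driven by prototype similarity), in PyTorch; all layer sizes are invented, and the eight outputs mirror the five monotherapies plus three combinations mentioned above:

```python
import torch
import torch.nn as nn

class PrototypeTreatmentNet(nn.Module):
    """Illustrative prototype-based classifier (not the authors' DPNN):
    embeds a patient, scores similarity to learned prototypes, and maps
    the similarity profile to a remission probability per treatment."""
    def __init__(self, n_features, n_prototypes=3, n_treatments=8, d_embed=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, d_embed), nn.ReLU(), nn.Linear(d_embed, d_embed)
        )
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, d_embed))
        self.heads = nn.Linear(n_prototypes, n_treatments)

    def forward(self, x):
        z = self.encoder(x)                         # (batch, d_embed)
        sim = -torch.cdist(z, self.prototypes)      # negative distance to each prototype
        weights = torch.softmax(sim, dim=1)         # soft cluster assignment
        return torch.sigmoid(self.heads(weights))   # P(remission) per treatment

model = PrototypeTreatmentNet(n_features=20)
probs = model(torch.randn(4, 20))  # 4 patients x 8 treatments
```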
Collective efficacy -- the capacity of communities to exert social control toward the realization of their shared goals -- is a foundational concept in the urban sociology and neighborhood effects literature. Traditionally, empirical studies of collective efficacy use large sample surveys to estimate the collective efficacy of different neighborhoods within an urban setting. Such studies have demonstrated an association between collective efficacy and local variation in community violence, educational achievement, and health. Unlike traditional collective efficacy measurement strategies, the Adolescent Health and Development in Context (AHDC) Study implemented a new approach, obtaining spatially referenced, place-based ratings of collective efficacy from a representative sample of individuals residing in Columbus, OH. In this paper, we introduce a novel nonstationary spatial model, leveraging administrative data on land use, for interpolation of the AHDC collective efficacy ratings across the study area. Our constructive model specification strategy involves dimension expansion of a latent spatial process and the use of a filter defined by the land-use partition of the study region to connect the latent multivariate spatial process to the observed ordinal ratings of collective efficacy. Careful consideration is given to parameter identifiability, the computational efficiency of an MCMC algorithm for model fitting, and fine-scale spatial prediction of collective efficacy.
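For intuition, one standard device for connecting a latent Gaussian spatial process $Z(s)$ to ordinal ratings (shown here only as a generic construction; the paper's filter-based specification is more involved) is a cumulative probit link with ordered cutpoints $\gamma_1 < \dots < \gamma_{K-1}$:

$$\Pr\big(Y_i = k \mid Z(s_i)\big) = \Phi\big(\gamma_k - Z(s_i)\big) - \Phi\big(\gamma_{k-1} - Z(s_i)\big), \qquad \gamma_0 = -\infty,\ \gamma_K = \infty,$$

where $\Phi$ denotes the standard normal CDF and $Y_i$ the ordinal rating at location $s_i$.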
Estimating treatment effects conditional on observed covariates can improve the ability to tailor treatments to particular individuals. Doing so effectively requires addressing potential confounding and having enough data to estimate effect moderation adequately. A recent influx of work has examined the estimation of treatment effect heterogeneity using data from multiple randomized controlled trials and/or observational datasets. With many new methods available for assessing treatment effect heterogeneity using multiple studies, it is important to understand which methods are best used in which setting, how the methods compare to one another, and what needs to be done to continue progress in this field. This paper reviews these methods, organized by data setting: aggregate-level data, federated learning, and individual participant-level data. We define the conditional average treatment effect, discuss differences between parametric and nonparametric estimators, and list key assumptions, both those required within a single study and those necessary for data combination. After describing existing approaches, we compare and contrast them and identify open areas for future research. This review demonstrates that there are many possible approaches for estimating treatment effect heterogeneity through the combination of datasets, but that substantial work remains to compare these methods through case studies and simulations, extend them to different settings, and refine them to account for various challenges present in real data.
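The estimand referenced throughout is the conditional average treatment effect (CATE), defined in potential outcome notation as

$$\tau(x) = \mathbb{E}\big[\,Y(1) - Y(0) \mid X = x\,\big],$$

where $Y(1)$ and $Y(0)$ denote the potential outcomes under treatment and control, respectively, and $X$ the observed covariates.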
Individualized treatment decisions can improve health outcomes, but using data to make these decisions in a reliable, precise, and generalizable way is challenging with a single dataset. Leveraging multiple randomized controlled trials allows for the combination of datasets with unconfounded treatment assignment to improve the power to estimate heterogeneous treatment effects. This paper discusses several nonparametric approaches for estimating heterogeneous treatment effects using data from multiple trials. We extend single-study methods to a scenario with multiple trials and explore their performance through a simulation study, with data generation scenarios that have differing levels of cross-trial heterogeneity. The simulations demonstrate that methods that directly allow for heterogeneity of the treatment effect across trials perform better than methods that do not, and that the best choice of single-study method depends on the functional form of the treatment effect. Finally, we discuss which methods perform well in each setting and then apply them to four randomized controlled trials to examine effect heterogeneity of treatments for major depressive disorder.
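As one concrete (hypothetical) way to extend a single-study learner to multiple trials, a T-learner can be pooled across trials with a one-hot trial indicator, so the estimated effect may vary by trial. A Python sketch, with `RandomForestRegressor` standing in for any nonparametric regressor; this is an illustration of the idea, not necessarily one of the estimators compared in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pooled_t_learner(X, trial, w, y, n_trials):
    """T-learner pooled across trials: separate outcome models per arm,
    with a one-hot trial indicator so the estimated CATE may differ by trial.
    X: covariates; trial: integer trial labels; w: 0/1 treatment; y: outcome."""
    Z = np.hstack([X, np.eye(n_trials)[trial]])
    m1 = RandomForestRegressor(random_state=0).fit(Z[w == 1], y[w == 1])
    m0 = RandomForestRegressor(random_state=0).fit(Z[w == 0], y[w == 0])
    def cate(X_new, trial_new):
        Z_new = np.hstack([X_new, np.eye(n_trials)[trial_new]])
        return m1.predict(Z_new) - m0.predict(Z_new)
    return cate
```

Richer approaches model cross-trial heterogeneity explicitly (e.g., hierarchically) rather than only through an indicator covariate.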
Here, we investigate whether (and how) experimental design can aid in the estimation of the precision matrix in a Gaussian chain graph model, especially the interplay between the design, the effect of the experiment, and prior knowledge about the effect. Estimation of the precision matrix is fundamental to inferring biological graphical structures such as microbial networks. We compare the marginal posterior precision of the precision matrix under four priors: flat, conjugate Normal-Wishart, Normal-MGIG, and a general independent prior. Under the flat and conjugate priors, the Laplace-approximated posterior precision is not a function of the design matrix, rendering futile any effort to find an optimal experimental design for inferring the precision matrix. In contrast, the Normal-MGIG and general independent priors do allow for the search for optimal experimental designs, yet there is a sharp upper bound on the information that can be extracted from a given experiment. We confirm our theoretical findings via a simulation study comparing i) the KL divergence between prior and posterior and ii) the difference in Stein's loss of MAP estimates between a random experiment and no experiment. Our findings provide practical advice for domain scientists conducting experiments to better infer the precision matrix as a representation of a biological network.
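For reference, Stein's loss for an estimate $\hat\Omega$ of a $p \times p$ precision matrix $\Omega$, the quantity compared in (ii), is

$$L(\hat\Omega, \Omega) = \operatorname{tr}\big(\hat\Omega\,\Omega^{-1}\big) - \log\det\big(\hat\Omega\,\Omega^{-1}\big) - p,$$

which is nonnegative and equals zero exactly when $\hat\Omega = \Omega$.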
In this paper, we study the properties of the Lasso estimator of the drift component in the diffusion setting. More specifically, we consider a multivariate parametric diffusion model $X$ observed continuously over the interval $[0,T]$ and investigate drift estimation under sparsity constraints. We allow the dimensions of the model and the parameter space to be large. We obtain an oracle inequality for the Lasso estimator and derive an error bound for the $L^2$-distance using concentration inequalities for linear functionals of diffusion processes. The probabilistic part is based upon elements of empirical process theory and, in particular, on the chaining method.
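Assuming, for simplicity of display, a drift $b(\theta, \cdot)$ and a unit diffusion matrix, the Lasso estimator in this continuous-observation setting takes the generic form

$$\hat\theta_T \in \operatorname*{arg\,min}_{\theta \in \Theta}\ \Big\{ -\int_0^T b(\theta, X_t)^\top\, \mathrm{d}X_t + \frac{1}{2}\int_0^T \big\| b(\theta, X_t) \big\|^2\, \mathrm{d}t + \lambda \|\theta\|_1 \Big\},$$

i.e., a negative log-likelihood (obtained via Girsanov's theorem) plus an $\ell_1$ penalty with tuning parameter $\lambda > 0$; the paper's exact weighting by the diffusion coefficient may differ.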
The purpose of this paper is to analyze a mixed method for the linear elasticity eigenvalue problem, which numerically approximates the stress, displacement, and rotation by piecewise polynomial functions of degree $k+1$, $k$, and $k+1$ ($k\geq 1$), respectively. The numerical eigenfunction of stress is symmetric. Using the discrete $H^1$-stability of the numerical displacement, we prove an $O(h^{k+2})$ approximation to the $L^{2}$-orthogonal projection of the eigenspace of the exact displacement for the eigenvalue problem, under suitable regularity assumptions. Thus, via postprocessing, we obtain a better approximation to the eigenspace of the exact displacement than conventional methods provide. We also prove that the numerical approximation to the eigenfunction of stress is locking-free with respect to the Poisson ratio. We introduce a hybridization to reduce the mixed method to a condensed eigenproblem and prove an $O(h^2)$ initial approximation (independent of the inverse of the elasticity operator) of the eigenvalue for the nonlinear eigenproblem by using the discrete $H^1$-stability of the numerical displacement, whereas only an $O(h)$ approximation can be obtained using the traditional inf-sup condition. Finally, we report some numerical experiments.
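For orientation only, a generic stress-displacement-rotation mixed weak form of the elasticity eigenvalue problem (with symmetry of the stress handled through the rotation multiplier; the paper's specific element spaces, which yield an exactly symmetric stress, may treat this differently) reads: find $(\sigma, u, \rho)$ and $\lambda$ such that

$$(A\sigma, \tau) + (\operatorname{div}\tau, u) + (\tau, \rho) = 0, \qquad (\operatorname{div}\sigma, v) = -\lambda (u, v), \qquad (\sigma, \eta) = 0,$$

for all test functions $(\tau, v, \eta)$, where $A$ denotes the compliance tensor.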
To improve the precision of estimation and the power of hypothesis tests for an unconditional treatment effect in randomized clinical trials with binary outcomes, researchers and regulatory agencies recommend g-computation as a reliable method of covariate adjustment. However, the practical application of g-computation is hindered by the lack of an explicit robust variance formula that can be used for different unconditional treatment effects of interest. To fill this gap, we provide explicit and robust variance estimators for g-computation estimators and demonstrate through simulations that these variance estimators can be reliably applied in practice.
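A minimal Python sketch of the point estimator for one common estimand, the risk difference, with a nonparametric bootstrap standing in for the paper's explicit robust variance formulas (which are not reproduced here); `statsmodels` supplies the logistic outcome model:

```python
import numpy as np
import statsmodels.api as sm

def g_computation_rd(X, a, y):
    """G-computation estimate of the unconditional risk difference with a
    binary outcome y and binary treatment a: fit a logistic outcome model,
    then standardize arm-specific predictions over the sample covariates."""
    design = lambda arm: np.column_stack([np.ones(len(y)), np.full(len(y), arm), X])
    fit = sm.GLM(y, np.column_stack([np.ones(len(y)), a, X]),
                 family=sm.families.Binomial()).fit()
    return fit.predict(design(1)).mean() - fit.predict(design(0)).mean()

def bootstrap_se(X, a, y, n_boot=200, seed=0):
    """Nonparametric bootstrap SE: a simple stand-in for the paper's
    explicit robust variance estimators."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)   # resample rows with replacement
        reps.append(g_computation_rd(X[i], a[i], y[i]))
    return np.std(reps, ddof=1)
```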
Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, such a two-stage approach is uneconomical, especially in terms of actively collected interventional data, since the causal query of interest may not require a fully specified causal model. From a Bayesian perspective, it is also unnatural, since a causal query (e.g., the causal graph or some causal effect) can be viewed as a latent quantity subject to posterior inference -- other unobserved quantities that are not of direct interest (e.g., the full causal model) ought to be marginalized out in this process and contribute to our epistemic uncertainty. In this work, we propose Active Bayesian Causal Inference (ABCI), a fully Bayesian active learning framework for integrated causal discovery and reasoning, which jointly infers a posterior over causal models and queries of interest. In our approach to ABCI, we focus on the class of causally sufficient, nonlinear additive noise models, which we model using Gaussian processes. We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, and update our beliefs to choose the next experiment. Through simulations, we demonstrate that our approach is more data-efficient than several baselines that only focus on learning the full causal graph. This allows us to accurately learn downstream causal queries from fewer samples while providing well-calibrated uncertainty estimates for the quantities of interest.
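The experiment-selection step in such a framework is typically driven by the expected information gain about the target query $q$ from a candidate experiment $\xi$, given the data $\mathcal{D}$ collected so far:

$$\xi^\star = \operatorname*{arg\,max}_{\xi}\ \mathrm{I}\big(q;\, y_\xi \mid \mathcal{D}\big) = \operatorname*{arg\,max}_{\xi}\ \Big\{ \mathrm{H}(q \mid \mathcal{D}) - \mathbb{E}_{y_\xi \mid \mathcal{D}}\big[ \mathrm{H}\big(q \mid \mathcal{D} \cup \{(\xi, y_\xi)\}\big) \big] \Big\},$$

where $y_\xi$ is the as-yet-unobserved outcome of experiment $\xi$; this is the standard Bayesian experimental-design objective, shown here to fix notation rather than as ABCI's exact acquisition function.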
Causal inference has been a critical research topic across many domains, such as statistics, computer science, education, public policy, and economics, for decades. Nowadays, estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and the low budget required, compared with randomized controlled trials. Propelled by the rapidly developing machine learning field, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the most well-known causal inference frameworks. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both traditional statistical methods and recent machine learning enhanced methods are discussed and compared. Plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and other areas. Moreover, the commonly used benchmark datasets and open-source codes are summarized to help researchers and practitioners explore, evaluate, and apply these causal inference methods.
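For reference, the three assumptions in question are commonly stated as consistency/SUTVA, unconfoundedness, and positivity: for treatment $W$, covariates $X$, and potential outcomes $Y(0), Y(1)$,

$$\text{(i)}\ Y = Y(W); \qquad \text{(ii)}\ \{Y(0), Y(1)\} \perp\!\!\!\perp W \mid X; \qquad \text{(iii)}\ 0 < \Pr(W = 1 \mid X = x) < 1 \ \text{for all } x.$$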