Randomized experiments are an excellent tool for estimating internally valid causal effects with the sample at hand, but their external validity is frequently debated. While classical results on the estimation of Population Average Treatment Effects (PATE) implicitly assume random selection into experiments, this assumption is typically far from true in many medical, social-scientific, and industry experiments. When the experimental sample differs from the target sample along observable or unobservable dimensions, experimental estimates may be of limited use for policy decisions. We begin by decomposing the extrapolation bias from estimating the Target Average Treatment Effect (TATE) using the Sample Average Treatment Effect (SATE) into covariate shift, overlap, and effect modification components, which researchers can reason about in order to diagnose the severity of extrapolation bias. Next, we cast covariate shift as a sample selection problem and propose estimators that re-weight the doubly-robust scores from experimental subjects to estimate treatment effects in the overall sample (=: generalization) or in an alternate target sample (=: transportation). We implement these estimators in the open-source R package causalTransportR, illustrate their performance in a simulation study, and discuss diagnostics for evaluating that performance.
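The re-weighting idea in this abstract can be illustrated with a minimal sketch. It assumes a single binary covariate, a known assignment probability of 0.5, and weights formed as the ratio of target to experimental covariate shares. All function names and the simulated design are hypothetical and chosen for illustration; this is not the causalTransportR interface.

```python
import random
from collections import defaultdict


def simulate_experiment(n, p_x1, rng):
    """Draw (x, w, y) rows: binary covariate x, randomized treatment w, outcome y."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < p_x1 else 0
        w = 1 if rng.random() < 0.5 else 0  # assumed known assignment probability
        y = x + (1 + 2 * x) * w + rng.gauss(0, 1)  # effect is 1 if x == 0, 3 if x == 1
        rows.append((x, w, y))
    return rows


def aipw_scores(rows, e=0.5):
    """Doubly-robust (AIPW) score per unit; outcome model = stratum-by-arm means."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, w, y in rows:
        sums[(x, w)] += y
        counts[(x, w)] += 1
    mu = {k: sums[k] / counts[k] for k in sums}
    scores = []
    for x, w, y in rows:
        s = mu[(x, 1)] - mu[(x, 0)]
        s += w * (y - mu[(x, 1)]) / e - (1 - w) * (y - mu[(x, 0)]) / (1 - e)
        scores.append(s)
    return scores


def density_ratio_weights(sample_x, target_x):
    """Weight q(x) / p(x): target share of each covariate value over its experimental share."""
    values = set(sample_x)
    p = {v: sample_x.count(v) / len(sample_x) for v in values}
    q = {v: target_x.count(v) / len(target_x) for v in values}
    return [q[x] / p[x] for x in sample_x]


rng = random.Random(0)
experiment = simulate_experiment(4000, p_x1=0.2, rng=rng)
# The target sample only needs covariates; here x = 1 is far more common than in the experiment.
target_x = [1 if rng.random() < 0.7 else 0 for _ in range(4000)]

scores = aipw_scores(experiment)
weights = density_ratio_weights([x for x, _, _ in experiment], target_x)

sate_hat = sum(scores) / len(scores)
tate_hat = sum(wt * s for wt, s in zip(weights, scores)) / sum(weights)
print(f"SATE estimate: {sate_hat:.2f} (design value 1.4)")
print(f"TATE estimate: {tate_hat:.2f} (design value 2.4)")
```

In this simulated design the unweighted average of the scores targets the experimental sample, while the weighted average moves toward the effect in the target sample, because the covariate that modifies the effect is distributed differently in the two samples.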
We employ a general Monte Carlo method to test composite hypotheses of goodness-of-fit for several popular multivariate models that can accommodate both asymmetry and heavy tails. Specifically, we consider weighted L2-type tests based on a discrepancy measure involving the distance between empirical characteristic functions and thus avoid the need for employing corresponding population quantities which may be unknown or complicated to work with. The only requirements of our tests are that we should be able to draw samples from the distribution under test and possess a reasonable method of estimation of the unknown distributional parameters. Monte Carlo studies are conducted to investigate the performance of the test criteria in finite samples for several families of skewed distributions. Real-data examples are also included to illustrate our method.
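As a rough illustration of the testing recipe described here, the sketch below compares the empirical characteristic function of the data with that of a sample drawn from the fitted model, and calibrates the statistic by Monte Carlo. It assumes a univariate normal family and a Gaussian weight function purely to keep the example short; the models in the abstract are multivariate and skewed, and the function names and resampling scheme are assumptions rather than the authors' procedure.

```python
import math
import random


def ecf_distance(x, y, a=1.0):
    """Weighted L2 distance between the empirical characteristic functions of x and y.

    With weight exp(-a t^2), the integral reduces (up to a constant factor) to
    averages of the kernel exp(-(u - v)^2 / (4a)) over pairs of observations.
    """
    def mean_kernel(u, v):
        total = sum(math.exp(-(ui - vj) ** 2 / (4 * a)) for ui in u for vj in v)
        return total / (len(u) * len(v))

    return mean_kernel(x, x) + mean_kernel(y, y) - 2 * mean_kernel(x, y)


def fit_normal(x):
    """Estimate the unknown parameters (mean, standard deviation)."""
    m = sum(x) / len(x)
    s = math.sqrt(sum((xi - m) ** 2 for xi in x) / len(x))
    return m, s


def draw_normal(params, n, rng):
    """Draw n observations from the model under test."""
    m, s = params
    return [rng.gauss(m, s) for _ in range(n)]


def gof_pvalue(data, fit, draw, n_boot=99, m=100, seed=0):
    """Monte Carlo p-value for the composite hypothesis that data follow the fitted family."""
    rng = random.Random(seed)
    theta = fit(data)
    observed = ecf_distance(data, draw(theta, m, rng))
    exceed = 0
    for _ in range(n_boot):
        boot = draw(theta, len(data), rng)  # data simulated under the null
        stat = ecf_distance(boot, draw(fit(boot), m, rng))  # parameters re-estimated each time
        exceed += stat >= observed
    return observed, (1 + exceed) / (1 + n_boot)


rng = random.Random(1)
normal_data = [rng.gauss(0, 1) for _ in range(50)]
stat, p = gof_pvalue(normal_data, fit_normal, draw_normal)
print(f"statistic={stat:.4f}, Monte Carlo p-value={p:.3f}")
```

Only `fit` and `draw` depend on the model, which mirrors the stated requirements: a way to estimate the parameters and a way to sample from the distribution under test.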
Randomized Controlled Trials (RCTs) represent a gold standard when developing policy guidelines. However, RCTs are often narrow, and lack data on broader populations of interest. Causal effects in these populations are often estimated using observational datasets, which may suffer from unobserved confounding and selection bias. Given a set of observational estimates (e.g. from multiple studies), we propose a meta-algorithm that attempts to reject observational estimates that are biased. We do so using validation effects, causal effects that can be inferred from both RCT and observational data. After rejecting estimators that do not pass this test, we generate conservative confidence intervals on the extrapolated causal effects for subgroups not observed in the RCT. Under the assumption that at least one observational estimator is asymptotically normal and consistent for both the validation and extrapolated effects, we provide guarantees on the coverage probability of the intervals output by our algorithm. To facilitate hypothesis testing in settings where causal effect transportation across datasets is necessary, we give conditions under which a doubly-robust estimator of group average treatment effects is asymptotically normal, even when flexible machine learning methods are used for estimation of nuisance parameters. We illustrate the properties of our approach on semi-synthetic and real world datasets, and show that it compares favorably to standard meta-analysis techniques.
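The reject-then-extrapolate idea can be sketched as follows. The sketch assumes each observational estimator reports an estimate and standard error for both the validation and the extrapolated effect, that observational and RCT estimates are independent, and that the retained intervals are combined by taking the smallest interval covering all of them. The data layout, the z-test, and the combination rule are assumptions made for illustration; no coverage guarantee is claimed for this simplified version.

```python
import math
from statistics import NormalDist


def filter_and_extrapolate(rct_validation, observational, alpha=0.05):
    """Drop estimators whose validation effect disagrees with the RCT, then combine the rest.

    rct_validation: (estimate, standard_error) of the validation effect from the RCT.
    observational: list of dicts with "name", "validation", and "extrapolated" entries,
                   where the latter two are (estimate, standard_error) pairs.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    rct_est, rct_se = rct_validation
    kept = []
    for est in observational:
        val_est, val_se = est["validation"]
        # z-test of agreement between the observational and RCT validation effects
        z_stat = abs(val_est - rct_est) / math.sqrt(val_se ** 2 + rct_se ** 2)
        if z_stat <= z:
            kept.append(est)
    if not kept:
        return kept, None
    lows = [e["extrapolated"][0] - z * e["extrapolated"][1] for e in kept]
    highs = [e["extrapolated"][0] + z * e["extrapolated"][1] for e in kept]
    return kept, (min(lows), max(highs))


# Hypothetical numbers, not taken from any study.
rct = (0.50, 0.05)
studies = [
    {"name": "A", "validation": (0.52, 0.04), "extrapolated": (0.30, 0.06)},
    {"name": "B", "validation": (0.90, 0.05), "extrapolated": (0.80, 0.05)},
    {"name": "C", "validation": (0.45, 0.06), "extrapolated": (0.38, 0.07)},
]
kept, interval = filter_and_extrapolate(rct, studies)
kept_names = [e["name"] for e in kept]
print(kept_names, interval)
```

Estimator B is rejected because its validation effect is far from the RCT value, and the reported interval spans the intervals of the two retained estimators.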
Distribution-free tests such as the Wilcoxon rank sum test are popular for testing the equality of two univariate distributions. Among the important reasons for their popularity are the striking results of Hodges-Lehmann (1956) and Chernoff-Savage (1958), where the authors show that the asymptotic (Pitman) relative efficiency of Wilcoxon's test with respect to Student's $t$-test, under location-shift alternatives, never falls below $0.864$ (with the identity score) and $1$ (with the Gaussian score) respectively, despite the former being exactly distribution-free for all sample sizes. Motivated by these results, we propose and study a large family of exactly distribution-free multivariate rank-based two-sample tests by leveraging the theory of optimal transport. First, we propose distribution-free analogs of the Hotelling $T^2$ test (the natural multidimensional counterpart of Student's $t$-test) and show that they satisfy Hodges-Lehmann and Chernoff-Savage-type efficiency lower bounds over natural sub-families of multivariate distributions, despite being entirely agnostic to the underlying data generating mechanism -- making them the first multivariate, nonparametric, exactly distribution-free tests that provably achieve such efficiency lower bounds. As these tests are derived from Hotelling $T^2$, naturally they are not universally consistent (same as Wilcoxon's test). To overcome this, we propose exactly distribution-free versions of the celebrated kernel maximum mean discrepancy test and the energy test. These tests are indeed universally consistent under no moment assumptions, exactly distribution-free for all sample sizes, and have non-trivial Pitman efficiency. We believe this trifecta of properties hasn't yet been proven for any existing test in the literature.
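The exact distribution-freeness that motivates this work can be seen in the univariate Wilcoxon rank-sum test, sketched below under the assumption of no ties. The statistic depends on the data only through ranks, so its null distribution is determined by the two sample sizes alone. This illustrates the classical univariate property only; the optimal-transport-based multivariate ranks discussed in the abstract are not implemented here.

```python
import math
from itertools import combinations


def rank_sum(x, y):
    """Wilcoxon rank-sum statistic: sum of the ranks of x in the pooled sample (no ties assumed)."""
    pooled = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    return sum(rank[v] for v in x)


def exact_null_pvalue(x, y):
    """Upper-tail p-value from the exact null distribution over all rank assignments."""
    n, m = len(x), len(y)
    observed = rank_sum(x, y)
    sums = [sum(c) for c in combinations(range(1, n + m + 1), n)]
    return sum(s >= observed for s in sums) / len(sums)


x = [3.1, 4.2, 5.3]
y = [1.0, 2.0, 3.5]
w_stat = rank_sum(x, y)
p_value = exact_null_pvalue(x, y)

# A strictly increasing transformation changes the data but not the ranks.
x_t = [math.exp(v) for v in x]
y_t = [math.exp(v) for v in y]
print(w_stat, p_value)
print(rank_sum(x_t, y_t), exact_null_pvalue(x_t, y_t))
```

Both printed lines agree, since the transformed samples have the same ranks as the originals.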
The performance of local feature descriptors degrades in the presence of large rotation variations. To address this issue, we present an efficient approach to learning rotation invariant descriptors. Specifically, we propose Rotated Kernel Fusion (RKF) which imposes rotations on each convolution kernel and improves the inherent nature of CNN. Since RKF can be processed by the subsequent re-parameterization, no extra computational costs will be introduced in the inference stage. Moreover, we present Multi-oriented Feature Aggregation (MOFA) which ensembles features extracted from multiple rotated versions of input images and can provide auxiliary information for the training of RKF by leveraging the knowledge distillation strategy. We refer to the distilled RKF model as DRKF. Besides the evaluation on a rotation-augmented version of the public dataset HPatches, we also contribute a new dataset named DiverseBEV which consists of bird's eye view images with large viewpoint changes and camera rotations. Extensive experiments show that our method can outperform other state-of-the-art techniques when exposed to large rotation variations.
In light of newly developed standardization methods, we evaluate, via a simulation study, how inverse probability of treatment weighting (IPTW) and standardization-based approaches compare for obtaining estimates of the marginal odds ratio and the marginal hazard ratio. Specifically, we consider how the two approaches compare in two different scenarios: (1) in a single comparative study (either randomized or non-randomized), and (2) in an anchored indirect treatment comparison of randomized controlled trials (where we compare the matching-adjusted indirect comparison (MAIC) and simulated treatment comparison (STC) methods). We conclude that, in general, standardization-based methods with correctly specified outcome models are more efficient than those based on IPTW. While IPTW is robust to model misspecification in a single comparative study, we find that this is not necessarily the case for MAIC in an indirect treatment comparison.
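To make the two estimators concrete, the sketch below computes a marginal odds ratio by standardization and by IPTW in a single simulated non-randomized study with one binary confounder. Both the outcome model and the propensity model are saturated (cell means and cell proportions), an assumption chosen for brevity. Under it the two estimates coincide, so the sketch shows what each estimator computes rather than the efficiency differences that arise with parametric models. Function names and the data-generating design are hypothetical.

```python
import random
from collections import Counter, defaultdict


def simulate(n, rng):
    """Rows (x, a, y): binary confounder x, treatment a that depends on x, binary outcome y."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.4 else 0
        a = 1 if rng.random() < (0.7 if x else 0.3) else 0
        y = 1 if rng.random() < 0.2 + 0.3 * a + 0.3 * x else 0
        rows.append((x, a, y))
    return rows


def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def standardized_risks(rows):
    """Standardization: average the stratum-specific risks over the confounder distribution."""
    n = len(rows)
    n_x = Counter(x for x, _, _ in rows)
    n_xa = Counter((x, a) for x, a, _ in rows)
    y_xa = Counter((x, a) for x, a, y in rows if y == 1)
    return {a: sum(n_x[x] / n * y_xa[(x, a)] / n_xa[(x, a)] for x in n_x) for a in (0, 1)}


def iptw_risks(rows):
    """IPTW: weight each unit by the inverse of its estimated probability of the received arm."""
    n_x = Counter(x for x, _, _ in rows)
    n_xa = Counter((x, a) for x, a, _ in rows)
    num, den = defaultdict(float), defaultdict(float)
    for x, a, y in rows:
        w = n_x[x] / n_xa[(x, a)]  # 1 / estimated P(A = a | X = x)
        num[a] += w * y
        den[a] += w
    return {a: num[a] / den[a] for a in (0, 1)}


rng = random.Random(0)
rows = simulate(5000, rng)
std = standardized_risks(rows)
ipw = iptw_risks(rows)
or_std = odds_ratio(std[1], std[0])
or_iptw = odds_ratio(ipw[1], ipw[0])
print(f"standardization OR: {or_std:.3f}, IPTW OR: {or_iptw:.3f}")
```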
When estimating the effect of an action with a randomized or observational study, that study is often not a random sample of the desired target population. Instead, estimates from that study can be transported to the target population. However, transportability methods generally rely on a positivity assumption, such that all relevant covariate patterns in the target population are also observed in the study sample. Strict eligibility criteria, particularly in the context of randomized trials, may lead to violations of this assumption. Two common approaches to address positivity violations are restricting the target population and restricting the relevant covariate set. As neither of these restrictions is ideal, we instead propose a synthesis of statistical and simulation models to address positivity violations. We propose corresponding g-computation and inverse probability weighting estimators. The restriction and synthesis approaches to addressing positivity violations are contrasted with a simulation experiment and an illustrative example in the context of sexually transmitted infection testing. In both cases, the proposed model synthesis approach accurately addressed the original research question when paired with a thoughtfully selected simulation model. Neither of the restriction approaches was able to accurately address the motivating question. As public health decisions must often be made with imperfect target population information, model synthesis is a viable approach given a combination of empirical data and external information based on the best available knowledge.
Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, such a two-stage approach is uneconomical, especially in terms of actively collected interventional data, since the causal query of interest may not require a fully-specified causal model. From a Bayesian perspective, it is also unnatural, since a causal query (e.g., the causal graph or some causal effect) can be viewed as a latent quantity subject to posterior inference -- other unobserved quantities that are not of direct interest (e.g., the full causal model) ought to be marginalized out in this process and contribute to our epistemic uncertainty. In this work, we propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning, which jointly infers a posterior over causal models and queries of interest. In our approach to ABCI, we focus on the class of causally-sufficient, nonlinear additive noise models, which we model using Gaussian processes. We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, and update our beliefs to choose the next experiment. Through simulations, we demonstrate that our approach is more data-efficient than several baselines that only focus on learning the full causal graph. This allows us to accurately learn downstream causal queries from fewer samples while providing well-calibrated uncertainty estimates for the quantities of interest.
Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.
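A minimal sketch of pooling with summary-level information only: each site shares the mean of its doubly-robust scores and an estimated variance, and a coordinator combines them by inverse-variance weighting. Inverse-variance weighting is used here as one simple, assumed pooling rule. The abstract notes that the appropriate federated procedure depends on conditions such as model specification and stability across data sets, which this sketch does not address. The function names and numbers are hypothetical.

```python
def site_summary(scores):
    """What a site shares: mean of its doubly-robust scores, variance of that mean, and n."""
    n = len(scores)
    mean = sum(scores) / n
    var_of_mean = sum((s - mean) ** 2 for s in scores) / (n - 1) / n
    return mean, var_of_mean, n


def pool_inverse_variance(summaries):
    """Combine site-level (mean, variance, n) summaries without individual-level data."""
    weights = [1 / v for _, v, _ in summaries]
    estimate = sum(w * m for w, (m, _, _) in zip(weights, summaries)) / sum(weights)
    variance = 1 / sum(weights)
    return estimate, variance


# Hypothetical per-unit scores held privately at two sites.
site_a = site_summary([1.0, 2.0, 3.0, 2.0])
site_b = site_summary([2.5, 1.5, 2.0, 2.0, 2.0])
pooled_estimate, pooled_variance = pool_inverse_variance([site_a, site_b])
print(pooled_estimate, pooled_variance)

# Pooling also works directly from reported summaries.
direct_estimate, direct_variance = pool_inverse_variance([(1.0, 0.04, 100), (2.0, 0.04, 100)])
print(direct_estimate, direct_variance)
```

Only three numbers per site cross the privacy boundary, and the more precise site receives the larger weight.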
Causal inference has been a critical research topic for decades across many domains, such as statistics, computer science, education, public policy, and economics. Estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and its low budget requirements compared with randomized controlled trials. Building on rapid developments in machine learning, a variety of causal effect estimation methods for observational data have emerged. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether or not they require all three assumptions of the potential outcome framework. For each category, both traditional statistical methods and recent machine learning enhanced methods are discussed and compared. Plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, commonly used benchmark datasets and open-source code are summarized, helping researchers and practitioners explore, evaluate, and apply causal inference methods.
Image segmentation remains an open problem, especially when the intensities of the objects of interest overlap due to intensity inhomogeneity (also known as bias field). To segment images with intensity inhomogeneities, a bias correction embedded level set model is proposed in which Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined, and membership functions of the clusters described by the level set function are introduced to rewrite the energy as the data term of the proposed model. As in popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment color images and images with multiple objects, respectively. It has been extensively tested on synthetic and real images widely used in the literature, as well as on the public BrainWeb and IBSR datasets. Experimental results and comparisons with state-of-the-art methods demonstrate the advantages of the proposed model in terms of bias correction and segmentation accuracy.