Semiparametric inference on average causal effects from observational data is based on assumptions yielding identification of the effects. In practice, several distinct identifying assumptions may be plausible; an analyst has to make a delicate choice between these models. In this paper, we study three identifying assumptions based on the potential outcome framework: the back-door assumption, which uses pre-treatment covariates, the front-door assumption, which uses mediators, and the two-door assumption using pre-treatment covariates and mediators simultaneously. We provide the efficient influence functions and the corresponding semiparametric efficiency bounds that hold under these assumptions, and their combinations. We demonstrate that neither of the identification models provides uniformly the most efficient estimation and give conditions under which some bounds are lower than others. We show when semiparametric estimating equation estimators based on influence functions attain the bounds, and study the robustness of the estimators to misspecification of the nuisance models. The theory is complemented with simulation experiments on the finite sample behavior of the estimators. The results obtained are relevant for an analyst facing a choice between several plausible identifying assumptions and corresponding estimators. Our results show that this choice implies a trade-off between efficiency and robustness to misspecification of the nuisance models.
Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally-spurious links before passing the remaining causal features as inputs to ML models (Multiple linear regression, Random Forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), even slightly outperforming feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.
Bayesian nonparametric hierarchical priors provide flexible models for sharing of information within and across groups. We focus on latent feature allocation models, where the data structures correspond to multisets or unbounded sparse matrices. The fundamental development in this regard is the Hierarchical Indian Buffet process (HIBP), devised by Thibaux and Jordan (2007). However, little is known in terms of explicit tractable descriptions of the joint, marginal, posterior and predictive distributions of the HIBP. We provide explicit novel descriptions of these quantities, in the Bernoulli HIBP and general spike and slab HIBP settings, which allows for exact sampling and simpler practical implementation. We then extend these results to the more complex setting of hierarchies of general HIBP (HHIBP). The generality of our framework allows one to recognize important structure that may otherwise be masked in the Bernoulli setting, and involves characterizations via dynamic mixed Poisson random count matrices. Our analysis shows that the standard choice of hierarchical Beta processes for modeling across group sharing is not ideal in the classic Bernoulli HIBP setting proposed by Thibaux and Jordan (2007), or other spike and slab HIBP settings, and we thus indicate tractable alternative priors.
A predictive model makes outcome predictions based on some given features, i.e., it estimates the conditional probability of the outcome given a feature vector. In general, a predictive model cannot estimate the causal effect of a feature on the outcome, i.e., how the outcome will change if the feature is changed while keeping the values of other features unchanged. This is because causal effect estimation requires interventional probabilities. However, many real world problems such as personalised decision making, recommendation, and fairness computing, need to know the causal effect of any feature on the outcome for a given instance. This is different from the traditional causal effect estimation problem with a fixed treatment variable. This paper first tackles the challenge of estimating the causal effect of any feature (as the treatment) on the outcome w.r.t. a given instance. The theoretical results naturally link a predictive model to causal effect estimations and imply that a predictive model is causally interpretable when the conditions identified in the paper are satisfied. The paper also reveals the robust property of a causally interpretable model. We use experiments to demonstrate that various types of predictive models, when satisfying the conditions identified in this paper, can estimate the causal effects of features as accurately as state-of-the-art causal effect estimation methods. We also show the potential of such causally interpretable predictive models for robust predictions and personalised decision making.
Sensitivity analysis for the unconfoundedness assumption is a crucial component of observational studies. The marginal sensitivity model has become increasingly popular for this purpose due to its interpretability and mathematical properties. As the basis of $L^\infty$-sensitivity analysis, it assumes the logit difference between the observed and full data propensity scores is uniformly bounded. In this article, we introduce a new $L^2$-sensitivity analysis framework which is flexible, sharp and efficient. We allow the strength of unmeasured confounding to vary across units and only require it to be bounded marginally for partial identification. We derive analytical solutions to the optimization problems under our $L^2$-models, which can be used to obtain sharp bounds for the average treatment effect (ATE). We derive efficient influence functions and use them to develop efficient one-step estimators in both analyses. We show that multiplier bootstrap can be applied to construct simultaneous confidence bands for our ATE bounds. In a real-data study, we demonstrate that $L^2$-analysis relaxes the interpretation of $L^\infty$-analysis and provides a much more reliable calibration process using observed covariates. Finally, we provide an extension of our theoretical results to the conditional average treatment effect (CATE).
Medical image segmentation is a challenging task with inherent ambiguity and high uncertainty, attributed to factors such as unclear tumor boundaries and multiple plausible annotations. The accuracy and diversity of segmentation masks are both crucial for providing valuable references to radiologists in clinical practice. While existing diffusion models have shown strong capacities in various visual generation tasks, it is still challenging to deal with discrete masks in segmentation. To achieve accurate and diverse medical image segmentation masks, we propose a novel conditional Bernoulli Diffusion model for medical image segmentation (BerDiff). Instead of using the Gaussian noise, we first propose to use the Bernoulli noise as the diffusion kernel to enhance the capacity of the diffusion model for binary segmentation tasks, resulting in more accurate segmentation masks. Second, by leveraging the stochastic nature of the diffusion model, our BerDiff randomly samples the initial Bernoulli noise and intermediate latent variables multiple times to produce a range of diverse segmentation masks, which can highlight salient regions of interest that can serve as valuable references for radiologists. In addition, our BerDiff can efficiently sample sub-sequences from the overall trajectory of the reverse diffusion, thereby speeding up the segmentation process. Extensive experimental results on two medical image segmentation datasets with different modalities demonstrate that our BerDiff outperforms other recently published state-of-the-art methods. Our results suggest diffusion models could serve as a strong backbone for medical image segmentation.
Although many fairness criteria have been proposed to ensure that machine learning algorithms do not exhibit or amplify our existing social biases, these algorithms are trained on datasets that can themselves be statistically biased. In this paper, we investigate the robustness of a number of existing (demographic) fairness criteria when the algorithm is trained on biased data. We consider two forms of dataset bias: errors by prior decision makers in the labeling process, and errors in measurement of the features of disadvantaged individuals. We analytically show that some constraints (such as Demographic Parity) can remain robust when facing certain statistical biases, while others (such as Equalized Odds) are significantly violated if trained on biased data. We also analyze the sensitivity of these criteria and the decision maker's utility to biases. We provide numerical experiments based on three real-world datasets (the FICO, Adult, and German credit score datasets) supporting our analytical findings. Our findings present an additional guideline for choosing among existing fairness criteria, or for proposing new criteria, when available datasets may be biased.
Modern datasets commonly feature both substantial missingness and many variables of mixed data types, which present significant challenges for estimation and inference. Complete case analysis, which proceeds using only the observations with fully-observed variables, is often severely biased, while model-based imputation of missing values is limited by the ability of the model to capture complex dependencies among (possibly many) variables of mixed data types. To address these challenges, we develop a novel Bayesian mixture copula for joint and nonparametric modeling of multivariate count, continuous, ordinal, and unordered categorical variables, and deploy this model for inference, prediction, and imputation of missing data. Most uniquely, we introduce a new and computationally efficient strategy for marginal distribution estimation that eliminates the need to specify any marginal models yet delivers posterior consistency for each marginal distribution and the copula parameters under missingness-at-random. Extensive simulation studies demonstrate exceptional modeling and imputation capabilities relative to competing methods, especially with mixed data types, complex missingness mechanisms, and nonlinear dependencies. We conclude with a data analysis that highlights how improper treatment of missing data can distort a statistical analysis, and how the proposed approach offers a resolution.
Making causal inferences from observational studies can be challenging when confounders are missing not at random. In such cases, identifying causal effects is often not guaranteed. Motivated by a real example, we consider a treatment-independent missingness assumption under which we establish the identification of causal effects when confounders are missing not at random. We propose a weighted estimating equation (WEE) approach for estimating model parameters and introduce three estimators for the average causal effect, based on regression, propensity score weighting, and doubly robust estimation. We evaluate the performance of these estimators through simulations, and provide a real data analysis to illustrate our proposed method.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.