The pursuit of long-term fairness involves the interplay between decision-making and the underlying data generating process. In this paper, by causally modeling the decision-distribution interplay with a directed acyclic graph (DAG), we investigate the possibility of achieving long-term fairness from a dynamic perspective. We propose Tier Balancing, a technically more challenging but more natural notion to achieve in the context of long-term, dynamic fairness analysis. Unlike previous fairness notions that are defined purely on observed variables, our notion goes one step further, capturing changes in the unobserved latent causal factors that directly carry the influence of the current decision over to the future data distribution. Under the specified dynamics, we prove that, in general, one cannot achieve the long-term fairness goal through one-step interventions alone. Furthermore, in pursuit of long-term fairness, we consider the task of "getting closer to" the long-term fairness goal and present corresponding possibility and impossibility results.
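To make the decision-distribution feedback loop concrete, the following toy simulation (a minimal sketch; the two-group setup, update rule, and parameters are illustrative assumptions, not the paper's model) shows how decisions made under a fixed threshold keep shifting a latent qualification factor, so a one-shot policy choice does not by itself close the gap between groups.

```python
# Toy simulation of a decision-distribution feedback loop (illustrative only;
# the dynamics, parameters, and update rule are assumptions, not Tier Balancing).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, size=n)                                     # protected attribute A
latent = rng.normal(loc=np.where(group == 1, 0.5, -0.5), scale=1.0)    # latent qualification factor

def step(latent, threshold):
    """One decision round: observe a noisy feature, decide, update the latent factor."""
    feature = latent + rng.normal(scale=0.5, size=latent.size)
    decision = (feature > threshold).astype(float)
    # Positive decisions nudge the latent factor up, negative decisions nudge it down.
    return latent + 0.2 * (decision - 0.5), decision

for t in range(10):
    latent, decision = step(latent, threshold=0.0)                     # same threshold for both groups
    gap = latent[group == 1].mean() - latent[group == 0].mean()
    print(f"round {t}: latent-factor gap between groups = {gap:.3f}")
```

In this toy setting the gap between the groups' latent factors keeps growing even though the decision rule itself is identical across groups, which is the kind of behind-the-scenes dynamic the notion is meant to capture.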
Price discrimination, the strategy of setting different prices for different customer groups, has been widely used in online retailing. Although it helps boost revenue for online retailers, it raises serious fairness concerns and may even violate regulations and laws. This paper studies the problem of dynamic discriminatory pricing under fairness constraints. In particular, we consider a finite selling horizon of length $T$ for a single product with two groups of customers. Each group has its own unknown demand function that needs to be learned. In each selling period, the seller sets a price for each group and observes their purchase behavior. While the existing literature mainly focuses on maximizing revenue, ensuring fairness among different customers has not been fully explored in the dynamic pricing literature. This work adopts the fairness notion of Cohen et al. (2022). For price fairness, we propose a dynamic pricing policy that enforces the strict price fairness constraint and is optimal in terms of regret. In contrast to the standard $\sqrt{T}$-type regret in online learning, we show that the optimal regret in our case is $\tilde{O}(T^{4/5})$. We further extend our algorithm to a more general notion of fairness, which includes demand fairness as a special case. To handle this general class, we propose a soft fairness constraint and develop a dynamic pricing policy that achieves $\tilde{O}(T^{4/5})$ regret. We also demonstrate that our algorithmic techniques can be adapted to more general scenarios, such as fairness among multiple groups of customers.
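As a rough illustration of pricing under the strict price-fairness constraint, the sketch below runs a simple explore-then-commit loop in which both groups always see the same price; the linear demand curves, price grid, and exploration length are assumptions for illustration and not the paper's policy (whose analysis yields the $\tilde{O}(T^{4/5})$ guarantee).

```python
import numpy as np

rng = np.random.default_rng(1)

# Unknown purchase probabilities for two customer groups (assumed linear demand).
def demand(group, price):
    base, slope = (0.9, 0.08) if group == 0 else (0.7, 0.05)
    return np.clip(base - slope * price, 0.0, 1.0)

T = 10_000
prices = np.linspace(1.0, 10.0, 20)          # candidate common prices
explore_rounds = int(T ** 0.8)               # longer exploration than sqrt(T), echoing the T^{4/5} rate
counts = np.zeros((2, prices.size))
sales = np.zeros((2, prices.size))
revenue = 0.0

for t in range(T):
    if t < explore_rounds:
        j = t % prices.size                  # cycle through the price grid
    else:
        est = sales / np.maximum(counts, 1)  # estimated purchase probability per group and price
        j = int(np.argmax(prices * est.sum(axis=0)))   # best common price across both groups
    p = prices[j]
    for g in (0, 1):                         # strict price fairness: both groups see the same price
        buy = rng.random() < demand(g, p)
        counts[g, j] += 1
        sales[g, j] += buy
        revenue += p * buy

print(f"total revenue over T={T} rounds: {revenue:.0f}")
```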
This study employs counterfactual explanations to explore "what if?" scenarios in medical research, with the aim of expanding our understanding beyond existing boundaries. Specifically, we use MRI features for diagnosing pediatric posterior fossa brain tumors as a case study. The field of artificial intelligence and explainability has witnessed a growing number of studies and increasing scholarly interest. However, the lack of human-friendly interpretations of machine learning outcomes has significantly hindered the acceptance of these methods by clinicians in clinical practice. To address this, our approach incorporates counterfactual explanations, providing a novel way to examine alternative decision-making scenarios. These explanations offer personalized and context-specific insights, enabling the validation of predictions and the clarification of variations under diverse circumstances. Importantly, our approach maintains both statistical and clinical fidelity, allowing distinct tumor features to be examined through alternative realities. Additionally, we explore the potential use of counterfactuals for data augmentation and evaluate their feasibility as an alternative approach in medical research. The results demonstrate the promising potential of counterfactual explanations to enhance trust in and acceptance of AI-driven methods in clinical settings.
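As a minimal illustration of what a counterfactual explanation computes, the sketch below trains a classifier on synthetic tabular features and searches for small feature changes that flip its prediction; the hypothetical features, the classifier, and the greedy search are assumptions, not the study's actual MRI pipeline.

```python
# Minimal counterfactual-search sketch on synthetic "MRI-like" tabular features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))                      # three hypothetical imaging features
y = (X @ np.array([1.5, -1.0, 0.8]) + rng.normal(scale=0.5, size=500) > 0).astype(int)
clf = LogisticRegression().fit(X, y)

def counterfactual(x, target, step=0.05, max_iter=2000):
    """Greedy search: repeatedly move the most influential feature until the prediction flips."""
    x_cf = x.copy()
    for _ in range(max_iter):
        if clf.predict(x_cf[None])[0] == target:
            return x_cf
        grads = clf.coef_[0]                       # direction that increases P(class 1)
        sign = 1.0 if target == 1 else -1.0
        i = int(np.argmax(np.abs(grads)))
        x_cf[i] += sign * step * np.sign(grads[i])
    return None

x0 = X[0]
cf = counterfactual(x0, target=1 - clf.predict(x0[None])[0])
print("original      :", np.round(x0, 3), "->", clf.predict(x0[None])[0])
print("counterfactual:", np.round(cf, 3), "->", clf.predict(cf[None])[0])
```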
This work introduces a novel framework for dynamic factor model-based data integration of multiple subjects, called GRoup Integrative DYnamic factor models (GRIDY). The framework facilitates the determination of inter-subject differences between two pre-labeled groups by considering a combination of group spatial information and individual temporal dependence. Furthermore, it enables the identification of intra-subject differences over time by employing different model configurations for each subject. Methodologically, the framework combines a novel principal angle-based rank selection algorithm and a non-iterative integrative analysis framework. Inspired by simultaneous component analysis, this approach also reconstructs identifiable latent factor series with flexible covariance structures. The performance of the framework is evaluated through simulations conducted under various scenarios and the analysis of resting-state functional MRI data collected from multiple subjects in both the Autism Spectrum Disorder group and the control group.
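The principal-angle ingredient can be illustrated as follows: the sketch below simulates factor data for two groups, extracts each group's leading loading subspace, and compares them with principal angles via scipy.linalg.subspace_angles; this is not the GRIDY algorithm itself, and the simulated loadings are assumptions.

```python
# Sketch of comparing loading subspaces via principal angles (illustrative only).
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(3)
p, T, r = 30, 200, 3                       # variables, time points, latent rank

shared = rng.normal(size=(p, r))           # loadings shared across groups
indiv = rng.normal(size=(p, r))            # group-specific loadings

def simulate(loadings):
    factors = rng.normal(size=(r, T))      # i.i.d. factors here; GRIDY allows temporal dependence
    return loadings @ factors + 0.1 * rng.normal(size=(p, T))

X_a = simulate(shared)
X_b = simulate(0.7 * shared + 0.3 * indiv)

def top_subspace(X, k):
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]                        # leading left singular vectors span the loading space

angles = subspace_angles(top_subspace(X_a, r), top_subspace(X_b, r))
print("principal angles (radians):", np.round(angles, 3))
# Small angles indicate shared group structure; large angles indicate group-specific structure.
```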
This study compares the performance of a causal model and a predictive model in modeling travel mode choice in three neighborhoods in Chicago. A causal discovery algorithm and a causal inference technique were used to extract the causal relationships in the mode choice decision-making process and to estimate the quantitative causal effects between variables, both directly from observational data. The model results reveal that trip distance and vehicle ownership are the direct causes of mode choice in the three neighborhoods. Artificial neural network models were estimated to predict mode choice; their accuracy exceeded 70%, and SHAP values were used to measure the importance of each variable. We find that both the causal and predictive modeling approaches are useful for the purposes they serve. We also note that the study of mode choice behavior through causal modeling remains largely unexplored, yet it could transform our understanding of mode choice behavior. Further research is needed to realize the full potential of these techniques in modeling mode choice.
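On the predictive side, a minimal sketch of the workflow is given below: a small neural network predicts mode choice from synthetic trip distance and vehicle ownership features, followed by a feature-importance readout; the data are invented, and sklearn's permutation importance stands in here for the SHAP values used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 2000
trip_distance = rng.exponential(scale=3.0, size=n)      # hypothetical trip distances
vehicle_owner = rng.integers(0, 2, size=n)
X = np.column_stack([trip_distance, vehicle_owner])
# Longer trips and vehicle ownership push choices toward driving (1) vs. walking/transit (0).
logit = 0.8 * trip_distance + 1.5 * vehicle_owner - 3.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", round(clf.score(X_te, y_te), 3))

imp = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
for name, val in zip(["trip_distance", "vehicle_ownership"], imp.importances_mean):
    print(f"{name}: importance = {val:.3f}")
```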
Algorithmic fairness is a serious concern and has received substantial interest in the machine learning community. In this paper, we focus on the bipartite ranking scenario, where instances come from either the positive or the negative class and the goal is to learn a ranking function that ranks positive instances higher than negative ones. While there can be a trade-off between fairness and performance, we propose xOrder, a model-agnostic post-processing framework for achieving fairness in bipartite ranking while maintaining classification performance. In particular, we formulate the optimization of a weighted sum of ranking utility and fairness as the problem of identifying an optimal warping path across different protected groups, and solve it via dynamic programming. xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics. Beyond binary groups, xOrder can also be applied to multiple protected groups. We evaluate the proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories. xOrder consistently achieves a better balance between algorithm utility and ranking fairness across a variety of datasets and metrics. Visualizations of the calibrated ranking scores show that xOrder mitigates the score distribution shifts between groups compared with baselines. Moreover, additional analytical results verify that xOrder remains robust when faced with fewer samples and a larger difference between training and testing ranking score distributions.
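To convey the flavor of the dynamic program, the sketch below merges two per-group rankings by choosing, at each rank, which group's next item to place, maximizing a DCG-style utility minus a per-rank balance penalty; this is a drastically simplified stand-in for xOrder, and the utility, penalty, and scores are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
a = np.sort(rng.normal(0.5, 1.0, size=6))[::-1]    # group-A scores, descending
b = np.sort(rng.normal(0.0, 1.0, size=6))[::-1]    # group-B scores, descending
lam = 2.0                                           # weight on the per-rank balance penalty

def gain(score, rank):                              # DCG-style utility of placing `score` at `rank`
    return score / np.log2(rank + 1)

n_a, n_b = len(a), len(b)
value = np.full((n_a + 1, n_b + 1), -np.inf)
choice = np.zeros((n_a + 1, n_b + 1), dtype=int)    # 0 = took next item from A, 1 = from B
value[0, 0] = 0.0

for i in range(n_a + 1):
    for j in range(n_b + 1):
        if i == 0 and j == 0:
            continue
        rank = i + j
        balance = abs(i / rank - n_a / (n_a + n_b))  # deviation from proportional representation so far
        if i > 0 and np.isfinite(value[i - 1, j]):
            v = value[i - 1, j] + gain(a[i - 1], rank) - lam * balance
            if v > value[i, j]:
                value[i, j], choice[i, j] = v, 0
        if j > 0 and np.isfinite(value[i, j - 1]):
            v = value[i, j - 1] + gain(b[j - 1], rank) - lam * balance
            if v > value[i, j]:
                value[i, j], choice[i, j] = v, 1

# Backtrack the optimal "warping path" into a merged ranking.
order, i, j = [], n_a, n_b
while i > 0 or j > 0:
    if choice[i, j] == 0:
        order.append(("A", round(float(a[i - 1]), 3))); i -= 1
    else:
        order.append(("B", round(float(b[j - 1]), 3))); j -= 1
print(list(reversed(order)))
```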
Understanding causality helps to structure interventions that achieve specific goals and enables predictions under interventions. With the growing importance of learning causal relationships, causal discovery has transitioned from traditional methods that infer potential causal structures from observational data to pattern-recognition approaches based on deep learning. The rapid accumulation of massive data has promoted the emergence of causal discovery methods with excellent scalability. Existing surveys of causal discovery methods mainly focus on traditional approaches based on constraints, scores, and functional causal models (FCMs); they lack a systematic organization and elaboration of deep-learning-based methods, and they rarely consider causal discovery from the perspective of variable paradigms. Therefore, we divide causal discovery tasks into three types according to the variable paradigm and define each of the three tasks; for each task, we define and instantiate the relevant datasets and the causal model ultimately constructed, and then review the main existing causal discovery methods. Finally, we propose roadmaps from different perspectives for the current research gaps in the field of causal discovery and point out future research directions.
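For concreteness, a constraint-based method from this landscape can be run in a few lines; the sketch below uses the open-source causal-learn package (an assumed choice of implementation, not one singled out by the survey) on a simulated three-variable chain.

```python
# pip install causal-learn
import numpy as np
from causallearn.search.ConstraintBased.PC import pc

rng = np.random.default_rng(10)
n = 2000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)      # X -> Y
z = -1.5 * y + rng.normal(size=n)     # Y -> Z
data = np.column_stack([x, y, z])

cg = pc(data, alpha=0.05)             # PC search with the default Fisher-z independence test
print(cg.G)                           # recovered CPDAG over the three variables
```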
This PhD thesis contains several contributions to the field of statistical causal modeling. Statistical causal models are statistical models embedded with causal assumptions that allow for inference and reasoning about the behavior of stochastic systems affected by external manipulation (interventions). This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust (out-of-distribution generalizing) prediction methods. We present novel and consistent linear and non-linear causal effect estimators for instrumental variable settings that employ data-dependent mean squared prediction error regularization. Our proposed estimators show, in certain settings, mean squared error improvements over both canonical and state-of-the-art estimators. We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. This connection leads us to prove that general K-class estimators possess distributional robustness properties. Furthermore, we propose a general framework for distributional robustness with respect to intervention-induced distributions. In this framework, we derive sufficient conditions for the identifiability of distributionally robust prediction methods and present impossibility results that show the necessity of several of these conditions. We present a new structure learning method applicable in additive noise models with directed trees as causal graphs. We prove its consistency in a vanishing identifiability setup and provide a method for testing substructure hypotheses with asymptotic family-wise error control that remains valid post-selection. Finally, we present heuristic ideas for learning summary graphs of nonlinear time-series models.
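A minimal numerical sketch of the K-class family in an instrumental variable setting is given below; the data-generating process is an assumption, and the code only contrasts OLS ($\kappa = 0$) with two-stage least squares ($\kappa = 1$), not the thesis's regularized estimators.

```python
# K-class estimator in a simple instrumental-variable setting (illustrative only).
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
z = rng.normal(size=n)                       # instrument
h = rng.normal(size=n)                       # hidden confounder
x = 1.0 * z + h + rng.normal(size=n)         # endogenous regressor
y = 2.0 * x + 2.0 * h + rng.normal(size=n)   # true causal effect of x on y is 2.0

Z, X = z[:, None], x[:, None]
P = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)    # projection of X onto the instrument space

def k_class(kappa):
    """K-class estimator: kappa = 0 gives OLS, kappa = 1 gives 2SLS."""
    Xk = (1 - kappa) * X + kappa * P
    return np.linalg.solve(Xk.T @ X, Xk.T @ y[:, None]).item()

print("OLS  (kappa=0):", round(k_class(0.0), 3))   # biased upward by the confounder
print("2SLS (kappa=1):", round(k_class(1.0), 3))   # close to the true effect 2.0
```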
Causality can be described in terms of a structural causal model (SCM) that carries information about the variables of interest and their mechanistic relations. For most processes of interest, the underlying SCM will be only partially observable; causal inference therefore tries to leverage whatever information is exposed. Graph neural networks (GNNs), as universal approximators on structured input, are a viable candidate for causal learning, suggesting a tighter integration with SCMs. To this end, we present a theoretical analysis from first principles that establishes a novel connection between GNNs and SCMs while providing an extended view on general neural-causal models. We then establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustrations on simulations and standard benchmarks validate our theoretical results.
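To fix ideas on how a GNN layer can operate on an SCM's graph, the toy sketch below passes messages only along directed (parent-to-child) edges; it is a bare numpy illustration of message passing, not the model class proposed in the paper.

```python
# Minimal sketch of one message-passing (GNN) layer on the graph of an SCM.
import numpy as np

rng = np.random.default_rng(7)

# Causal graph over three variables: X1 -> X2 -> X3 (A[i, j] = 1 if i -> j).
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)

H = rng.normal(size=(3, 4))                    # node features, one row per variable
W_self = rng.normal(size=(4, 4)) * 0.1
W_msg = rng.normal(size=(4, 4)) * 0.1

def gnn_layer(H, A):
    """Each node aggregates messages from its causal parents, then applies a nonlinearity."""
    messages = A.T @ (H @ W_msg)               # A.T routes parent features to their children
    return np.tanh(H @ W_self + messages)

H1 = gnn_layer(H, A)
H2 = gnn_layer(H1, A)                          # stacking layers propagates information along directed paths
print(H2.round(3))
```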
Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. In real scenarios, however, the $i.i.d.$ assumption is rarely satisfied, leading to a sharp drop in the performance of classic machine learning algorithms under distribution shifts and motivating the study of the Out-of-Distribution (OOD) generalization problem. The OOD generalization problem addresses the challenging setting where the testing distribution is unknown and differs from the training distribution. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from its definition, methodology, and evaluation to its implications and future directions. First, we provide a formal definition of the OOD generalization problem. Second, existing methods are categorized into three parts according to their positions in the learning pipeline, namely unsupervised representation learning, supervised model learning, and optimization, and typical methods in each category are discussed in detail. We then demonstrate the theoretical connections among the different categories and introduce commonly used datasets and evaluation metrics. Finally, we summarize the literature and raise future directions for the OOD generalization problem. A summary of the OOD generalization methods reviewed in this survey can be found at //out-of-distribution-generalization.com.
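The gap the survey targets can be reproduced in a few lines: the sketch below trains a classifier in an environment where a spurious feature is strongly correlated with the label and evaluates it in a shifted environment where that correlation is reversed; the setup is an illustrative assumption, not a benchmark from the survey.

```python
# Toy demonstration of the i.i.d. assumption failing under distribution shift.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)

def sample(n, spurious_corr):
    y = rng.integers(0, 2, size=n)
    causal = y + rng.normal(scale=1.0, size=n)            # stable, causal feature
    flip = rng.random(n) < spurious_corr
    spurious = np.where(flip, y, 1 - y) + rng.normal(scale=0.1, size=n)  # environment-dependent feature
    return np.column_stack([causal, spurious]), y

X_train, y_train = sample(5000, spurious_corr=0.95)        # training environment
X_test, y_test = sample(5000, spurious_corr=0.05)          # shifted test environment

clf = LogisticRegression().fit(X_train, y_train)
print("in-distribution accuracy    :", round(clf.score(X_train, y_train), 3))
print("out-of-distribution accuracy:", round(clf.score(X_test, y_test), 3))
```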
Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts using a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. We provide open source interpretation tools to help researchers and practitioners better understand their GAN models.
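The unit-level intervention can be sketched schematically as follows: a toy numpy "generator" stands in for the GAN, a threshold stands in for the segmentation network, and the causal effect of a unit is measured as the drop in object area after ablating (zeroing) that unit; all components are illustrative assumptions rather than the paper's actual networks.

```python
# Schematic of the unit-intervention measurement used in GAN dissection (toy stand-ins only).
import numpy as np

rng = np.random.default_rng(9)
C, H, W = 8, 16, 16
weights = rng.normal(size=(C,))                # how strongly each unit contributes to the "object"

def generate(features):
    """Toy decoder: a weighted sum of unit feature maps, standing in for the GAN's later layers."""
    return np.tensordot(weights, features, axes=1)

def object_area(image, thresh=0.5):
    """Stand-in for a segmentation network: fraction of pixels where the object is present."""
    return float((image > thresh).mean())

features = rng.random(size=(C, H, W))
baseline = object_area(generate(features))

# Causal effect of unit u: ablate it (set its feature map to zero) and measure the change.
for u in range(C):
    ablated = features.copy()
    ablated[u] = 0.0
    effect = baseline - object_area(generate(ablated))
    print(f"unit {u}: drop in object area after ablation = {effect:+.3f}")
```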