Modern health care systems increasingly conduct continuous, automated surveillance of the electronic medical record (EMR) to identify adverse events; however, many events, such as sepsis, lack elucidated prodromes (i.e., event chains) that could be used to identify and intercept the adverse event early in its course. Currently, no reliable framework exists for discovering or describing the causal chains that precede adverse hospital events. Clinically relevant and interpretable results require a framework that can (1) infer temporal interactions across multiple patient features found in EMR data (e.g., labs, vital signs) and (2) identify patterns that precede and are specific to an impending adverse event (e.g., sepsis). In this work, we propose a linear multivariate Hawkes process model, coupled with a ReLU link function, to recover a Granger causal (GC) graph with both exciting and inhibiting effects. We develop a scalable two-phase gradient-based method to maximize a surrogate likelihood and estimate the problem parameters, and we show its effectiveness through extensive numerical simulation. We then apply our method to a data set of patients admitted to an academic Level 1 trauma center in Atlanta, GA, where the estimated GC graph identifies several highly interpretable chains that precede sepsis. Here, we demonstrate the effectiveness of our approach in learning a GC graph over Sepsis Associated Derangements (SADs), but it can be generalized to other applications with similar requirements.
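As an illustration of how a ReLU link lets a linear Hawkes model express both excitation and inhibition, the sketch below evaluates the conditional intensity of each event type and reads off the Granger causal edges from the sign of the interaction weights. It is a minimal sketch, not the authors' implementation: the exponential kernel `beta * exp(-beta * dt)`, the function names `relu_hawkes_intensity` and `granger_edges`, and the convention that `alpha[i][j]` is the effect of type-`j` events on type `i` are all assumptions made for illustration.

```python
import math


def relu_hawkes_intensity(t, mu, alpha, beta, events):
    """Conditional intensity of every event type at time t (illustrative sketch).

    mu[i]       -- background rate of type i
    alpha[i][j] -- effect of past type-j events on type i (negative = inhibiting)
    beta        -- decay rate of the assumed exponential kernel
    events      -- list of (time, type) pairs
    """
    rates = []
    for i in range(len(mu)):
        s = mu[i]
        for tk, j in events:
            if tk < t:  # only strictly earlier events influence the intensity
                s += alpha[i][j] * beta * math.exp(-beta * (t - tk))
        rates.append(max(0.0, s))  # ReLU link keeps the intensity non-negative
    return rates


def granger_edges(alpha, tol=0.0):
    """Type j Granger-causes type i iff alpha[i][j] is nonzero; sign gives the effect."""
    edges = []
    for i, row in enumerate(alpha):
        for j, a in enumerate(row):
            if abs(a) > tol:
                edges.append((j, i, "excites" if a > 0 else "inhibits"))
    return edges
```

With `alpha = [[0.0, 1.0], [-2.0, 0.0]]`, type 1 excites type 0 while type 0 inhibits type 1; the ReLU clips type 1's intensity to zero when the inhibition outweighs its background rate.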
Temporal data, representing chronological observations of complex systems, is a common data structure generated in many domains, such as industry, medicine, and finance. Analyzing this type of data is valuable for many applications, and various temporal data analysis tasks, e.g., classification, clustering, and prediction, have been proposed over the past decades. Among them, causal discovery, i.e., learning causal relations from temporal data, is an interesting yet critical task that has attracted much research attention. Existing causal discovery works can be divided into two highly correlated categories according to whether the temporal data is calibrated, i.e., multivariate time series causal discovery and event sequence causal discovery. However, most previous surveys focus only on time series causal discovery and ignore the second category. In this paper, we specify the correlation between the two categories and provide a systematic overview of existing solutions. Furthermore, we provide public datasets, evaluation metrics, and new perspectives for temporal data causal discovery.
Selection of covariates is crucial in the estimation of average treatment effects from observational data with high- or even ultra-high-dimensional pretreatment variables. Existing methods for this problem typically assume sparse linear models for both the outcome and a univariate treatment, and cannot handle ultra-high-dimensional covariates. In this paper, we propose a new covariate selection strategy for multivariate treatments, called double screening prior adaptive lasso (DSPAL), which selects confounders and predictors of the outcome by combining the adaptive lasso with prior information on marginal conditional (in)dependence, in order to eliminate confounding bias and improve statistical efficiency. The distinctive features of our proposal are that it applies to high-dimensional or even ultra-high-dimensional covariates with multivariate treatments and handles both parametric and nonparametric outcome models, which makes it more robust than other methods. Our theoretical analyses show that the proposed procedure enjoys the sure screening property, the ranking consistency property, and variable selection consistency. Through a simulation study, we demonstrate that, across various scenarios, the proposed approach consistently selects all confounders and predictors and estimates the multivariate treatment effects with smaller bias and mean squared error than several alternatives. In a real data analysis, the method is applied to estimate the causal effect of a three-dimensional continuous environmental treatment on cholesterol level, yielding informative results.
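The two ingredients named above, marginal screening followed by an adaptive lasso, can be sketched generically. This is not DSPAL itself: as a stand-in for the paper's conditional (in)dependence prior, the sketch screens covariates by absolute Pearson correlation with the outcome (in the spirit of sure independence screening), then builds the standard adaptive lasso weights `w_j = 1 / |beta_init_j|**gamma` from an initial estimate. The function names and the `eps` guard are hypothetical choices for illustration.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def marginal_screen(columns, y, keep):
    """Keep the `keep` covariates with the largest |corr(x_j, y)| (sorted indices)."""
    scores = [abs(pearson(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:keep])


def adaptive_lasso_weights(beta_init, gamma=1.0, eps=1e-8):
    """Large initial coefficients get small penalties, and vice versa."""
    return [1.0 / (abs(b) + eps) ** gamma for b in beta_init]


def adaptive_lasso_penalty(beta, weights, lam):
    """lam * sum_j w_j * |beta_j|, added to the fitting loss."""
    return lam * sum(w * abs(b) for w, b in zip(weights, beta))
```

Screening first shrinks an ultra-high-dimensional covariate set to a manageable size, so the penalized fit only runs over the retained columns.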
Recent approaches to causal inference have focused on the identification and estimation of \textit{causal effects}, defined as (properties of) the distribution of counterfactual outcomes under hypothetical actions that alter the nodes of a graphical model. In this article, we explore an alternative approach using the concept of \textit{causal influence}, defined through operations that alter the information propagated through the edges of a directed acyclic graph. Causal influence may be more useful than causal effects in settings in which interventions on the causal agents are infeasible or of no substantive interest, for example when considering gender, race, or genetics as a causal agent. Furthermore, the proposed "information transfer" interventions allow us to solve a long-standing problem in causal mediation analysis, namely the non-parametric identification of path-specific effects in the presence of treatment-induced mediator-outcome confounding. We propose efficient non-parametric estimators for a covariance version of the proposed causal influence measures, using data-adaptive regression coupled with semi-parametric efficiency theory to address model misspecification bias while retaining $\sqrt{n}$-consistency and asymptotic normality. We illustrate the use of our methods in two examples using publicly available data.
Genito-Pelvic Pain/Penetration Disorder (GPPPD) is a common disorder but is rarely treated in routine care. Previous research documents that GPPPD symptoms can be treated effectively using internet-based psychological interventions. However, non-response remains common for all state-of-the-art treatments, and it is unclear which patient groups are expected to benefit most from an internet-based intervention. Multivariable prediction models are increasingly used to identify predictors of heterogeneous treatment effects and to allocate treatments with the greatest expected benefits. In this study, we developed and internally validated a multivariable decision tree model that predicts effects of an internet-based treatment on a multidimensional composite score of GPPPD symptoms. Data from a randomized controlled trial comparing the internet-based intervention to a waitlist control group ($N$=200) were used to develop a decision tree model using model-based recursive partitioning. Model performance was assessed by examining the apparent and bootstrap bias-corrected performance. The final pruned decision tree consisted of one splitting variable, joint dyadic coping, based on which two response clusters emerged. No effect was found for patients with low dyadic coping ($n$=33; $d$=0.12; 95% CI: -0.57 to 0.80), while large effects ($d$=1.00; 95% CI: 0.68 to 1.32; $n$=167) are predicted for those with high dyadic coping at baseline. The bootstrap bias-corrected performance of the model was $R^2$=27.74% (RMSE=13.22).
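The subgroup effects above are reported as Cohen's $d$ with 95% confidence intervals. For readers unfamiliar with the metric, the sketch below computes $d$ as a standardized mean difference with a pooled standard deviation, plus the common large-sample approximation to its standard error. This is a generic illustration; the study's exact estimation procedure for the tree-based subgroup effects is not stated here and may differ.

```python
import math


def cohens_d(a, b):
    """Standardized mean difference (mean(a) - mean(b)) / pooled SD."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / sp


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate CI using SE(d) = sqrt((n1+n2)/(n1*n2) + d^2 / (2*(n1+n2)))."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se
```

Because the standard error shrinks with group size, a small subgroup (such as $n$=33) yields a much wider interval than a large one, which is why a near-zero $d$ there comes with an interval spanning zero.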
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
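The $\ell_1/\ell_2$ regularizer couples the $K$ tasks by penalizing, for each candidate edge, the Euclidean norm of that edge's weights across all tasks; an edge is then either kept in several tasks or dropped from all of them, which is what encourages a sparse union of supports. The sketch below computes this group penalty and the resulting union support for a list of weight matrices. It is illustrative only: the layout `B[k][i][j]` (weight of edge i to j in task k) and the function names are assumptions, and the full estimator also includes the Gaussian likelihood term.

```python
import math


def l1_l2_penalty(B):
    """Sum over edges (i, j) of || (B[0][i][j], ..., B[K-1][i][j]) ||_2."""
    K, p = len(B), len(B[0])
    total = 0.0
    for i in range(p):
        for j in range(p):
            total += math.sqrt(sum(B[k][i][j] ** 2 for k in range(K)))
    return total


def union_support(B, tol=0.0):
    """Edges that are nonzero in at least one of the K tasks."""
    p = len(B[0])
    return {(i, j) for i in range(p) for j in range(p)
            if any(abs(Bk[i][j]) > tol for Bk in B)}
```

For two tasks that share a single edge with weights 3 and 4, the penalty is $\sqrt{3^2 + 4^2} = 5$ rather than the $3 + 4 = 7$ an element-wise $\ell_1$ penalty would charge, so sharing an edge across tasks is cheaper than spreading edges over different positions.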
We describe ACE0, a lightweight platform for evaluating the suitability and viability of AI methods for behaviour discovery in multi-agent simulations. Specifically, ACE0 was designed to explore AI methods for multi-agent simulations used in operations research studies related to new technologies such as autonomous aircraft. Simulation environments used in production are often high-fidelity and complex, require significant domain knowledge, and as a result have high R&D costs. Minimal and lightweight simulation environments can help researchers and engineers evaluate the viability of new AI technologies for behaviour discovery in a more agile and potentially cost-effective manner. In this paper, we describe the motivation for the development of ACE0. We provide a technical overview of the system architecture, describe a case study of behaviour discovery in the aerospace domain, and provide a qualitative evaluation of the system. The evaluation includes a brief description of collaborative research projects with academic partners, exploring different AI behaviour discovery methods.
This paper focuses on the expected difference in borrowers' repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence their estimation error can be large. We therefore propose an alternative approach to constructing estimators that greatly reduces this error. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the classical and proposed estimators in their power to estimate the causal quantities. The comparison covers a wide range of models, including linear regression models, tree-based models, and neural network-based models, on simulated datasets that exhibit different levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and lending businesses. We find that the relative reduction in estimation error is strikingly substantial when the causal effects are accounted for correctly.
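To see why ignoring confounding inflates the error of a classical estimator, the simulation below compares a naive difference in means with a standard inverse propensity weighting (IPW) estimator. This is a generic textbook illustration, not the estimator proposed in the paper: a hypothetical binary confounder `x` (think of a credit-score band) raises both the approval probability `t` and the repayment outcome `y`, the true effect of approval is 1.0, and the propensity scores are assumed known.

```python
import random


def simulate(n, seed=0):
    """Hypothetical data: confounder x drives both treatment t and outcome y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        e = 0.8 if x == 1 else 0.2             # propensity P(t = 1 | x), assumed known
        t = 1 if rng.random() < e else 0
        y = 1.0 * t + 2.0 * x + rng.gauss(0.0, 1.0)  # true treatment effect = 1.0
        data.append((x, e, t, y))
    return data


def naive_estimate(data):
    """Difference in mean outcome between treated and untreated units."""
    treated = [y for _, _, t, y in data if t == 1]
    control = [y for _, _, t, y in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_estimate(data):
    """Horvitz-Thompson style IPW estimate of the average treatment effect."""
    n = len(data)
    return sum(t * y / e - (1 - t) * y / (1 - e) for _, e, t, y in data) / n
```

In this setup the naive estimate converges to $1 + 2(0.8 - 0.2) = 2.2$, because treated units are disproportionately high-`x`, while the IPW estimate converges to the true effect of 1.0.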
Graph convolutional networks (GCNs) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets, because they compute node representations recursively from their neighbors. Current GCN training algorithms suffer from either high computational costs that grow exponentially with the number of layers, or high memory usage for loading the entire graph and node embeddings. In this paper, we propose a novel, efficient layer-wise training framework for GCNs (L-GCN) that disentangles feature aggregation and feature transformation during training, greatly reducing time and memory complexity. We present a theoretical analysis of L-GCN under the graph isomorphism framework, showing that, under mild conditions, L-GCN yields GCNs as powerful as those produced by the more costly conventional training algorithm. We further propose L^2-GCN, which learns a controller for each layer that automatically adjusts the number of training epochs per layer in L-GCN. Experiments show that L-GCN is faster than the state of the art by at least an order of magnitude, with consistent memory usage that does not depend on dataset size, while maintaining comparable prediction performance. With the learned controller, L^2-GCN can further cut the training time in half. Our code is available at //github.com/Shen-Lab/L2-GCN.
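The idea of disentangling aggregation from transformation can be sketched as follows: for each layer, the neighbor aggregation $\hat{A}H$ with $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$ is computed once, and only then is the layer's dense transformation applied (and, in layer-wise training, fitted). The sketch below shows these two steps in plain Python; the adjacency-list input, the function names, and the omission of the actual weight-training loop are simplifications for illustration, not the L-GCN code.

```python
import math


def normalized_aggregate(adj, H):
    """Return D^{-1/2} (A + I) D^{-1/2} H for an undirected graph.

    adj[i] -- list of neighbours of node i (assumed symmetric)
    H      -- node feature matrix as a list of rows
    """
    n = len(H)
    neigh = [set(adj[i]) | {i} for i in range(n)]  # add self-loops
    deg = [len(neigh[i]) for i in range(n)]
    out = []
    for i in range(n):
        row = [0.0] * len(H[0])
        for j in neigh[i]:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            for f in range(len(row)):
                row[f] += w * H[j][f]
        out.append(row)
    return out


def relu_transform(H, W):
    """Dense layer with ReLU: max(0, H @ W), applied after aggregation."""
    return [[max(0.0, sum(h[k] * W[k][c] for k in range(len(h))))
             for c in range(len(W[0]))] for h in H]
```

Because the aggregated features no longer depend on the weights being trained, they can be cached per layer, so each layer's training epochs touch only a fixed feature matrix instead of recursively expanding neighborhoods.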
Video anomaly detection under weak labels has been formulated as a typical multiple-instance learning problem in previous works. In this paper, we provide a new perspective, i.e., a supervised learning task under noisy labels. From this viewpoint, once the label noise is cleaned away, we can directly apply fully supervised action classifiers to weakly supervised anomaly detection and take maximum advantage of these well-developed classifiers. For this purpose, we devise a graph convolutional network to correct noisy labels. Based on feature similarity and temporal consistency, our network propagates supervisory signals from high-confidence snippets to low-confidence ones. In this manner, the network provides cleaned supervision for action classifiers. During the test phase, we only need to obtain snippet-wise predictions from the action classifier without any extra post-processing. Extensive experiments on 3 datasets at different scales with 2 types of action classifiers demonstrate the efficacy of our method. Remarkably, we obtain a frame-level AUC score of 82.12% on UCF-Crime.
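The propagation of supervisory signals from high-confidence to low-confidence snippets can be illustrated with classic label propagation over a graph that combines the two cues named above. Note that this swaps in a simpler technique than the paper's graph convolutional network: edges come from cosine similarity above a threshold (feature similarity) and from adjacency in time (temporal consistency), and confident snippets keep their scores fixed. The threshold, edge weights, and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def build_graph(features, sim_threshold):
    """Weighted adjacency from feature-similarity and temporal-neighbour edges."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = cosine(features[i], features[j])
            if s >= sim_threshold:
                W[i][j] = s                      # feature-similarity edge
            if abs(i - j) == 1:
                W[i][j] = max(W[i][j], 1.0)      # temporal-consistency edge
    return W


def propagate(W, scores, confident, iters=50):
    """Replace unconfident scores by neighbour-weighted means; confident ones stay clamped."""
    s = list(scores)
    for _ in range(iters):
        new = list(s)
        for i in range(len(s)):
            if i in confident:
                continue
            tot = sum(W[i])
            if tot > 0:
                new[i] = sum(W[i][j] * s[j] for j in range(len(s))) / tot
        s = new
    return s
```

A low-confidence snippet sandwiched between an anomalous and a normal snippet ends up with an intermediate anomaly score, reflecting the evidence from both temporal neighbours.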
The prevalence of networked sensors and actuators in many real-world systems, such as smart buildings, factories, power plants, and data centers, generates substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods cannot exploit the large amounts of data because labeled data are lacking. Moreover, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies among the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions among the variables. We also fully exploit both the generator and the discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies by discrimination and reconstruction. We tested our proposed MAD-GAN using two recent datasets collected from real-world cyber-physical systems (CPS): the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results show that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.
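A score that detects anomalies "by discrimination and reconstruction" can be sketched as a weighted combination of the generator's reconstruction error and the discriminator's belief that a window is fake. The sketch below assumes a convex combination with weight `lam`, a discriminator that outputs the probability that a window is real, and mean absolute error as the reconstruction error; these choices and the function names are assumptions for illustration, and the exact MAD-GAN definition may differ.

```python
def reconstruction_error(x, x_hat):
    """Mean absolute difference between a window and its generator reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def dr_score(recon_err, disc_prob_real, lam=0.5):
    """Assumed DR-score: lam * reconstruction + (1 - lam) * discrimination term.

    The discrimination term is 1 - P(real), so both parts grow with anomalousness.
    """
    return lam * recon_err + (1 - lam) * (1.0 - disc_prob_real)


def flag_anomalies(scores, threshold):
    """Mark windows whose score exceeds the chosen threshold."""
    return [s > threshold for s in scores]
```

Combining the two terms lets a window be flagged either because the generator cannot reproduce it or because the discriminator judges it unlike normal operation; `lam` trades off the two signals.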