When the marginal causal effect comparing the same treatment pair is available from multiple trials, we wish to transport all results to make inference on the target population effect. To account for differences between populations, statistical analysis is often performed controlling for relevant variables. However, when transportability assumptions are placed on conditional causal effects, rather than on the distribution of potential outcomes, the effect measures must be chosen carefully. In particular, we present identifiability results in two cases: the target population average treatment effect for a continuous outcome and the causal mean ratio for a positive outcome. We characterize the semiparametric efficiency bounds of the causal effects under the respective transportability assumptions and propose estimators that are doubly robust against model misspecification. We also discuss the tension between the non-collapsibility of conditional effects and the variational independence induced by transportability in the case of multiple source trials.
Behaviour change lies at the heart of many observable collective phenomena such as the transmission and control of infectious diseases, adoption of public health policies, and migration of animals to new habitats. Representing the process of individual behaviour change in computer simulations of these phenomena remains an open challenge. Often, computational models use phenomenological implementations with limited support from behavioural data. Without a strong connection to observable quantities, such models have limited utility for simulating observed and counterfactual scenarios of emergent phenomena because they cannot be validated or calibrated. Here, we present a simple stochastic individual-based model of reversal learning that captures fundamental properties of individual behaviour change, namely, the capacity to learn based on accumulated reward signals, and the transient persistence of learned behaviour after rewards are removed or altered. The model has only two parameters, and we use approximate Bayesian computation to demonstrate that they are fully identifiable from empirical reversal learning time series data. Finally, we demonstrate how the model can be extended to account for the increased complexity of behavioural dynamics over longer time scales involving fluctuating stimuli. This work is a step towards the development and evaluation of fully identifiable individual-level behaviour change models that can function as validated submodels for complex simulations of collective behaviour change.
We propose a test for the identification of causal effects in mediation and dynamic treatment models that is based on two sets of observed variables, namely covariates to be controlled for and suspected instruments, building on the test by Huber and Kueck (2022) for single treatment models. We consider models with a sequential assignment of a treatment and a mediator to assess the direct treatment effect (net of the mediator), the indirect treatment effect (via the mediator), or the joint effect of both treatment and mediator. We establish testable conditions for identifying such effects in observational data. These conditions jointly imply (1) the exogeneity of the treatment and the mediator conditional on covariates and (2) the validity of distinct instruments for the treatment and the mediator, meaning that the instruments do not directly affect the outcome (other than through the treatment or mediator) and are unconfounded given the covariates. Our framework extends to post-treatment sample selection or attrition problems when replacing the mediator by a selection indicator for observing the outcome, enabling joint testing of the selectivity of treatment and attrition. We propose a machine learning-based test to control for covariates in a data-driven manner and analyze its finite sample performance in a simulation study. Additionally, we apply our method to Slovak labor market data and find that our testable implications are not rejected for a sequence of training programs typically considered in dynamic treatment evaluations.
Neighborhood disadvantage is associated with worse health and cognitive outcomes. The morphological similarity network (MSN) is a promising approach to elucidate cortical network patterns underlying complex cognitive functions. We hypothesized that MSNs could capture changes in cortical patterns related to neighborhood disadvantage and cognitive function. This cross-sectional study included cognitively unimpaired participants from two large Alzheimer's studies at the University of Wisconsin-Madison. Neighborhood disadvantage status was obtained using the Area Deprivation Index (ADI). Cognitive performance was assessed on memory, processing speed, and executive function. MSNs were constructed for each participant based on the similarity in the distribution of cortical thickness of brain regions, followed by computation of local and global network features. Associations of ADI with cognitive scores and MSN features were examined using linear regression and mediation analysis. ADI showed a negative association with category fluency, implicit learning speed, story recall, and modified preclinical Alzheimer's cognitive composite scores, indicating worse cognitive function among those living in more disadvantaged neighborhoods. Local network features of frontal and temporal regions differed based on ADI status. Centrality of the left lateral orbitofrontal region showed a partial mediating effect on the association between neighborhood disadvantage and story recall performance. Our preliminary findings suggest differences in local cortical organization by neighborhood disadvantage, which partially mediated the relationship between ADI and cognitive performance, providing a possible network-based mechanism to explain, in part, the risk for poor cognitive functioning associated with disadvantaged neighborhoods.
Difference-in-differences (DID) is a popular approach to identify the causal effects of treatments and policies in the presence of unmeasured confounding. DID identifies the sample average treatment effect in the treated (SATT). However, a goal of such research is often to inform decision-making in target populations outside the treated sample. Transportability methods have been developed to extend inferences from study samples to external target populations; these methods have primarily been developed and applied in settings where identification is based on conditional independence between the treatment and potential outcomes, such as in a randomized trial. We present a novel approach to identifying and estimating effects in a target population, based on DID conducted in a study sample that differs from the target population. We present a range of assumptions under which one may identify causal effects in the target population and employ causal diagrams to illustrate these assumptions. In most realistic settings, results depend critically on the assumption that any unmeasured confounders are not effect measure modifiers on the scale of the effect of interest (e.g., risk difference, odds ratio). We develop several estimators of transported effects, including g-computation, inverse odds weighting, and a doubly robust estimator based on the efficient influence function. Simulation results support theoretical properties of the proposed estimators. As an example, we apply our approach to study the effects of a 2018 US federal smoke-free public housing law on air quality in public housing across the US, using data from a DID study conducted in New York City alone.
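As a minimal illustration of the DID step that the transport methods build on, the following sketch computes the canonical two-period, two-group estimate of the SATT under parallel trends. The function name `did_estimate` and the toy numbers are hypothetical and are not taken from the study; the sketch does not implement the transported estimators (g-computation, inverse odds weighting, or the doubly robust estimator).

```python
import numpy as np

def did_estimate(y_pre, y_post, treated):
    """Two-period difference-in-differences estimate.

    Returns the mean pre-to-post change among treated units minus the
    mean pre-to-post change among control units.
    """
    y_pre = np.asarray(y_pre, dtype=float)
    y_post = np.asarray(y_post, dtype=float)
    treated = np.asarray(treated)
    change = y_post - y_pre
    return change[treated == 1].mean() - change[treated == 0].mean()

# Hypothetical toy data: three treated and three control units.
y_pre = np.array([10.0, 12.0, 11.0, 9.0, 8.0, 10.0])
y_post = np.array([15.0, 16.0, 17.0, 11.0, 9.0, 13.0])
treated = np.array([1, 1, 1, 0, 0, 0])

satt_hat = did_estimate(y_pre, y_post, treated)
print(satt_hat)
```

Here the treated units change by 5 on average and the control units by 2, so the estimate is 3.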
Due to their flexibility and theoretical tractability, Gaussian process (GP) regression models have become a central topic in modern statistics and machine learning. While the true posterior in these models is given explicitly, numerical evaluations depend on the inversion of the augmented kernel matrix $ K + \sigma^2 I $, which requires up to $ O(n^3) $ operations. For large sample sizes $n$, which are typical in modern applications, this is computationally infeasible and necessitates the use of an approximate version of the posterior. Although such methods are widely used in practice, they typically have very limited theoretical underpinning. In this context, we analyze a class of recently proposed approximation algorithms from the field of probabilistic numerics. They can be interpreted in terms of Lanczos approximate eigenvectors of the kernel matrix or a conjugate gradient approximation of the posterior mean, which are particularly advantageous in truly large-scale applications, as they are fundamentally based only on matrix-vector multiplications amenable to the GPU acceleration of modern software frameworks. We combine results from the numerical analysis literature with state-of-the-art concentration results for spectra of kernel matrices to obtain minimax contraction rates. Our theoretical findings are illustrated by numerical experiments.
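To make the conjugate gradient interpretation concrete, the following sketch approximates the GP posterior mean by solving $ (K + \sigma^2 I)\alpha = y $ with a textbook conjugate gradient loop that touches the kernel matrix only through matrix-vector products. It is an illustration under assumed settings (a squared-exponential kernel, the lengthscale, the noise level, and synthetic data are all hypothetical) and not the specific algorithm analyzed in the paper.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=0.5):
    # Squared-exponential kernel on 1-D inputs (an assumed choice).
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def conjugate_gradient(matvec, b, n_iter, tol=1e-10):
    # Textbook CG for a symmetric positive definite system A x = b,
    # accessing A only through the matrix-vector product `matvec`.
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Synthetic regression data (hypothetical).
rng = np.random.default_rng(0)
n = 200
x_train = np.sort(rng.uniform(0.0, 1.0, n))
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(n)
sigma2 = 0.01

K = rbf_kernel(x_train, x_train)

def augmented_matvec(v):
    # Product with K + sigma^2 I without forming the augmented matrix.
    return K @ v + sigma2 * v

alpha_cg = conjugate_gradient(augmented_matvec, y_train, n_iter=100)
alpha_exact = np.linalg.solve(K + sigma2 * np.eye(n), y_train)

x_test = np.linspace(0.0, 1.0, 5)
K_test = rbf_kernel(x_test, x_train)
mean_cg = K_test @ alpha_cg        # CG approximation of the posterior mean
mean_exact = K_test @ alpha_exact  # exact posterior mean via a direct solve
print(np.max(np.abs(mean_cg - mean_exact)))
```

Stopping the loop after fewer iterations gives a cheaper but coarser approximation, which is the trade-off that contraction-rate results of this kind quantify.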
Making inference with spatial extremal dependence models can be computationally burdensome since they involve intractable and/or censored likelihoods. Building on recent advances in likelihood-free inference with neural Bayes estimators, that is, neural networks that approximate Bayes estimators, we develop highly efficient estimators for censored peaks-over-threshold models that use data augmentation techniques to encode censoring information in the neural network input. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference methods for spatial extremal dependence models. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators to make inference with popular extremal dependence models, such as max-stable, $r$-Pareto, and random scale mixture process models. We also illustrate that it is possible to train a single neural Bayes estimator for a general censoring level, obviating the need to retrain the network when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds of thousands of high-dimensional spatial extremal dependence models to assess extreme particulate matter 2.5 microns or less in diameter (${\rm PM}_{2.5}$) concentration over the whole of Saudi Arabia.
Low-Rank Tensor Completion, a method which exploits the inherent structure of tensors, has been studied extensively as an effective approach to tensor completion. Whilst such methods have attained great success, none has systematically considered exploiting the numerical priors of tensor elements. Ignoring numerical priors causes loss of important information regarding the data, and therefore prevents the algorithms from reaching optimal accuracy. Despite the existence of some individual works which consider ad hoc numerical priors for specific tasks, no generalizable framework for incorporating numerical priors has appeared. We present the Generalized CP Decomposition Tensor Completion (GCDTC) framework, the first generalizable framework for low-rank tensor completion that takes numerical priors of the data into account. We test GCDTC by further proposing the Smooth Poisson Tensor Completion (SPTC) algorithm, an instantiation of the GCDTC framework, whose performance exceeds the current state of the art by a considerable margin in the task of non-negative tensor completion, exemplifying GCDTC's effectiveness. Our code is open-source.
A principled approach to cyclicality and intransitivity in cardinal paired comparison data is developed within the framework of graphical linear models. Fundamental to our developments is a detailed understanding and study of the parameter space which accommodates cyclicality and intransitivity. In particular, the relationships between the reduced, completely transitive model, the full, not necessarily transitive model, and all manner of intermediate models are explored for both complete and incomplete paired comparison graphs. It is shown that identifying cyclicality and intransitivity reduces to a model selection problem and a new method for model selection employing geometrical insights, unique to the problem at hand, is proposed. The large sample properties of the estimators as well as guarantees on the selected model are provided. It is thus shown that in large samples all cyclicalities and intransitivities can be identified. The method is exemplified using simulations and the analysis of an illustrative example.
We use fixed point theory to analyze nonnegative neural networks, which we define as neural networks that map nonnegative vectors to nonnegative vectors. We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and (weakly) scalable mappings within the framework of nonlinear Perron-Frobenius theory. This fact enables us to provide conditions for the existence of fixed points of nonnegative neural networks having inputs and outputs of the same dimension, and these conditions are weaker than those recently obtained using arguments in convex analysis. Furthermore, we prove that the shape of the fixed point set of nonnegative neural networks with nonnegative weights and biases is an interval, which under mild conditions degenerates to a point. These results are then used to obtain the existence of fixed points of more general nonnegative neural networks. From a practical perspective, our results contribute to the understanding of the behavior of autoencoders, and we also offer valuable mathematical machinery for future developments in deep equilibrium models.
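As a small numerical illustration of the fixed-point setting, the sketch below iterates a single-layer network with nonnegative weights and biases starting from the zero vector. The weight matrix `W`, the bias `b`, and the choice of `tanh` as activation are assumptions made for this example only; the paper's conditions cover a more general class. Because the map is monotone and bounded on the nonnegative orthant, the iterates from zero are nondecreasing and converge to a fixed point.

```python
import numpy as np

# Hypothetical nonnegative weights and biases for a 3-to-3 layer.
W = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.4, 0.2],
              [0.1, 0.3, 0.6]])
b = np.array([0.1, 0.2, 0.3])

def layer(x):
    # Maps nonnegative vectors to nonnegative vectors and is monotone,
    # since W >= 0, b >= 0, and tanh is increasing.
    return np.tanh(W @ x + b)

def fixed_point_iteration(f, x0, tol=1e-12, max_iter=1000):
    # Plain Picard iteration x_{k+1} = f(x_k); stops once successive
    # iterates differ by less than `tol` in the max norm.
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if np.max(np.abs(x_next - x)) < tol:
            return x_next
        x = x_next
    return x

x_star = fixed_point_iteration(layer, np.zeros(3))
residual = np.max(np.abs(layer(x_star) - x_star))
print(x_star, residual)
```

Starting from zero is a deliberate choice: for a monotone map this yields the smallest fixed point, which is a convenient reference when the fixed point set is an interval rather than a single point.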
Randomized controlled trials (RCTs) are used to evaluate treatment effects. When individuals are grouped together, clustered RCTs are conducted. Stratification is recommended to reduce imbalance of baseline covariates between treatment and control. In practice, this can lead to comparisons between clusters of very different sizes. As a result, direct adjustment estimators that average differences of means within the strata may be inconsistent. We study differences of inverse probability weighted means of a treatment and a control group -- Hájek effect estimators -- under two common forms of stratification: small strata that increase in number; or larger strata with growing numbers of clusters in each. Under either scenario, mild conditions give consistency and asymptotic normality. We propose a variance estimator applicable to designs with any number of strata and strata of any size. We describe a special use of the variance estimator that improves small sample performance of Wald-type confidence intervals. The Hájek effect estimator lends itself to covariance adjustment, and our variance estimator remains applicable. Simulations and real-world applications in children's nutrition and education confirm favorable operating characteristics, demonstrating advantages of the Hájek effect estimator beyond its simplicity and ease of use.
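The following sketch shows one way to compute a difference of inverse probability weighted means from cluster-level totals, normalizing each arm by its weighted total cluster size in the Hájek style. The function name `hajek_effect`, the argument layout, and the toy design are hypothetical; the paper's exact estimator, and its variance estimator in particular, are not reproduced here.

```python
import numpy as np

def hajek_effect(y_total, size, treated, p_treat):
    """Difference of inverse probability weighted means (Hajek-type).

    y_total : sum of individual outcomes within each cluster
    size    : number of individuals in each cluster
    treated : 1 if the cluster was assigned to treatment, else 0
    p_treat : each cluster's probability of assignment to treatment,
              determined here by its stratum
    """
    y_total = np.asarray(y_total, dtype=float)
    size = np.asarray(size, dtype=float)
    treated = np.asarray(treated, dtype=float)
    p_treat = np.asarray(p_treat, dtype=float)
    w1 = treated / p_treat              # weights for treated clusters
    w0 = (1 - treated) / (1 - p_treat)  # weights for control clusters
    mean1 = np.sum(w1 * y_total) / np.sum(w1 * size)
    mean0 = np.sum(w0 * y_total) / np.sum(w0 * size)
    return mean1 - mean0

# Hypothetical design: stratum A has 2 clusters (1 treated),
# stratum B has 4 clusters (1 treated).
y_total = [50, 60, 30, 20, 40, 20]
size = [10, 20, 5, 5, 10, 5]
treated = [1, 0, 1, 0, 0, 0]
p_treat = [0.5, 0.5, 0.25, 0.25, 0.25, 0.25]

effect = hajek_effect(y_total, size, treated, p_treat)
print(effect)
```

In this toy design the weighted treated mean is 5.5 and the weighted control mean is 3.4, giving an estimated effect of 2.1. Dividing by the weighted sizes rather than by fixed population counts is what distinguishes a Hájek-type ratio estimator from a Horvitz-Thompson-type one.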