Mediation analysis is widely used in health science research to evaluate the extent to which an intermediate variable explains an observed exposure-outcome relationship. However, the validity of the analysis can be compromised when the exposure is measured with error. This article investigates the impact of exposure measurement error on assessing mediation with a survival outcome, based on the Cox proportional hazards outcome model. When the outcome is rare and there is no exposure-mediator interaction, we show that the uncorrected estimators of the natural indirect and direct effects can be biased in either direction, but the uncorrected estimator of the mediation proportion is approximately unbiased as long as the measurement error is not large or the mediator-exposure association is not strong. We develop ordinary regression calibration and risk set regression calibration approaches to correct the exposure measurement error-induced bias when estimating mediation effects while allowing for an exposure-mediator interaction in the Cox outcome model. The proposed approaches require a validation study to characterize the measurement error process. We apply the proposed approaches to the Health Professionals Follow-up Study to evaluate the extent to which reduced body mass index mediates the protective effect of vigorous physical activity on the risk of cardiovascular diseases, and compare the finite-sample properties of the proposed estimators via simulations.
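For readers unfamiliar with the mediation proportion mentioned above, the following is a sketch of the definition commonly used with a Cox outcome model. It is an assumption about the notation, not a formula taken from the abstract: NIE and NDE denote the natural indirect and direct effects on the log hazard ratio scale, and TE denotes their sum, the total effect.

```latex
\mathrm{MP} \;=\; \frac{\mathrm{NIE}}{\mathrm{NIE} + \mathrm{NDE}} \;=\; \frac{\mathrm{NIE}}{\mathrm{TE}}
```

Under this definition, a bias that scales NIE and NDE by roughly the same factor would largely cancel in the ratio, which is one informal way to read the claim that the uncorrected estimator of the mediation proportion can remain approximately unbiased.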
This note presents a refined local approximation for the logarithm of the ratio between the negative multinomial probability mass function and a multivariate normal density, both having the same mean-covariance structure. This approximation, which is derived using Stirling's formula and a meticulous treatment of Taylor expansions, yields an upper bound on the Hellinger distance between the jittered negative multinomial distribution and the corresponding multivariate normal distribution. Upper bounds on the Le Cam distance between negative multinomial and multivariate normal experiments ensue.
In causal inference studies, interest often lies in understanding the mechanisms through which a treatment affects an outcome. One approach is principal stratification (PS), which introduces well-defined causal effects in the presence of confounded post-treatment variables, or mediators, and clearly defines the assumptions for identification and estimation of those effects. The goal of this paper is to extend the PS framework to studies with continuous treatments and continuous post-treatment variables, which introduces a number of unique challenges both in terms of defining causal effects and performing inference. This manuscript provides three key methodological contributions: 1) we introduce novel principal estimands for continuous treatments that provide valuable insights into different causal mechanisms, 2) we utilize Bayesian nonparametric approaches to model the joint distribution of the potential mediating variables based on both Gaussian processes and Dirichlet process mixtures to ensure our approach is robust to model misspecification, and 3) we provide theoretical and numerical justification for utilizing a model for the potential outcomes to identify the joint distribution of the potential mediating variables. Lastly, we apply our methodology to a novel study of the relationship between the economy and arrest rates, and how this is potentially mediated by police capacity.
Federated learning has emerged as a viable distributed solution to train machine learning models without the need to share data with the central aggregator. However, standard neural network-based federated learning models have been shown to be susceptible to data leakage from the gradients shared with the server. In this work, we introduce federated learning with a variational quantum circuit model built using expressive encoding maps coupled with overparameterized ansätze. We show that expressive maps lead to inherent privacy against gradient inversion attacks, while overparameterization ensures model trainability. Our privacy framework centers on the complexity of solving the system of high-degree multivariate Chebyshev polynomials generated by the gradients of the quantum circuit. We present arguments highlighting the inherent difficulty of solving these equations, in both exact and approximate scenarios. Additionally, we examine machine learning-based attack strategies and establish a direct connection between overparameterization in the original federated learning model and underparameterization in the attack model. Furthermore, we provide numerical scaling arguments showing that underparameterization of the expressive map in the attack model leads to the loss landscape being swamped with exponentially many spurious local minima, thus making it extremely hard to realize a successful attack. This provides a strong claim, for the first time, that the nature of quantum machine learning models inherently helps prevent data leakage in federated learning.
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
Out of the participants in a randomized experiment with anticipated heterogeneous treatment effects, is it possible to identify which subjects have a positive treatment effect? While subgroup analysis has received attention, claims about individual participants are much more challenging. We frame the problem in terms of multiple hypothesis testing: each individual has a null hypothesis (stating that the potential outcomes are equal, for example) and we aim to identify those for whom the null is false (the treatment potential outcome stochastically dominates the control one, for example). We develop a novel algorithm that identifies such a subset, with nonasymptotic control of the false discovery rate (FDR). Our algorithm allows for interaction -- a human data scientist (or a computer program) may adaptively guide the algorithm in a data-dependent manner to gain power. We show how to extend the methods to observational settings and achieve a type of doubly-robust FDR control. We also propose several extensions: (a) relaxing the null to nonpositive effects, (b) moving from unpaired to paired samples, and (c) subgroup identification. We demonstrate via numerical experiments and theoretical analysis that the proposed method has valid FDR control in finite samples and reasonably high identification power.
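To make the notion of FDR control concrete, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure. It is a standard non-interactive baseline, not the interactive algorithm proposed in the abstract. The function name and the example p-values are hypothetical, and the procedure is assumed to be applied to one p-value per individual.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Return the sorted indices rejected by the Benjamini-Hochberg procedure.

    Classical baseline for FDR control at level `alpha` (valid, for example,
    under independent p-values). Not the interactive method of the abstract.
    """
    m = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under the step-up threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical per-individual p-values.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.1))
```

A fixed procedure of this kind cannot adapt to covariates or analyst feedback, which is the gap the interactive approach described above aims to fill.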
We investigate the combinatorics of max-pooling layers, which are functions that downsample input arrays by taking the maximum over shifted windows of input coordinates, and which are commonly used in convolutional neural networks. We obtain results on the number of linearity regions of these functions by equivalently counting the number of vertices of certain Minkowski sums of simplices. We characterize the faces of such polytopes and obtain generating functions and closed formulas for the number of vertices and facets in a 1D max-pooling layer depending on the size of the pooling windows and stride, and for the number of vertices in a special case of 2D max-pooling.
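As an illustration of the object being counted, the following is a minimal sketch of a 1D max-pooling layer together with the pattern of input coordinates that attain each window maximum. The function names are hypothetical. It assumes that, on each linearity region, every output equals one fixed input coordinate, so distinct achievable patterns correspond to distinct linear pieces; ties are broken toward the smallest index.

```python
def max_pool_1d(x, window, stride):
    """Maximum over each shifted window of `window` consecutive entries of x."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive integers")
    starts = range(0, len(x) - window + 1, stride)
    return [max(x[i:i + window]) for i in starts]


def argmax_pattern(x, window, stride):
    """Index of the input coordinate selected by each output (first index on ties)."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive integers")
    starts = range(0, len(x) - window + 1, stride)
    return tuple(max(range(i, i + window), key=lambda j: x[j]) for i in starts)


if __name__ == "__main__":
    x = [1, 3, 2, 5, 4]
    print(max_pool_1d(x, window=2, stride=1))     # pooled values
    print(argmax_pattern(x, window=2, stride=1))  # selected input indices
```

With overlapping windows, not every tuple of indices can occur as a pattern, which is why counting the regions is a genuinely combinatorial question.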
Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, where only the bag labels are known. The pool of negative instances is usually larger than the pool of positive instances, which makes selecting the most informative negative instances critical for performance. Such a selection strategy for negative instances from each bag is an open problem that has not been well studied for weak-label learning. In this paper, we study several sampling strategies that can measure the usefulness of negative instances for weak-label learning and select them accordingly. We test our method on the CIFAR-10 and AudioSet datasets and show that it improves weak-label classification performance and reduces the computational cost compared to random sampling methods. Our work reveals that negative instances are not all equally irrelevant, and selecting them wisely can benefit weak-label learning.
Mediation analysis is widely used for investigating direct and indirect causal pathways through which an effect arises. However, many mediation analysis studies are challenged by missingness in the mediator and outcome. In general, when the mediator and outcome are missing not at random, the direct and indirect effects are not identifiable without further assumptions. In this work, we study the identifiability of the direct and indirect effects under some interpretable mechanisms that allow for missing not at random in the mediator and outcome. We evaluate the performance of statistical inference under those mechanisms through simulation studies and illustrate the proposed methods via the National Job Corps Study.
We propose an approach to compute inner- and outer-approximations of the sets of values satisfying constraints expressed as arbitrarily quantified formulas. Such formulas arise, for instance, when specifying important problems in control such as robustness, motion planning, or controller comparison. We propose an interval-based method which allows for tractable but tight approximations. We demonstrate its applicability through a series of examples and benchmarks using a prototype implementation.
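The basic building block of interval-based methods can be sketched as follows. This is a minimal illustration of naive interval arithmetic producing an outer-approximation of the range of a function, not the quantified-formula method of the abstract. The `Interval` class and the example function are hypothetical, and floating-point rounding is ignored (a rigorous implementation would round outward).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; rounding errors are ignored in this sketch."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The extreme values of a product are attained at endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def contains(self, other):
        return self.lo <= other.lo and other.hi <= self.hi


if __name__ == "__main__":
    # Outer-approximate the range of f(x) = x * (1 - x) for x in [0, 1].
    x = Interval(0.0, 1.0)
    one = Interval(1.0, 1.0)
    outer = x * (one - x)
    print(outer)
```

Here the exact range of f is [0, 0.25], while the naive evaluation returns the wider interval [0, 1]: the result is guaranteed to enclose the true range but is not tight, because the two occurrences of x are treated as independent. Reducing this kind of overestimation is what makes tight approximations nontrivial.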
Deep learning is usually described as an experiment-driven field that faces continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.