Identifying a biomarker or treatment-dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. We call this risk, viewed as a function of the threshold and possibly adjusted for covariates, the threshold-response function. Extending the work of Donovan, Hudgens and Gilbert (2019), we propose a nonparametric efficient estimator for the covariate-adjusted threshold-response function, which utilizes machine learning and Targeted Minimum-Loss Estimation (TMLE). We additionally propose a more general estimator, based on sequential regression, that also applies when there is outcome missingness. We show that the threshold-response for a given threshold may be viewed as the expected outcome under a stochastic intervention in which all participants are given a treatment dose above the threshold. We prove that the estimator is efficient and characterize its asymptotic distribution. We give a method to construct simultaneous 95% confidence bands for the threshold-response function and its inverse. Furthermore, we discuss how to adjust our estimator, using inverse probability weighting, when the treatment or biomarker is missing at random, as is the case in clinical trials with biased sampling designs. The methods are assessed in a diverse set of simulation settings with rare outcomes and cumulative case-control sampling, and are employed to estimate neutralizing antibody thresholds for virologically confirmed dengue risk in the CYD14 and CYD15 dengue vaccine trials.
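For concreteness, a naive plug-in (G-computation) version of the covariate-adjusted threshold-response can be computed by fitting an outcome regression among participants whose dose or biomarker exceeds each threshold and averaging its predictions over all covariate profiles. The sketch below is only this naive plug-in under an assumed variable layout (W covariates, A dose/biomarker, Y binary outcome); it is not the TMLE or sequential-regression estimator proposed above, and it ignores missingness and biased sampling.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def plugin_threshold_response(W, A, Y, thresholds):
    """Naive plug-in sketch of the covariate-adjusted threshold-response
    E_W[ E[Y | A >= v, W] ] at each threshold v.  Illustrative only: not the
    TMLE or sequential-regression estimator from the paper, and it ignores
    outcome missingness and biased sampling of the biomarker."""
    estimates = []
    for v in thresholds:
        above = A >= v
        outcome_model = RandomForestRegressor(n_estimators=200, random_state=0)
        outcome_model.fit(W[above], Y[above])               # fit E[Y | A >= v, W]
        estimates.append(outcome_model.predict(W).mean())   # average over all covariate profiles
    return np.array(estimates)
```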
Recent approaches to causal inference have focused on causal effects defined as contrasts between the distribution of counterfactual outcomes under hypothetical interventions on the nodes of a graphical model. In this article we develop theory for causal effects defined with respect to a different type of intervention, one which alters the information propagated through the edges of the graph. These information transfer interventions may be more useful than node interventions in settings in which causes are non-manipulable, for example when considering race or genetics as a causal agent. Furthermore, information transfer interventions allow us to define path-specific decompositions which are identified in the presence of treatment-induced mediator-outcome confounding, a practical problem whose general solution remains elusive. We prove that the proposed effects provide valid statistical tests of mechanisms, unlike popular methods based on randomized interventions on the mediator. We propose efficient non-parametric estimators for a covariance version of the proposed effects, using data-adaptive regression coupled with semi-parametric efficiency theory to address model misspecification bias while retaining $\sqrt{n}$-consistency and asymptotic normality. We illustrate the use of our methods in two examples using publicly available data.
I introduce PRZI (Parameterised-Response Zero Intelligence), a new form of zero-intelligence trader intended for use in simulation studies of the dynamics of continuous double auction markets. Like Gode & Sunder's classic ZIC trader, PRZI generates quote-prices from a random distribution over some specified domain of allowable quote-prices. Unlike ZIC, which uses a uniform distribution to generate prices, the probability distribution in a PRZI trader is parameterised in such a way that its probability mass function (PMF) is determined by a real-valued control variable s in the range [-1.0, +1.0] that determines the _strategy_ for that trader. When s=0, a PRZI trader is identical to ZIC, with a uniform PMF; but when |s| ≈ 1 the PRZI trader's PMF becomes maximally skewed to one extreme or the other of the price-range, thereby making its quote-prices more or less urgent, biasing the quote-price distribution toward or away from the trader's limit-price. To explore the co-evolutionary dynamics of populations of PRZI traders that dynamically adapt their strategies, I show results from long-term market experiments in which each trader uses a simple stochastic hill-climber algorithm to repeatedly evaluate alternative s-values and choose the most profitable at any given time. In these experiments the profitability of any particular s-value may be non-stationary because the profitability of one trader's strategy at any one time can depend on the mix of strategies being played by the other traders at that time, which are each themselves continuously adapting. Results from these market experiments demonstrate that the population of traders' strategies can exhibit rich dynamics, with periods of stability lasting over hundreds of thousands of trader interactions interspersed with occasional periods of change. Python source-code for the work reported here has been made publicly available on GitHub.
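To make the role of the strategy parameter concrete, the toy sampler below draws a quote-price from [p_min, p_max] so that s = 0 recovers a uniform draw and |s| near 1 pushes probability mass toward one end of the range. The power-law skewing and the scaling constant are illustrative stand-ins, not the actual PRZI PMF defined in the paper.

```python
import random

def toy_skewed_quote(p_min, p_max, s, rng=random):
    """Toy quote-price sampler: s = 0 gives a uniform draw over [p_min, p_max];
    s -> +1 skews the draw toward p_max, s -> -1 toward p_min.
    This is an illustrative stand-in, not the PMF used by PRZI itself."""
    assert -1.0 <= s <= 1.0
    u = rng.random()
    # Map |s| to a power-law exponent: k = 1 when s = 0 (uniform draw);
    # larger k concentrates mass near one end of the allowable price range.
    k = 1.0 + 10.0 * abs(s)
    v = u ** (1.0 / k)          # skews mass toward 1
    if s < 0:
        v = 1.0 - v             # mirror: skew toward 0 instead
    return p_min + v * (p_max - p_min)
```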
Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.
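As an illustration of the estimation problem being analyzed, the sketch below evaluates the noiseless-data negative log marginal likelihood of a Gaussian process with a stationary squared-exponential kernel and selects the lengthscale by grid search; the kernel, jitter term, and toy data are assumptions made for illustration only, and the sensitivity of the selected lengthscale to perturbations of y is the kind of behavior the paper's well-posedness results formalize.

```python
import numpy as np

def sq_exp_kernel(x, y, lengthscale):
    """Stationary squared-exponential covariance k(x, y) = exp(-(x - y)^2 / (2 l^2))."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def neg_log_marginal_likelihood(lengthscale, x, y, jitter=1e-10):
    """GP negative log marginal likelihood with no observation noise (the
    noiseless setting of the abstract); jitter is purely numerical."""
    K = sq_exp_kernel(x, x, lengthscale) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(x) * np.log(2 * np.pi)

# Grid-search MLE of the lengthscale on toy data.
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x)
grid = np.logspace(-2, 1, 200)
l_hat = grid[np.argmin([neg_log_marginal_likelihood(l, x, y) for l in grid])]
```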
Marginalized groups are exposed to disproportionately high levels of air pollution. In this context, robust evaluations of the heterogeneous health impacts of air pollution regulations are key to justifying and designing maximally protective future interventions. Such evaluations are complicated by two key issues: 1) much of air pollution regulatory policy is focused on intervening on large emissions generators, while the resulting health impacts are measured in exposed populations; 2) due to air pollution transport, an intervention on one emissions generator can impact geographically distant communities. In causal inference, such a scenario has been described as that of bipartite network interference (BNI). To our knowledge, no literature to date has considered how to estimate heterogeneous causal effects with BNI. First, we propose, implement, and evaluate causal estimators for subgroup-specific treatment effects via augmented inverse propensity weighting and G-computation methods in the context of BNI. Second, we design and implement an empirical Monte Carlo simulation approach for BNI through which we evaluate the performance of the proposed estimators. Third, we use the proposed methods to estimate the causal effects of flue gas desulfurization scrubber installations on coal-fired power plants on ischemic heart disease (IHD) hospitalizations among 27,312,190 Medicare beneficiaries residing across 29,034 U.S. ZIP codes. While we find no statistically significant effect of scrubbers in the full population, we do find protective effects in marginalized groups. For high-poverty and predominantly non-white ZIP codes, scrubber installations at their most influential power plants, when less-influential plants are untreated, are found to result in statistically significant decreases in IHD hospitalizations, with reductions ranging from 6.4 to 43.1 hospitalizations per 10,000 person-years.
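As a point of reference for the estimators described above, the sketch below averages the standard augmented inverse propensity weighted (AIPW) pseudo-outcome within subgroups; it deliberately omits the bipartite interference structure that the paper's estimators account for, and the function name and inputs are hypothetical.

```python
import numpy as np

def aipw_subgroup_effects(y, a, g, pi_hat, mu1_hat, mu0_hat):
    """Subgroup-average treatment effects from the standard AIPW pseudo-outcome,
    averaged within each subgroup label in g.  Building block only: this ignores
    the bipartite network interference handled in the paper.
    pi_hat: estimated propensities; mu1_hat / mu0_hat: outcome regressions."""
    pseudo = (mu1_hat - mu0_hat
              + a * (y - mu1_hat) / pi_hat
              - (1 - a) * (y - mu0_hat) / (1 - pi_hat))
    return {level: pseudo[g == level].mean() for level in np.unique(g)}
```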
As data-driven methods are deployed in real-world settings, the processes that generate the observed data will often react to the decisions of the learner. For example, a data source may have some incentive for the algorithm to provide a particular label (e.g. approve a bank loan), and may manipulate its features accordingly. Work in strategic classification and decision-dependent distributions seeks to characterize the closed-loop behavior of deploying learning algorithms by explicitly considering the effect of the classifier on the underlying data distribution. More recently, work in performative prediction seeks to characterize the closed-loop behavior by considering general properties of the mapping from classifier to data distribution, rather than an explicit form. Building on this notion, we analyze repeated risk minimization as the perturbed trajectories of the gradient flows of performative risk minimization. We consider the case where there may be multiple local minimizers of performative risk, motivated by situations where the initial conditions may have significant impact on the long-term behavior of the system. We provide sufficient conditions to characterize the region of attraction for the various equilibria in this setting. Additionally, we introduce the notion of performative alignment, which provides a geometric condition on the convergence of repeated risk minimization to performative risk minimizers.
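The repeated risk minimization procedure under study can be summarized by the toy loop below, in which the data distribution is re-sampled at the currently deployed parameter and the risk on the induced sample is then minimized; the quadratic example, step sizes, and iteration counts are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def repeated_risk_minimization(theta0, sample_dist, loss_grad,
                               rounds=50, inner_lr=0.1, inner_iters=200):
    """Toy repeated risk minimization: each round, draw data from the
    distribution induced by the deployed theta, then minimize the risk on that
    sample by gradient descent.  Illustrative only; the paper studies the
    induced gradient-flow trajectories and their regions of attraction."""
    theta = float(theta0)
    for _ in range(rounds):
        z = sample_dist(theta)                    # distribution reacts to deployed theta
        for _ in range(inner_iters):              # minimize risk on the induced sample
            theta -= inner_lr * loss_grad(theta, z)
    return theta

# Example: outcomes shift linearly with the deployed parameter (strength eps),
# squared loss; the loop settles near the performatively stable point 1 / (1 - eps).
rng = np.random.default_rng(0)
eps = 0.5
sample = lambda th: rng.normal(loc=1.0 + eps * th, scale=1.0, size=1000)
grad = lambda th, z: np.mean(2.0 * (th - z))
theta_stable = repeated_risk_minimization(0.0, sample, grad)
```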
Conditional effect estimation has great scientific and policy importance because interventions may impact subjects differently depending on their characteristics. Most research has focused on estimating the conditional average treatment effect (CATE). However, identification of the CATE requires that all subjects have a non-zero probability of receiving treatment (the positivity assumption), which may be unrealistic in practice. Instead, we propose conditional effects based on incremental propensity score interventions, which are stochastic interventions where the odds of treatment are multiplied by some factor. These effects do not require positivity for identification and can be better suited for modeling scenarios in which people cannot be forced into treatment. We develop a projection estimator and a flexible nonparametric estimator that can each estimate all the conditional effects we propose, and we derive model-agnostic error guarantees showing that both estimators satisfy a form of double robustness. Further, we propose a summary of treatment effect heterogeneity and a test for any effect heterogeneity based on the variance of a conditional derivative effect, and derive a nonparametric estimator that also satisfies a form of double robustness. Finally, we demonstrate our estimators by analyzing the effect of intensive care unit admission on mortality using a dataset from the (SPOT)light study.
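The incremental propensity score intervention referenced above has a closed form: multiplying the odds of treatment by a user-chosen factor delta maps a propensity score pi to delta*pi / (delta*pi + 1 - pi), which remains strictly between 0 and 1 whenever pi does, so no positivity assumption is needed. A minimal sketch with a hypothetical function name:

```python
def incremental_propensity(pi, delta):
    """Shifted treatment probability under an incremental propensity score
    intervention: the odds pi / (1 - pi) are multiplied by delta."""
    return delta * pi / (delta * pi + 1.0 - pi)

# Example: doubling the odds of treatment for someone with propensity 0.2
# gives odds 0.5, i.e. a shifted treatment probability of 1/3.
shifted = incremental_propensity(0.2, delta=2.0)
```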
Obtaining guarantees on the convergence of the minimizers of empirical risks to those of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. Particular attention is therefore given to including unbounded spaces and non-convex cost functions that may themselves be unbounded. This work identifies a set of assumptions that allows us to describe a regime that seems to govern the concentration in many estimation problems, in which the empirical minimizers are stable. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric spaces with positive or negative curvature, subspaces of covariance matrices, regression problems, and entropic-Wasserstein barycenters.
We address the task of estimating sparse coefficients in linear regression when the covariates are drawn from an $L$-subexponential random vector, which belongs to a class of distributions having heavier tails than a Gaussian random vector. Prior works have tackled this issue by assuming that the covariates are drawn from an $L$-subexponential random vector and have established error bounds that resemble those derived for Gaussian random vectors. However, these previous methods require stronger conditions to derive error bounds than those employed for Gaussian random vectors. In the present paper, we present an error bound identical to that obtained for Gaussian random vectors, up to constant factors, without requiring stronger conditions, even when the covariates are drawn from an $L$-subexponential random vector. Somewhat interestingly, we utilize the $\ell_1$-penalized Huber regression, which is recognized for its robustness to heavy-tailed random noise rather than to heavy-tailed covariates. We believe that the present paper reveals a new aspect of the $\ell_1$-penalized Huber regression.
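For readers unfamiliar with the estimator, the proximal-gradient (ISTA) sketch below minimizes the generic ℓ1-penalized Huber objective; the step-size rule, Huber parameter, and iteration count are illustrative defaults and do not reflect the paper's theoretical conditions or tuning.

```python
import numpy as np

def huber_grad(r, delta):
    """Derivative of the Huber loss with respect to the residual r."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l1_huber_regression(X, y, lam, delta=1.345, iters=5000):
    """Proximal-gradient (ISTA) solver for
    (1/n) * sum_i huber(y_i - x_i' beta; delta) + lam * ||beta||_1.
    A minimal sketch of the estimator class discussed in the abstract."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the smooth part
    beta = np.zeros(p)
    for _ in range(iters):
        grad = -X.T @ huber_grad(y - X @ beta, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```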
Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed (i.e., internal validity holds), the study participants may not represent a random sample of the target population (i.e., external validity fails)--and this may lead to policies that perform suboptimally on the target population. We consider a model where observable attributes can impact sample selection probabilities arbitrarily but the effect of unobservable attributes is bounded by a constant, and we aim to learn policies with the best possible performance guarantees that hold under any sampling bias of this type. In particular, we derive a partial identification result for the worst-case welfare in the presence of sampling bias and show that the optimal max-min, max-min gain, and minimax regret policies depend on both the conditional average treatment effect (CATE) and the conditional value-at-risk (CVaR) of potential outcomes given covariates. To avoid finite-sample inefficiencies of plug-in estimates, we further provide an end-to-end procedure for learning the optimal max-min and max-min gain policies that does not require the separate estimation of nuisance parameters.
Reduced Order Models (ROMs) are of considerable importance in many areas of engineering in which computational time presents difficulties. Established approaches employ projection-based reduction, such as Proper Orthogonal Decomposition; however, such methods can become inefficient or fail in the case of parametric or strongly nonlinear models. Such limitations are usually tackled via a library of local reduction bases, each of which is valid for a given parameter vector. The success of such methods, however, relies strongly on how the parameter vectors are related to the local bases, which is typically achieved using clustering or interpolation methods. We propose replacing these methods with a Variational Autoencoder (VAE), used as a generative model that can infer the local basis corresponding to a given parameter vector in a probabilistic manner. The resulting VAE-boosted parametric ROM \emph{VpROM} still retains the physical insights of a projection-based method but also allows for better treatment of problems where model dependencies or excitation traits cause the dynamic behavior to span multiple response regimes. Moreover, the probabilistic treatment of the VAE representation allows for uncertainty quantification on the reduction bases, which may then be propagated to the ROM response. The performance of the proposed approach is validated on an open-source simulation benchmark featuring hysteresis and multi-parametric dependencies, and on a large-scale wind turbine tower characterised by nonlinear material behavior and model uncertainty.
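To illustrate the generative-model idea, here is a minimal conditional VAE sketch (in PyTorch) that decodes a flattened local reduction basis from a system parameter vector; the layer sizes, latent dimension, and Gaussian latent model are assumptions for illustration and do not reproduce the VpROM architecture or training setup. Training would minimize reconstruction error plus the KL term on (basis, parameter) pairs collected offline; at run time, sample_basis proposes candidate local bases for an unseen parameter vector, and the spread across samples gives a handle on the basis uncertainty mentioned above.

```python
import torch
import torch.nn as nn

class BasisCVAE(nn.Module):
    """Minimal conditional VAE: given a parameter vector, decode a (flattened)
    local reduction basis.  Illustrative architecture only."""
    def __init__(self, basis_dim, param_dim, latent_dim=8, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(basis_dim + param_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim + param_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, basis_dim))

    def forward(self, basis, params):
        h = self.enc(torch.cat([basis, params], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        recon = self.dec(torch.cat([z, params], dim=-1))
        # ELBO terms: reconstruction error plus KL divergence to the N(0, I) prior
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return recon, kl

    @torch.no_grad()
    def sample_basis(self, params, n=1):
        """Generate n candidate local bases for a new parameter vector."""
        z = torch.randn(n, self.mu.out_features)
        return self.dec(torch.cat([z, params.expand(n, -1)], dim=-1))
```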