This paper studies statistical decisions for dynamic treatment assignment problems. Many policies involve dynamic treatment assignment, in which treatments are assigned to individuals sequentially across multiple stages and the effect of treatment at each stage is typically heterogeneous with respect to prior treatments, past outcomes, and observed covariates. We consider estimating an optimal dynamic treatment rule that guides the optimal treatment assignment for each individual at each stage based on the individual's history. This paper proposes an empirical welfare maximization approach in a dynamic framework, which estimates the optimal dynamic treatment rule from panel data obtained from an experimental or quasi-experimental study. We propose two estimation methods: one solves the treatment assignment problem at each stage sequentially through backward induction, and the other solves the whole dynamic treatment assignment problem simultaneously across all stages. We derive finite-sample upper bounds on the worst-case average welfare regret of the proposed methods and show that they attain the minimax $n^{-1/2}$ convergence rate. We also modify the simultaneous estimation method to incorporate intertemporal budget/capacity constraints.
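As a rough illustration of the backward-induction variant, the sketch below maximizes empirical welfare stage by stage over a simple class of threshold rules. All names (x1, x2, d1, d2, y) and the uniform 1/4 propensity weight are hypothetical placeholders for a two-stage randomized study, not the paper's notation or estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical experimental panel: covariate, randomized treatment, outcome per stage.
x1 = rng.normal(size=n)                      # stage-1 covariate
d1 = rng.integers(0, 2, size=n)              # stage-1 treatment (randomized, p = 0.5)
x2 = x1 + d1 + rng.normal(size=n)            # stage-2 covariate (depends on history)
d2 = rng.integers(0, 2, size=n)              # stage-2 treatment (randomized, p = 0.5)
y = x2 * d2 + 0.5 * d1 + rng.normal(size=n)  # final welfare outcome
w = 1.0 / (0.5 * 0.5)                        # inverse propensity of the pair (d1, d2)

thresholds = np.linspace(-2, 2, 81)

def ewm_threshold(score, d, pseudo, w):
    """Pick the threshold rule 1{score > t} maximizing the inverse-propensity-
    weighted empirical welfare of observations that comply with the rule."""
    best_t, best_val = None, -np.inf
    for t in thresholds:
        rule = (score > t).astype(int)
        val = np.mean(w * pseudo * (d == rule))
        if val > best_val:
            best_t, best_val = t, val
    return best_t

# Backward induction: solve stage 2 first, then stage 1 given the stage-2 rule.
t2 = ewm_threshold(x2, d2, y, w)
y_follow = y * (d2 == (x2 > t2))             # welfare realized when stage-2 rule is followed
t1 = ewm_threshold(x1, d1, y_follow, w)
print(f"estimated rules: d1 = 1{{x1 > {t1:.2f}}}, d2 = 1{{x2 > {t2:.2f}}}")
```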
This paper proposes a two-fold factor model for high-dimensional functional time series (HDFTS) that enables the modeling and forecasting of multi-population mortality within the functional data framework. The proposed model first decomposes the HDFTS into functional time series of lower dimension (the common feature) and a system of basis functions specific to different cross-sections (the heterogeneity). The lower-dimensional common functional time series are then further reduced to low-dimensional scalar factor matrices. These dimensionally reduced factor matrices retain the useful information in the original HDFTS, and all of the temporal dynamics contained in the original HDFTS are extracted to facilitate forecasting. The proposed model can be regarded as a generalization of several existing functional factor models. Through a Monte Carlo simulation, we demonstrate the performance of the proposed method in model fitting. In an empirical study of Japanese subnational age-specific mortality rates, we show that the proposed model produces more accurate point and interval forecasts of multi-population mortality than the existing functional factor models. The financial impact of the improved forecasts is demonstrated through comparisons in life annuity pricing practices.
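The two-fold dimension reduction can be caricatured with two rounds of SVD: one across cross-sections, one across the functional domain. The array shapes, the rank choices r and k, and the use of plain SVDs are illustrative assumptions; the paper's estimator is more refined.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, p = 120, 20, 30          # time points, populations, grid of ages
X = rng.normal(size=(T, N, p)) # hypothetical centered HDFTS array

# Step 1: cross-sectional reduction -- a common low-dimensional functional
# time series F (T x r x p) with population-specific loadings B (N x r).
r = 3
flat = X.transpose(1, 0, 2).reshape(N, T * p)
U, s, Vt = np.linalg.svd(flat, full_matrices=False)
B = U[:, :r]                               # loadings across populations
F = (B.T @ flat).reshape(r, T, p).transpose(1, 0, 2)

# Step 2: functional reduction -- project each common functional series onto
# k estimated basis functions, leaving a T x r x k scalar factor array.
k = 2
factors = np.empty((T, r, k))
for j in range(r):
    Uj, sj, Vjt = np.linalg.svd(F[:, j, :], full_matrices=False)
    factors[:, j, :] = F[:, j, :] @ Vjt[:k].T  # scalar factors carrying the dynamics
print(factors.shape)  # (120, 3, 2): the low-dimensional object to model and forecast
```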
In this paper, we develop a general framework for designing differentially private expectation-maximization (EM) algorithms in high-dimensional latent variable models, based on noisy iterative hard thresholding. We derive the statistical guarantees of the proposed framework and apply it to three specific models: Gaussian mixture, mixture of regressions, and regression with missing covariates. In each model, we establish the near-optimal rate of convergence under differential privacy constraints and show that the proposed algorithm is minimax rate optimal up to logarithmic factors. The technical tools developed for the high-dimensional setting are then extended to classic low-dimensional latent variable models, for which we propose a near rate-optimal EM algorithm with differential privacy guarantees. Simulation studies and real data analysis are conducted to support our results.
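A minimal sketch of one noisy-iterative-hard-thresholding EM update, for a symmetric two-component sparse Gaussian mixture; the clipping bound, step size, noise scale, and sparsity level are illustrative choices, not the paper's calibrated privacy parameters.

```python
import numpy as np

def dp_em_step(X, beta, s, eta, sigma_dp, rng):
    """One noisy EM update with hard thresholding for a mixture of
    N(+beta, I) and N(-beta, I) with s-sparse mean beta (a sketch)."""
    n, d = X.shape
    # E-step: responsibilities for the +beta component.
    gamma = 1.0 / (1.0 + np.exp(-2.0 * X @ beta))
    # M-step as a gradient step, with per-sample clipping to bound sensitivity.
    grads = (2.0 * gamma[:, None] - 1.0) * X - beta
    C = 4.0                                           # illustrative clipping bound
    grads /= np.maximum(1.0, np.linalg.norm(grads, axis=1) / C)[:, None]
    step = eta * grads.mean(axis=0)
    # Gaussian noise for differential privacy, then keep the top-s coordinates.
    noisy = beta + step + rng.normal(scale=sigma_dp, size=d)
    keep = np.argsort(np.abs(noisy))[-s:]
    out = np.zeros(d)
    out[keep] = noisy[keep]
    return out

rng = np.random.default_rng(2)
d, s, n = 200, 5, 2000
beta_true = np.zeros(d); beta_true[:s] = 1.0
z = rng.choice([-1.0, 1.0], size=n)
X = z[:, None] * beta_true + rng.normal(size=(n, d))
beta = np.zeros(d); beta[0] = 0.5                     # crude initialization
for _ in range(30):
    beta = dp_em_step(X, beta, s=s, eta=0.5, sigma_dp=0.01, rng=rng)
print(np.round(beta[:8], 2))                          # support roughly recovered
```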
In this paper, we consider the density estimation problem associated with the stationary measure of ergodic It\^o diffusions from a discrete-time series that approximates the solution of the stochastic differential equation. To take advantage of the characterization of the density function as the stationary solution of a parabolic-type Fokker-Planck PDE, we proceed as follows. First, we employ deep neural networks to approximate the drift and diffusion terms of the SDE by solving appropriate supervised learning tasks. Subsequently, we solve the steady-state Fokker-Planck equation associated with the estimated drift and diffusion coefficients with a neural-network-based least-squares method. We establish the convergence of the proposed scheme under appropriate mathematical assumptions, accounting for the generalization errors induced by regressing the drift and diffusion coefficients and by the PDE solver. This theoretical study relies on a recent perturbation result for Markov chains, which shows that the density estimation error depends linearly on the error in estimating the drift term, and on generalization error bounds for nonparametric regression and for PDE solutions obtained with neural-network models. The effectiveness of the method is demonstrated by numerical simulations of a two-dimensional Student's t distribution and a 20-dimensional Langevin dynamics.
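The first (regression) step can be illustrated in one dimension: the Euler-Maruyama discretization turns drift and squared-diffusion estimation into supervised learning on increments. A degree-3 polynomial fit stands in for the deep networks, and the Ornstein-Uhlenbeck model is a hypothetical example.

```python
import numpy as np

# Hypothetical 1-d Ornstein-Uhlenbeck path observed at step dt;
# true drift b(x) = -x, true diffusion sigma(x) = 1.
rng = np.random.default_rng(3)
dt, n = 0.01, 100_000
x = np.empty(n); x[0] = 0.0
for i in range(n - 1):
    x[i + 1] = x[i] - x[i] * dt + np.sqrt(dt) * rng.normal()

# Supervised targets implied by the Euler-Maruyama discretization:
#   (X_{t+dt} - X_t)/dt   ~ b(X_t)
#   (X_{t+dt} - X_t)^2/dt ~ sigma(X_t)^2
dx = np.diff(x)
drift_fit = np.polynomial.Polynomial.fit(x[:-1], dx / dt, deg=3)
diff2_fit = np.polynomial.Polynomial.fit(x[:-1], dx**2 / dt, deg=3)
print(drift_fit(1.0), diff2_fit(0.0))  # roughly -1 and 1 for this model
```

The fitted coefficients would then replace the true drift and diffusion in the least-squares Fokker-Planck solve of the second step.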
We present new estimators for the statistical analysis of the dependence of the mean gap time between consecutive recurrent events on a set of explanatory random variables, in the presence of right censoring. The dependence is expressed through regression-like and overdispersion parameters, estimated via conditional estimating equations. The mean and variance of the length of each gap time, conditioned on the observed history of prior events and other covariates, are known functions of the parameters and covariates. Under certain conditions on the censoring, we construct normalized estimating functions that are asymptotically unbiased and depend only on observed data. We discuss the existence, consistency, and asymptotic normality of a sequence of estimators of the parameters defined as roots of these estimating equations. Simulations suggest that our estimators can be used successfully with a relatively small sample size in a study of short duration.
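Schematically, the estimators are roots of normalized estimating functions, which can be found numerically; the log-linear mean model and the omission of the censoring terms below are simplifying assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(4)
n = 300
z = rng.normal(size=n)
gap = rng.exponential(np.exp(0.3 + 0.5 * z))   # hypothetical gap times

def estimating_function(beta):
    """Normalized (1/n) estimating function for a log-linear mean model
    E[gap | z] = exp(b0 + b1*z); a schematic stand-in for the paper's
    conditional estimating equations, without the censoring machinery."""
    mu = np.exp(beta[0] + beta[1] * z)
    resid = gap - mu
    return np.array([np.mean(resid), np.mean(z * resid)])

sol = root(estimating_function, x0=np.zeros(2))
print(sol.x)   # root of the estimating equations, close to (0.3, 0.5)
```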
Analyses of environmental phenomena are often concerned with understanding unlikely events such as floods, heatwaves, droughts, or high concentrations of pollutants. Yet the majority of the causal inference literature has focused on modelling means rather than (possibly high) quantiles. We define a general estimator of the population quantile treatment (or exposure) effect (QTE) -- the weighted QTE (WQTE) -- of which the population QTE is a special case, along with a general class of balancing weights incorporating the propensity score. Asymptotic properties of the proposed WQTE estimators are derived. We further propose and compare propensity score regression and two weighting methods based on these balancing weights to understand the causal effect of an exposure on quantiles, allowing the exposure to be binary, discrete, or continuous. The finite-sample behavior of the three estimators is studied in simulation. The proposed methods are applied to data from the Bavarian Danube catchment area to estimate the 95% QTE of phosphorus on copper concentration in the river.
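A minimal sketch of the weighting idea for a binary exposure, assuming a logistic propensity model: estimate the score, form inverse-probability balancing weights, and difference the weighted quantiles. The data-generating design is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=(n, 2))
ps_true = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))
a = rng.binomial(1, ps_true)
# Heavy-tailed treatment effect: the 95% QTE differs from the mean effect.
y = x[:, 0] + a * (1 + 0.5 * rng.standard_t(df=4, size=n)) + rng.normal(size=n)

# Estimate the propensity score and form inverse-probability balancing weights.
ps = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
w1, w0 = a / ps, (1 - a) / (1 - ps)

def weighted_quantile(y, w, q):
    """q-th quantile of the weighted empirical distribution of y."""
    order = np.argsort(y)
    cw = np.cumsum(w[order]) / np.sum(w)
    return y[order][np.searchsorted(cw, q)]

q = 0.95
wqte = weighted_quantile(y, w1, q) - weighted_quantile(y, w0, q)
print(f"estimated {q:.0%} QTE: {wqte:.2f}")
```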
When interest lies in a time-to-event outcome, competing events that prevent the occurrence of the event of interest may be present. In the presence of competing events, various statistical estimands have been suggested for defining the causal effect of treatment on the event of interest. Depending on the estimand, the competing events are either accommodated or eliminated, resulting in causal effects with different interpretations. The former approach captures the total effect of treatment on the event of interest, while the latter captures the direct effect of treatment on the event of interest that is not mediated by the competing event. Separable effects have also been defined for settings where the treatment effect can be partitioned into its effect on the event of interest and its effect on the competing event through different causal pathways. We outline the various causal effects that may be of interest in the presence of competing events, including total, direct, and separable effects, and describe how to obtain estimates using regression standardisation with the Stata command standsurv. Regression standardisation is applied by averaging individual estimates across all individuals in a study population after fitting a survival model. With standsurv, several contrasts of interest can be calculated, including differences, ratios, and other user-defined functions, and confidence intervals can be obtained using the delta method. Throughout, we use an example analysing a publicly available dataset on prostate cancer, allowing the reader to replicate the analysis and further explore the different effects of interest.
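standsurv implements this in Stata; as a language-neutral illustration of regression standardisation itself, the sketch below fits a Cox model with the Python lifelines package and averages predicted survival under each treatment value. The dataset, covariates, and time horizon are made up.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 1000
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),
    "age": rng.normal(65, 8, n),
})
# Hypothetical survival data: treatment reduces the hazard.
haz = 0.02 * np.exp(-0.7 * df["treat"] + 0.03 * (df["age"] - 65))
t = rng.exponential(1 / haz)
cens = rng.exponential(60, n)
df["time"] = np.minimum(t, cens)
df["event"] = (t <= cens).astype(int)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")

# Standardisation: predict survival for *every* subject with treat set to 1
# and to 0, then average; the contrast is a marginal (standardised) difference.
t_eval = [60.0]
cov1 = df[["treat", "age"]].assign(treat=1)
cov0 = df[["treat", "age"]].assign(treat=0)
s1 = cph.predict_survival_function(cov1, times=t_eval).mean(axis=1)
s0 = cph.predict_survival_function(cov0, times=t_eval).mean(axis=1)
print(float(s1.iloc[0] - s0.iloc[0]))  # standardised survival difference at t = 60
```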
This paper studies treatment effect models in which individuals are classified into unobserved groups based on heterogeneous treatment rules. Using a finite mixture approach, we propose a marginal treatment effect (MTE) framework in which both the treatment choice and outcome equations can be heterogeneous across groups. Given instrumental variables specific to each group, we show that the MTE for each group can be separately identified. Based on this identification result, we propose a two-step semiparametric procedure for estimating the group-wise MTE. We illustrate the usefulness of the proposed method with an application to the economic returns to college education.
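For a single group, the estimation idea can be illustrated with the standard local instrumental variable approach: fit E[Y | p(Z)] flexibly in the estimated propensity and differentiate. The polynomial fit and data-generating design below are illustrative assumptions; the paper's group-wise semiparametric procedure is more involved.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 20_000
z = rng.normal(size=n)                       # instrument
v = rng.normal(size=n)                       # latent resistance to treatment
d = (0.8 * z > v).astype(int)                # treatment choice equation
y = d * (1.0 - v) + rng.normal(size=n)       # outcome: effect declines in v

# Step 1: estimate the propensity score p(z).
p = LogisticRegression().fit(z[:, None], d).predict_proba(z[:, None])[:, 1]

# Step 2 (local IV): fit E[Y | p] and differentiate in p;
# the derivative at p = u is the MTE at unobserved resistance u.
coef = np.polynomial.polynomial.polyfit(p, y, deg=3)
dcoef = np.polynomial.polynomial.polyder(coef)
for u in (0.2, 0.5, 0.8):
    print(f"MTE({u:.1f}) ~ {np.polynomial.polynomial.polyval(u, dcoef):.2f}")
```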
Uncertainty in probabilistic classifiers' predictions is a key concern when models are used to support human decision making, in broader probabilistic pipelines, or when sensitive automatic decisions have to be taken. Studies have shown that most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities. Hence, being able to calibrate these models, or to enforce calibration while learning them, has regained interest in the recent literature. In this context, properly assessing calibration is paramount for quantifying new contributions that tackle calibration. However, there is room for improvement in the commonly used metrics, and the evaluation of calibration could benefit from deeper analysis. This paper therefore focuses on the empirical evaluation of calibration metrics in the context of classification. More specifically, it evaluates different estimators of the Expected Calibration Error ($ECE$), including both legacy estimators and novel ones proposed in this paper. We build an empirical procedure to quantify the quality of these $ECE$ estimators and use it to decide which estimator should be used in practice in different settings.
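For concreteness, here are two common binned $ECE$ estimators of the kind being compared: the legacy equal-width estimator and an equal-mass variant. The bin counts and the simulated over-confident classifier are arbitrary choices; the paper's novel estimators are not reproduced here.

```python
import numpy as np

def ece_equal_width(conf, correct, n_bins=15):
    """Legacy binned ECE: partition [0, 1] into equal-width bins and average
    |accuracy - mean confidence| weighted by bin frequency."""
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def ece_equal_mass(conf, correct, n_bins=15):
    """Equal-mass variant: bins hold equal numbers of points, avoiding
    empty bins where scores concentrate."""
    order = np.argsort(conf)
    ece = 0.0
    for chunk in np.array_split(order, n_bins):
        ece += (len(chunk) / len(conf)) * abs(correct[chunk].mean() - conf[chunk].mean())
    return ece

rng = np.random.default_rng(8)
conf = rng.beta(5, 2, size=10_000)       # hypothetical top-label confidences
correct = rng.binomial(1, conf**1.3)     # an over-confident classifier
print(ece_equal_width(conf, correct), ece_equal_mass(conf, correct))
```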
In this paper, we show how a dynamic population game can model the strategic interaction and migration decisions made by a large population of agents in response to epidemic prevalence. Specifically, we consider a modified susceptible-asymptomatic-infected-recovered (SAIR) epidemic model over multiple zones. Agents choose whether to activate (i.e., interact with others), how many other agents to interact with, and which zone to move to, on a time scale comparable with the epidemic evolution. We define and analyze the notion of equilibrium in this game and investigate the transient behavior of the epidemic spread in a range of numerical case studies, providing insights into the effects of the agents' degree of future awareness and strategic migration decisions, as well as of different levels of lockdown and other interventions. One of our key findings is that the strategic behavior of agents plays an important role in the progression of the epidemic and can be exploited to design suitable epidemic control measures.
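A single-zone caricature of the feedback loop between prevalence and strategic activity, ignoring migration and the full game-theoretic best response; all rates and the activity-response rule below are hypothetical.

```python
import numpy as np

def sair_step(state, beta, delta, gamma, activity):
    """One discrete-time update of a single-zone SAIR model in which the
    effective contact rate scales with the population's chosen activity
    level (a stand-in for the game's best response, not the full game)."""
    s, a, i, r = state
    new_inf = beta * activity**2 * s * (a + i)   # both A and I transmit
    return np.array([
        s - new_inf,
        a + new_inf - delta * a,                 # A -> I (symptom onset)
        i + delta * a - gamma * i,               # I -> R (recovery)
        r + gamma * i,
    ])

state = np.array([0.99, 0.01, 0.0, 0.0])
for t in range(200):
    # Strategic agents scale activity down as perceived prevalence rises.
    activity = 1.0 / (1.0 + 20.0 * (state[1] + state[2]))
    state = sair_step(state, beta=0.4, delta=0.2, gamma=0.1, activity=activity)
print(np.round(state, 3))  # population fractions (S, A, I, R) after 200 periods
```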
In many applications, to estimate a parameter or quantity of interest psi, a finite-dimensional nuisance parameter theta is estimated first. For example, many estimators in causal inference depend on the propensity score: the probability of (possibly time-dependent) treatment given the past. The nuisance parameter theta is often estimated in a first step, which can affect the variance of the estimator for psi, and is typically estimated by maximum (partial) likelihood. Inverse Probability Weighting, Marginal Structural Models, and Structural Nested Models are well-known causal inference examples, where one often posits a (pooled) logistic regression model for the treatment (initiation) and/or censoring probabilities and estimates these with standard software, that is, by maximum partial likelihood. These methods have something else in common: they can all be shown to be based on unbiased estimating equations. This paper has four main results for estimators psi-hat based on unbiased estimating equations involving theta. First, it shows that when theta is estimated by solving (partial) score equations, the true limiting variance of psi-hat is smaller than or equal to what it would be if the true theta were plugged in. Second, it shows that if the estimation of theta by (partial) score equations is ignored, the resulting sandwich estimator for the variance of psi-hat is conservative. Third, it provides a variance correction. Fourth, it shows that if the estimator psi-hat with the true theta plugged in is efficient, the true limiting variance of psi-hat does not depend on whether or not theta is estimated, and the sandwich estimator that ignores the estimation of theta is consistent. These findings hold in semiparametric and parametric settings where the parameters of interest psi are estimated based on unbiased estimating equations.
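The second and third results can be made concrete by stacking the estimating equations. The sketch below uses an IPW mean with a logistic propensity model: the corrected sandwich treats theta and psi jointly, while the naive sandwich ignores the estimation of theta and comes out conservative. The design and the finite-difference Jacobian are illustrative.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(9)
n = 4000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-x))
d = rng.binomial(1, e_true)
y = 1.0 + x + rng.normal(size=n)     # so psi = E[Y] = 1 in this design

def phi(params):
    """Per-observation stacked estimating functions: logistic score for
    theta = (t0, t1), then the IPW equation for psi (names illustrative)."""
    t0, t1, psi = params
    e = 1 / (1 + np.exp(-(t0 + t1 * x)))
    return np.column_stack([d - e, x * (d - e), d * y / e - psi])

sol = root(lambda p: phi(p).mean(axis=0), x0=[0.0, 1.0, 1.0])
p_hat = sol.x

# Sandwich: V = A^{-1} B A^{-T} / n, with A = E[d phi / d params], B = E[phi phi'].
eps = 1e-5
A = np.array([(phi(p_hat + eps * np.eye(3)[j]) - phi(p_hat - eps * np.eye(3)[j]))
              .mean(axis=0) / (2 * eps) for j in range(3)]).T
B = phi(p_hat).T @ phi(p_hat) / n
V = np.linalg.inv(A) @ B @ np.linalg.inv(A).T / n

# Naive sandwich ignoring that theta was estimated (result 2: conservative).
g = phi(p_hat)[:, 2]
v_naive = g.var() / (A[2, 2] ** 2 * n)
print(f"corrected SE: {np.sqrt(V[2, 2]):.4f}  naive SE: {np.sqrt(v_naive):.4f}")
```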