A standard assumption for causal inference about the joint effects of time-varying treatments is that one has measured sufficient covariates to ensure that, within covariate strata, subjects are exchangeable across observed treatment values; this is known as the sequential randomization assumption (SRA). The SRA is often criticized because it requires all confounders to be accurately measured. Realistically, measured covariates can rarely capture all confounders with certainty; often they are at best proxies of confounders, which invalidates inferences under the SRA. In this paper, we extend the proximal causal inference (PCI) framework of Miao et al. (2018) to the longitudinal setting under a semiparametric marginal structural mean model (MSMM). PCI offers an opportunity to learn about joint causal effects in settings where the SRA based on measured time-varying covariates fails, by formally accounting for the covariate measurements as imperfect proxies of the underlying confounding mechanisms. We establish nonparametric identification with a pair of time-varying proxies, characterize the regular and asymptotically linear estimators of the parameter indexing the MSMM, including a rich class of doubly robust estimators, and derive the corresponding semiparametric efficiency bound for the MSMM. Extensive simulation studies and a data application illustrate the finite-sample behavior of the proposed methods.
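As a hedged illustration only (the paper's exact specification may differ), a marginal structural mean model for the joint effect of a treatment history $\bar{a}_K = (a_1, \ldots, a_K)$ on an end-of-study outcome could take the form
\[
\mathbb{E}\big[ Y^{\bar{a}_K} \big] \;=\; g(\bar{a}_K; \beta),
\qquad \text{for example} \quad
g(\bar{a}_K; \beta) \;=\; \beta_0 + \beta_1 \sum_{k=1}^{K} a_k ,
\]
where $Y^{\bar{a}_K}$ denotes the counterfactual outcome under treatment history $\bar{a}_K$ and $\beta$ is the finite-dimensional parameter indexing the MSMM that the estimators above target.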
Predicting patient survival probabilities based on observed covariates is an important assessment in clinical practice. These patient-specific covariates are often measured over multiple follow-up appointments. It is then of interest to predict survival based on the history of these longitudinal measurements, and to update predictions as more observations become available. The standard approaches to these so-called `dynamic prediction' assessments are joint models and landmark analysis. Joint models involve high-dimensional parameterisations, and their computational complexity often prohibits including multiple longitudinal covariates. Landmark analysis is simpler, but discards a proportion of the available data at each `landmark time'. In this work we propose a `retarded kernel' approach to dynamic prediction that sits between the two standard methods in terms of complexity. By conditioning hazard rates directly on the covariate measurements over the observation time frame, we define a model that takes into account the full history of covariate measurements but is more practical and parsimonious than joint modelling. Time-dependent association kernels describe the impact of covariate changes at earlier times on the patient's hazard rate at later times. Under the constraints that our model (i) reduces to the standard Cox model for time-independent covariates, and (ii) contains the instantaneous Cox model as a special case, we derive two natural kernel parameterisations. Upon application to three clinical data sets, we find that the predictive accuracy of the retarded kernel approach is comparable to that of the two existing standard methods.
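To make the notion of a `retarded kernel' concrete, one illustrative hazard specification satisfying constraints (i) and (ii) above (not necessarily either of the paper's two parameterisations) is
\[
\lambda\big(t \mid \{x(s)\}_{s \le t}\big)
\;=\; \lambda_0(t)\,
\exp\!\Big( \beta^\top \!\int_0^t K(t,s)\, x(s)\, \mathrm{d}s \Big),
\qquad \int_0^t K(t,s)\,\mathrm{d}s = 1,
\]
where $\lambda_0$ is a baseline hazard and $K(t,s)$ is the time-dependent association kernel weighting the covariate value at time $s$ in the hazard at time $t$. With this normalisation, a time-independent covariate $x(s) \equiv x$ gives back the standard Cox model $\lambda_0(t)\exp(\beta^\top x)$, and the degenerate choice $K(t,s) = \delta(t-s)$ recovers the instantaneous Cox model.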
I propose kernel ridge regression estimators for nonparametric dose response curves and semiparametric treatment effects in the setting where an analyst has access to a selected sample rather than a random sample: the outcome is observed only for selected observations. I assume selection is as good as random conditional on treatment and a sufficiently rich set of observed covariates, where the covariates are allowed to cause treatment or be caused by treatment -- an extension of missingness-at-random (MAR). I propose estimators of means, increments, and distributions of counterfactual outcomes with closed-form solutions in terms of kernel matrix operations, allowing treatment and covariates to be discrete or continuous, and low, high, or infinite dimensional. For the continuous treatment case, I prove uniform consistency with finite sample rates. For the discrete treatment case, I prove root-n consistency, Gaussian approximation, and semiparametric efficiency.
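The following minimal sketch (in Python with NumPy) illustrates the flavour of a closed-form kernel ridge regression dose-response estimator under the simpler reading in which selection is as good as random given treatment and covariates jointly; it ignores the extension to covariates caused by treatment, and the function and argument names are illustrative rather than taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, scale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * scale ** 2))

def dose_response_krr(D, X, Y, selected, d_grid, lam=1e-2):
    """Sketch of E[Y(d)] estimation: fit E[Y | D, X] by kernel ridge
    regression on the selected subsample (where outcomes are observed),
    evaluate the fit at (d, X_i) for all units, and average over covariates."""
    Z = np.column_stack([D, X])                  # regressors (treatment, covariates)
    Zs, Ys = Z[selected], Y[selected]            # selected sample with outcomes
    K = rbf_kernel(Zs, Zs)
    alpha = np.linalg.solve(K + lam * len(Ys) * np.eye(len(Ys)), Ys)
    curve = []
    for d in d_grid:
        Zd = np.column_stack([np.full(len(Z), d), X])
        m_d = rbf_kernel(Zd, Zs) @ alpha         # fitted regression at (d, X_i)
        curve.append(m_d.mean())                 # average over the full sample
    return np.array(curve)
```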
We propose a nonparametric bivariate time-varying coefficient model for longitudinal measurements in the presence of a terminal event that is subject to right censoring. The time-varying coefficients capture the longitudinal trajectories of covariate effects along both the follow-up time and the residual lifetime. The proposed model extends the parametric conditional approach given the terminal event time in the recent literature, and thus avoids potential model misspecification. We consider a kernel smoothing method for estimating the regression coefficients in our model and use cross-validation for bandwidth selection, applying undersmoothing in the final analysis to eliminate the asymptotic bias of the kernel estimator. We show that the kernel estimates are asymptotically normal under mild regularity conditions, and provide an easily computable sandwich variance estimator. We conduct extensive simulations that show the desirable performance of the proposed approach, and apply the method to analyzing medical cost data for patients with end-stage renal disease.
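As a rough illustration of the kernel smoothing idea (a generic local-constant, kernel-weighted least-squares sketch; the paper's actual estimating equations and handling of censoring are more involved), a coefficient surface evaluated at a follow-up time t0 and residual lifetime r0 could be estimated as follows.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a common choice in kernel smoothing."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_coefficient(t0, r0, t, r, X, Y, h_t, h_r):
    """Kernel-weighted least squares estimate of beta(t0, r0) in a working
    model Y ~ X beta(t, r); a hypothetical sketch, not the paper's estimator."""
    w = epanechnikov((t - t0) / h_t) * epanechnikov((r - r0) / h_r)
    Xw = X * w[:, None]                      # rows of X weighted by the kernel
    return np.linalg.solve(Xw.T @ X, Xw.T @ Y)
```

The bandwidths h_t and h_r would be chosen by cross-validation and then undersmoothed, as described above.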
This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under local differential privacy (LDP). In contrast to frequency estimation of a single attribute (the setting of most prior work), the multidimensional aspect requires particular attention to the privacy budget. Moreover, when user statistics are collected longitudinally, privacy progressively degrades. Indeed, the two "multiple" settings combined (i.e., many attributes and several collections throughout time) pose several challenges, for which this paper proposes the first solution for frequency estimation under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response -- GRR, Optimized Unary Encoding -- OUE, and Symmetric Unary Encoding -- SUE) to both longitudinal and multidimensional data collections. While the existing literature uses OUE and SUE for both rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we show analytically and experimentally that applying OUE in the first round and SUE in the second provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Lastly, we also propose a new solution named \underline{A}daptive \underline{L}DP for \underline{LO}ngitudinal and \underline{M}ultidimensional \underline{FRE}quency \underline{E}stimates (ALLOMFREE), which randomly samples a single attribute to send with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
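For background, the sketch below shows plain (single-round, single-attribute) Generalized Randomized Response and its standard unbiased frequency estimator; the longitudinal variants discussed above (L-GRR, L-OSUE, ALLOMFREE) build on this primitive but are not reproduced here.

```python
import numpy as np

def grr_perturb(value, k, eps, rng):
    """Generalized Randomized Response over a domain {0, ..., k-1}: report
    the true value with probability p = e^eps / (e^eps + k - 1), otherwise
    report one of the other k-1 values uniformly at random."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in range(k) if v != value])

def grr_frequencies(reports, k, eps):
    """Unbiased frequency estimates from GRR reports."""
    n = len(reports)
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    q = 1.0 / (np.exp(eps) + k - 1)
    counts = np.bincount(reports, minlength=k)
    return (counts / n - q) / (p - q)

# toy usage: 10,000 users, domain size 4, privacy budget eps = 1
rng = np.random.default_rng(0)
true_values = rng.choice(4, size=10_000, p=[0.4, 0.3, 0.2, 0.1])
reports = np.array([grr_perturb(v, 4, 1.0, rng) for v in true_values])
print(grr_frequencies(reports, 4, 1.0))
```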
The Gaussian covariance graph model is a popular model for revealing underlying dependency structures among random variables. A Bayesian approach to the estimation of covariance structures uses priors that force some off-diagonal entries of the covariance matrix to be zero and impose a positive-definite constraint on the matrix. In this paper, we consider a spike and slab prior on the off-diagonal entries, which uses a mixture of a point mass and a normal distribution. The point mass naturally introduces sparsity into the covariance structure, so that the resulting posterior supports covariance structure learning. Under this prior, we calculate posterior model probabilities of covariance structures using a Laplace approximation. We show that the error due to the Laplace approximation becomes asymptotically negligible at a rate depending on the posterior convergence rate of the covariance matrix under the Frobenius norm. With the approximated posterior model probabilities, we propose a new framework for estimating a covariance structure. Since the Laplace approximation is performed around the mode of the conditional posterior of the covariance matrix, which is not available in closed form, we propose a block coordinate descent algorithm to find the mode and show that the covariance matrix can be estimated with this algorithm once the structure is chosen. Through a simulation study based on five numerical models, we show that the proposed method outperforms the graphical lasso and the sample covariance matrix in terms of root mean squared error, max norm, spectral norm, specificity, and sensitivity. The advantage of the proposed method over its competitors is also demonstrated in terms of accuracy when applied to linear discriminant analysis (LDA) classification of a breast cancer diagnostic dataset.
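The generic form of the Laplace approximation used for a model (structure) probability is simple to state; the sketch below evaluates it given the conditional posterior mode and the negative Hessian there, which in the paper's setting would come from the block coordinate descent step (the helper name and interface are illustrative, not the paper's).

```python
import numpy as np

def laplace_log_model_prob(log_joint_at_mode, neg_hessian):
    """Laplace approximation to a log marginal model probability:
    log p(data | structure) ~ (log likelihood + log prior) at the mode
      + (d/2) * log(2*pi) - (1/2) * log|negative Hessian at the mode|,
    where d is the number of free parameters under the structure."""
    d = neg_hessian.shape[0]
    _, logdet = np.linalg.slogdet(neg_hessian)
    return log_joint_at_mode + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet
```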
In problems with large amounts of missing data, one must model two distinct data-generating processes: the outcome process, which generates the response, and the missing data mechanism, which determines which data we observe. Under the ignorability condition of Rubin (1976), however, likelihood-based inference for the outcome process does not depend on the missing data mechanism, so that only the former needs to be estimated; partly because of this simplification, ignorability is often used as a baseline assumption. We study the implications of Bayesian ignorability in the presence of high-dimensional nuisance parameters and argue that ignorability is typically incompatible with sensible prior beliefs about the amount of selection bias. We show that, for many problems, ignorability directly implies that the prior on the selection bias is tightly concentrated around zero. This is demonstrated on several models of practical interest, and the effect of ignorability on the posterior distribution is characterized for high-dimensional linear models with a ridge regression prior. We then show how to build high-dimensional models that encode sensible beliefs about the selection bias, and show that, under certain narrow circumstances, ignorability is less problematic.
We propose a doubly robust approach to characterizing treatment effect heterogeneity in observational studies. We utilize posterior distributions for both the propensity score and outcome regression models to provide valid inference on the conditional average treatment effect even when high-dimensional or nonparametric models are used. We show that our approach leads to conservative inference in finite samples or under model misspecification, and provides a consistent variance estimator when both models are correctly specified. In simulations, we illustrate the utility of these results in difficult settings such as high-dimensional covariate spaces or highly flexible models for the propensity score and outcome regression. Lastly, we analyze environmental exposure data from NHANES to identify how the effects of these exposures vary by subject-level characteristics.
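For intuition, the classical doubly robust construction behind such procedures is the augmented inverse-probability-weighted (AIPW) pseudo-outcome, whose conditional mean given covariates equals the conditional average treatment effect if either the propensity score or the outcome regressions are correct; the sketch below is this textbook construction, not the paper's posterior-based inference.

```python
import numpy as np

def aipw_pseudo_outcomes(Y, A, e_hat, m0_hat, m1_hat):
    """AIPW pseudo-outcomes: averaging them within a covariate subgroup
    (or regressing them on covariates) targets the conditional average
    treatment effect. Y = outcome, A = binary treatment, e_hat = estimated
    propensity score, m0_hat/m1_hat = estimated outcome regressions."""
    return (m1_hat - m0_hat
            + A * (Y - m1_hat) / e_hat
            - (1 - A) * (Y - m0_hat) / (1 - e_hat))
```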
We study causal inference under case-control and case-population sampling. For this purpose, we focus on the binary-outcome and binary-treatment case, where the parameters of interest are the causal relative and attributable risk defined via the potential outcome framework. It is shown that strong ignorability is not always as powerful as it is under random sampling and that certain monotonicity assumptions yield comparable results in terms of sharp identified intervals. Specifically, the usual odds ratio is shown to be a sharp identified upper bound on the causal relative risk under the monotone treatment response and monotone treatment selection assumptions. We then discuss averaging the conditional (log) odds ratio and propose an algorithm for semiparametrically efficient estimation when averaging is based only on the (conditional) distributions of the covariates that are identified in the data. We also offer algorithms for causal inference when aggregation over the true population distribution of the covariates is desired. We show the usefulness of our approach by studying two empirical examples from the social sciences: the benefit of attending private school for entering a prestigious university in Pakistan and the causal relationship between staying in school and getting involved with drug-trafficking gangs in Brazil.
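In potential-outcome notation (with conditioning on covariates suppressed), the two quantities referred to above are
\[
\theta_{\mathrm{RR}} \;=\; \frac{P\{Y(1)=1\}}{P\{Y(0)=1\}},
\qquad
\mathrm{OR} \;=\;
\frac{P(Y=1 \mid T=1)\,/\,P(Y=0 \mid T=1)}{P(Y=1 \mid T=0)\,/\,P(Y=0 \mid T=0)},
\]
and the identification result cited above states that $\theta_{\mathrm{RR}} \le \mathrm{OR}$ is a sharp upper bound under the monotone treatment response and monotone treatment selection assumptions.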
Measurement error is a pervasive issue that can render the results of an analysis unreliable. The measurement error literature contains numerous correction techniques, which can be broadly divided into those that aim to produce exactly consistent estimators and those that are only approximately consistent. While consistency is a desirable property, it is typically attained only under specific model assumptions. Two techniques, regression calibration and simulation extrapolation, are used frequently in a wide variety of parametric and semiparametric settings; however, in many settings these methods are only approximately consistent. We generalize these corrections, relaxing the assumptions placed on replicate measurements. Under regularity conditions, the estimators are shown to be asymptotically normal, with a sandwich estimator for the asymptotic variance. Through simulation, we demonstrate the improved performance of the modified estimators over the standard techniques when these assumptions are violated. We motivate these corrections using the Framingham Heart Study and apply the generalized techniques to an analysis of these data.
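As a reminder of how simulation extrapolation works in its textbook form (this generic sketch does not reflect the generalized correction proposed above, and the quadratic extrapolant is only one common choice), extra measurement error is added at several levels lambda, the naive estimator is recomputed, and the fitted trend is extrapolated back to lambda = -1.

```python
import numpy as np

def simex_slope(W, Y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200, seed=0):
    """SIMEX for the slope of a simple linear regression of Y on a covariate
    measured with additive error, W = X + U, with known error sd sigma_u."""
    rng = np.random.default_rng(seed)

    def ols_slope(w, y):
        w = w - w.mean()
        return (w @ (y - y.mean())) / (w @ w)

    lam_grid, est = [0.0], [ols_slope(W, Y)]       # lambda = 0: naive estimate
    for lam in lambdas:
        sims = [ols_slope(W + rng.normal(0.0, np.sqrt(lam) * sigma_u, len(W)), Y)
                for _ in range(B)]                  # add extra noise, refit
        lam_grid.append(lam)
        est.append(np.mean(sims))
    coef = np.polyfit(lam_grid, est, deg=2)         # quadratic extrapolant
    return np.polyval(coef, -1.0)                   # extrapolate to lambda = -1
```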
This paper investigates pooling strategies for tail index and extreme quantile estimation from heavy-tailed data. To fully exploit the information contained in several samples, we present general weighted pooled Hill estimators of the tail index and weighted pooled Weissman estimators of extreme quantiles, calculated through a nonstandard geometric averaging scheme. We develop their large-sample asymptotic theory across a fixed number of samples, covering the general framework of heterogeneous sample sizes with different and asymptotically dependent distributions. Our results include optimal choices of the pooling weights based on asymptotic variance and asymptotic mean squared error (AMSE) minimization. In the important application of distributed inference, we prove that the variance-optimal distributed estimators are asymptotically equivalent to the benchmark Hill and Weissman estimators based on the unfeasible combination of subsamples, while the AMSE-optimal distributed estimators enjoy a smaller AMSE than the benchmarks in the case of large bias. We consider additional scenarios where the number of subsamples grows with the total sample size and effective subsample sizes can be low. We extend our methodology to handle serial dependence and the presence of covariates. Simulations confirm that our pooled estimators perform virtually as well as the benchmark estimators. Two applications to real weather and insurance data are showcased.
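Purely for illustration (the paper's weighting and geometric averaging scheme is more general, and this sketch is not guaranteed to match it exactly), a per-sample Hill estimator and one plausible weighted geometric pooling of the resulting tail index estimates could look like this.

```python
import numpy as np

def hill(sample, k):
    """Hill estimator of the tail index from the k largest order statistics."""
    x = np.sort(sample)[::-1]                       # descending order
    return float(np.mean(np.log(x[:k] / x[k])))

def pooled_hill(samples, ks, weights):
    """Weighted geometric average of per-sample Hill estimates; shown only
    as one possible pooling scheme, not necessarily the paper's."""
    gammas = np.array([hill(s, k) for s, k in zip(samples, ks)])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.exp(np.sum(w * np.log(gammas))))
```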