When studying treatment effects in multilevel studies, investigators commonly use (semi-)parametric estimators, which make strong parametric assumptions about the outcome, the treatment, and/or the correlation structure between individuals. We propose two nonparametric, doubly robust, asymptotically Normal estimators of treatment effects that do not make such assumptions. The first estimator is an extension of the cross-fitting estimator to clustered settings. The second is a new estimator that uses conditional propensity scores and an outcome covariance model to improve efficiency. We apply our estimators in simulation and empirical studies and find that they consistently achieve the smallest standard errors among the methods compared.
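The abstract leaves the implementation unspecified; as a rough illustration, the sketch below shows the generic recipe the first estimator extends: a cross-fitted AIPW (doubly robust) estimator with cluster-level sample splitting, so that units from the same cluster never appear in both the nuisance-training and evaluation folds, followed by a cluster-robust standard error. The learners, the clipping bound, and the name cluster_crossfit_aipw are illustrative choices, not the paper's estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import GroupKFold

def cluster_crossfit_aipw(X, A, Y, cluster, n_splits=5, seed=0):
    """Cross-fitted AIPW with cluster-level folds and a cluster-robust SE."""
    psi = np.zeros(len(Y))
    for train, test in GroupKFold(n_splits=n_splits).split(X, groups=cluster):
        ps = RandomForestClassifier(random_state=seed).fit(X[train], A[train])
        mu1 = RandomForestRegressor(random_state=seed).fit(
            X[train][A[train] == 1], Y[train][A[train] == 1])
        mu0 = RandomForestRegressor(random_state=seed).fit(
            X[train][A[train] == 0], Y[train][A[train] == 0])
        e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        # AIPW (doubly robust) influence values for the held-out fold
        psi[test] = (m1 - m0
                     + A[test] * (Y[test] - m1) / e
                     - (1 - A[test]) * (Y[test] - m0) / (1 - e))
    ate = psi.mean()
    # cluster-robust variance: sum centred influence values within clusters
    dev = psi - ate
    cl = np.array([dev[cluster == c].sum() for c in np.unique(cluster)])
    return ate, np.sqrt((cl ** 2).sum()) / len(Y)
```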
This paper provides extended results on estimating the parameter matrix of a high-dimensional regression model when the covariates or responses possess only weak moment conditions. We investigate the $M$-estimator of Fan et al. (Ann Stat 49(3):1239--1266, 2021) for the matrix completion model under $(1+\epsilon)$-th moment conditions and observe the corresponding phase transition phenomenon: when $\epsilon \geq 1$, the robust estimator attains the same convergence rate as in the previous literature, while for $0 < \epsilon < 1$ the rate becomes slower. For the high-dimensional multiple-index coefficient model, we also apply the element-wise truncation method to construct a robust estimator that handles missing and heavy-tailed data with finite fourth moments.
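As a small illustration of the element-wise truncation idea mentioned above, the snippet below shrinks each entry toward zero at a threshold whose rate mirrors the usual $(1+\epsilon)$-th moment theory; the constant c and the exact form of the level are assumptions for illustration, not the paper's tuning.

```python
import numpy as np

def truncate(Y, tau):
    """Shrink each entry of Y so its magnitude is at most tau."""
    return np.sign(Y) * np.minimum(np.abs(Y), tau)

def truncation_level(n, d, eps, c=1.0):
    """Illustrative level for data with finite (1+eps)-th moments; the
    (n / log d)**(1 / (1 + eps)) scaling mirrors the usual theory."""
    return c * (n / np.log(d)) ** (1.0 / (1.0 + eps))
```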
Background: It has long been advised to account for baseline covariates in the analysis of confirmatory randomised trials, with the main statistical justifications being that this increases power and, when a randomisation scheme balances covariates, permits a valid estimate of experimental error. Various methods are available to account for covariates, but it is not clear how to choose among them. Methods: Taking the perspective of writing a statistical analysis plan, we consider how to choose between the three most promising broad approaches: direct adjustment, standardisation and inverse-probability-of-treatment weighting (IPTW). Results: The three approaches are similar in being asymptotically efficient, in losing efficiency with mis-specified covariate functions, and in handling designed balance. If a marginal estimand is targeted (for example, a risk difference or survival difference), then direct adjustment should be avoided because it involves fitting non-standard models that are subject to convergence issues. Convergence is most likely with IPTW. Robust standard errors used by IPTW are anti-conservative at small sample sizes. All approaches can use similar methods to handle missing covariate data. With missing outcome data, each method has its own way to estimate a treatment effect in the all-randomised population. We illustrate some issues in a reanalysis of GetTested, a randomised trial designed to assess the effectiveness of an electronic sexually-transmitted-infection testing and results service. Conclusions: No single approach is always best: the choice will depend on the trial context. We encourage trialists to consider all three methods more routinely.
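To make the comparison concrete, here is a minimal sketch of two of the three approaches for a marginal risk difference with a binary outcome: standardisation (fit an outcome model, then average predictions under each arm) and IPTW (weight by inverse estimated treatment probabilities). It is illustrative only; the model choices and names are ours, and in practice standard errors would come from the delta method, bootstrap, or a robust sandwich variance.

```python
import numpy as np
import statsmodels.api as sm

def _design(a, X):
    """Intercept, treatment, covariates, and treatment-covariate interactions."""
    return np.column_stack([np.ones(len(a)), a, X, a[:, None] * X])

def standardisation_rd(X, A, Y):
    """Standardisation: logistic outcome model, then average predictions
    under A=1 and A=0 over the whole trial population."""
    fit = sm.GLM(Y, _design(A, X), family=sm.families.Binomial()).fit()
    ones, zeros = np.ones(len(A)), np.zeros(len(A))
    return fit.predict(_design(ones, X)).mean() - fit.predict(_design(zeros, X)).mean()

def iptw_rd(X, A, Y):
    """IPTW: compare outcome means weighted by inverse estimated
    treatment probabilities (Hajek form)."""
    Xc = np.column_stack([np.ones(len(A)), X])
    e = sm.GLM(A, Xc, family=sm.families.Binomial()).fit().predict(Xc)
    return np.average(Y, weights=A / e) - np.average(Y, weights=(1 - A) / (1 - e))
```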
In causal inference, estimating the effect of a treatment without confounding bias remains a major challenge because it requires assessing the outcome both with and without treatment. Since the two potential outcomes cannot be observed simultaneously for the same unit, their estimation remains a challenging task. We propose an innovative approach in which the problem is reformulated as a missing data model. The aim is to estimate the hidden distribution of \emph{causal populations}, defined as a function of treatment and outcome. A Causal Auto-Encoder (CAE), enhanced by a prior that depends on treatment and outcome information, aligns the latent space with the probability distribution of the target populations. The features are reconstructed after being reduced to a latent space and constrained by a mask, introduced in an intermediate layer of the network, that carries the treatment and outcome information.
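The abstract describes the architecture only at a high level; the following rough PyTorch sketch shows one way an autoencoder can be constrained by a treatment/outcome mask at an intermediate layer. The layer sizes, the masking rule, and the class name CAE are assumptions, not the authors' exact network, and the prior-matching term in the loss is omitted.

```python
import torch
import torch.nn as nn

class CAE(nn.Module):
    """Autoencoder whose latent code is constrained by a mask built from
    treatment and outcome information (illustrative architecture)."""
    def __init__(self, d_in, d_latent=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(),
                                     nn.Linear(32, d_latent))
        self.decoder = nn.Sequential(nn.Linear(d_latent, 32), nn.ReLU(),
                                     nn.Linear(32, d_in))

    def forward(self, x, mask):
        z = self.encoder(x)
        z = z * mask  # zero out latent units inconsistent with (treatment, outcome)
        return self.decoder(z), z
```

Training would combine the reconstruction loss on the decoder output with a term pushing the masked latent code toward the treatment- and outcome-dependent prior.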
Information from various data sources is increasingly available nowadays. However, some of the data sources may produce biased estimates due to commonly encountered biased sampling, population heterogeneity, or model misspecification. This calls for statistical methods that combine information in the presence of biased sources. In this paper, a robust data fusion-extraction method is proposed. The method can produce a consistent estimator of the parameter of interest even if many of the data sources are biased. The proposed estimator is easy to compute and employs only summary statistics, and hence can be applied in many different fields, e.g. meta-analysis, Mendelian randomisation and distributed systems. Moreover, under some mild conditions, the proposed estimator is asymptotically equivalent to the oracle estimator that uses data only from the unbiased sources. Asymptotic normality of the proposed estimator is also established. In contrast to existing meta-analysis methods, the theoretical properties are guaranteed even if both the number of data sources and the dimension of the parameter diverge as the sample size increases, which ensures the performance of the proposed method over a wide range of settings. The robustness and oracle property are also evaluated via simulation studies. The proposed method is applied to a meta-analysis data set to evaluate surgical treatment for moderate periodontal disease, and to a Mendelian randomisation data set to study the risk factors of head and neck cancer.
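The abstract does not spell out the estimator; as a deliberately loose stand-in that shares two of its advertised features (using only summary statistics and tolerating biased sources), the snippet below fuses source-level estimates with a coordinate-wise median, which remains consistent as long as more than half of the sources are unbiased. This is not the paper's method.

```python
import numpy as np

def median_fusion(estimates):
    """Fuse K source-level estimates of a p-dimensional parameter.

    estimates: array of shape (K, p); the coordinate-wise median is robust
    to a minority of arbitrarily biased sources."""
    return np.median(np.asarray(estimates), axis=0)
```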
We study identification and estimation of treatment effects in common school choice settings, under unrestricted heterogeneity in individual potential outcomes. We propose two notions of identification, corresponding to design- and sampling-based uncertainty, respectively. We characterize the set of causal estimands that are identified for a large variety of school choice mechanisms, including ones that feature both random and non-random tie-breaking; we discuss their policy implications. We also study the asymptotic behavior of nonparametric estimators for these causal estimands. Lastly, we connect our approach to the propensity score approach proposed in Abdulkadiroglu, Angrist, Narita, and Pathak (2017a, forthcoming), and derive the implicit estimands of the latter approach, under fully heterogeneous treatment effects.
The survey world is rife with nonresponse, and in many situations the missingness mechanism is not at random, which is a major source of bias for statistical inference. Nonetheless, the survey world is rich in paradata that track the data collection process. A traditional form of paradata is callback data, which record attempts to contact sample units. Although it has been recognized that callback data are useful for nonresponse adjustment, they have not been used widely in statistical analysis until recently. In particular, there have been a few attempts to use callback data to estimate response propensity scores, but these rest on fully parametric models and fairly stringent assumptions. In this paper, we propose a stableness-of-resistance assumption for identifying the propensity scores and the outcome distribution of interest, without imposing any parametric restrictions. We establish the semiparametric efficiency theory, derive the efficient influence function, and propose a suite of semiparametric estimation methods, including doubly robust ones, that generalize existing parametric approaches. We also consider an extension of this framework to causal inference for adjustment of unmeasured confounding. Application to a Consumer Expenditure Survey dataset suggests an association between nonresponse and high housing expenditures, and reanalysis of Card (1995)'s dataset on the return to schooling shows a smaller effect of education in the overall population than in the respondents.
In the study of causal inference, statisticians have shown growing interest in estimating and analyzing heterogeneity in causal effects in observational studies. However, there usually exists a trade-off between accuracy and interpretability when developing a desirable estimator of treatment effects. To address this issue, we propose a non-parametric framework for estimating the Conditional Average Treatment Effect (CATE) function in this paper. The framework integrates two components: (i) a matching algorithm that uses the propensity and prognostic scores jointly to obtain a proxy of the heterogeneous treatment effect for each observation, and (ii) non-parametric regression trees fitted to these proxies to construct an estimator of the CATE function conditional on the two scores. The method naturally stratifies treatment effects into subgroups over a 2-d grid whose axes are the propensity and prognostic scores. We conduct benchmark experiments on multiple simulated data sets and demonstrate clear advantages of the proposed estimator over state-of-the-art methods. We also evaluate empirical performance in real-life settings, using two observational data sets from a clinical trial and a complex social survey, and interpret policy implications following the numerical results.
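A minimal sketch of the two components, under assumed learner choices (logistic regression for the propensity score, a control-arm linear model for the prognostic score, one-to-one nearest-neighbour matching, and a CART tree), none of which the abstract pins down:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeRegressor

def cate_on_scores(X, A, Y, max_depth=3):
    # (i) scores: propensity e(x) and prognostic (control-outcome) score m(x)
    e = LogisticRegression(max_iter=1000).fit(X, A).predict_proba(X)[:, 1]
    m = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)
    S = np.column_stack([e, m])
    # match each unit to its nearest neighbour in the opposite arm on S,
    # and use the matched outcome difference as a treatment-effect proxy
    nn1 = NearestNeighbors(n_neighbors=1).fit(S[A == 1])
    nn0 = NearestNeighbors(n_neighbors=1).fit(S[A == 0])
    idx0 = nn0.kneighbors(S[A == 1], return_distance=False)[:, 0]
    idx1 = nn1.kneighbors(S[A == 0], return_distance=False)[:, 0]
    proxy = np.empty(len(Y))
    proxy[A == 1] = Y[A == 1] - Y[A == 0][idx0]
    proxy[A == 0] = Y[A == 1][idx1] - Y[A == 0]
    # (ii) a regression tree on (e, m): its leaves are the subgroups on the
    # 2-d score grid mentioned in the abstract
    return DecisionTreeRegressor(max_depth=max_depth).fit(S, proxy)
```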
The Gaussian covariance graph model is a popular model for revealing underlying dependency structures among random variables. A Bayesian approach to the estimation of covariance structures uses priors that force some off-diagonal entries of covariance matrices to zero and impose a positive-definiteness constraint on the matrices. In this paper, we consider a spike and slab prior on off-diagonal entries, which uses a mixture of a point mass and a normal distribution. The point mass naturally introduces sparsity to covariance structures, so that the resulting posterior from this prior renders covariance structure learning. Under this prior, we calculate posterior model probabilities of covariance structures using Laplace approximation. We show that the error due to the Laplace approximation becomes asymptotically negligible at a rate that depends on the posterior convergence rate of the covariance matrix under the Frobenius norm. With the approximated posterior model probabilities, we propose a new framework for estimating a covariance structure. Since the Laplace approximation is done around the mode of the conditional posterior of the covariance matrix, which cannot be obtained in closed form, we propose a block coordinate descent algorithm to find the mode and show that, once the structure is chosen, the covariance matrix can be estimated using this algorithm. Through a simulation study based on five numerical models, we show that the proposed method outperforms graphical lasso and the sample covariance matrix in terms of root mean squared error, max norm, spectral norm, specificity, and sensitivity. The advantage of the proposed method in terms of accuracy over our competitors is also demonstrated when it is applied to linear discriminant analysis (LDA) classification on a breast cancer diagnostic dataset.
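To fix ideas, here is a small illustration of the spike and slab prior itself: each off-diagonal entry is zero with some probability (the spike) and otherwise normal (the slab). The hyperparameters pi and v and the crude diagonal lift used to restore positive definiteness are ours; the paper instead constrains the prior to the positive-definite cone.

```python
import numpy as np

def draw_covariance(rng, p, pi=0.2, v=1.0, jitter=1.0):
    """Draw a covariance matrix with spike-and-slab off-diagonal entries:
    sigma_ij = 0 with prob. 1 - pi, else N(0, v)."""
    S = np.zeros((p, p))
    for i in range(p):
        for j in range(i):
            if rng.random() < pi:
                S[i, j] = S[j, i] = rng.normal(0.0, np.sqrt(v))
    # crude positive-definiteness repair: lift the diagonal above the
    # largest eigenvalue magnitude of the off-diagonal part
    np.fill_diagonal(S, np.abs(np.linalg.eigvalsh(S)).max() + jitter)
    return S

# Example: Sigma = draw_covariance(np.random.default_rng(0), p=10)
```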
Randomized control trials (RCTs) have been the gold standard for evaluating the effectiveness of a program, policy, or treatment on an outcome of interest. However, many RCTs assume that study participants are willing to share their (potentially sensitive) data, specifically their response to treatment. This assumption, while seemingly innocuous at first, is becoming difficult to satisfy in the modern era, especially in online settings where there are more regulations to protect individuals' data. The paper presents a new, simple experimental design that is differentially private, one of the strongest notions of data privacy. Also, drawing on work on noncompliance in experimental psychology, we show that our design is robust against "adversarial" participants who may distrust investigators with their personal data and provide contaminated responses to intentionally bias the results of the experiment. Under our new design, we propose unbiased and asymptotically Normal estimators for the average treatment effect. We also present a doubly robust, covariate-adjusted estimator that uses pre-treatment covariates (if available) to improve efficiency. We conclude by using the proposed experimental design to evaluate the effectiveness of online statistics courses at the University of Wisconsin-Madison during the Spring 2021 semester, when many classes were online due to COVID-19.
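The abstract does not describe the mechanism; as an illustration of the generic idea behind such designs, the sketch below applies randomized response, a classical epsilon-differentially-private release for binary outcomes, and then debiases the difference in means. This is the textbook construction, not necessarily the paper's design.

```python
import numpy as np

def randomized_response(y, eps, rng):
    """Report each binary outcome truthfully with prob. e^eps / (1 + e^eps),
    flipped otherwise; this release is eps-differentially private."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    keep = rng.random(len(y)) < p
    return np.where(keep, y, 1 - y)

def debiased_ate(y_rep, a, eps):
    """Invert the known response bias, then take the difference in means."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    y_hat = (y_rep - (1.0 - p)) / (2.0 * p - 1.0)  # unbiased for the true y
    return y_hat[a == 1].mean() - y_hat[a == 0].mean()
```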
Dynamic treatment regimes (DTRs) consist of a sequence of decision rules, one per stage of intervention, that find effective treatments for individual patients according to their information history. DTRs can be estimated from models that include interactions between treatment and a small number of covariates, often chosen a priori. However, with increasingly large and complex data being collected, it is difficult to know which prognostic factors might be relevant to the treatment rule. Therefore, a more data-driven approach to selecting these covariates might improve the estimated decision rules and simplify the models to make them easier to interpret. We propose a variable selection method for DTR estimation using penalized dynamic weighted least squares. Our method has the strong heredity property, that is, an interaction term can be included in the model only if the corresponding main terms have also been selected. Through simulations, we show that our method has both the double robustness property and the oracle property, and that it compares favorably with other variable selection approaches.
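As a crude stand-in for the proposed penalized dynamic weighted least squares, the sketch below runs a weighted lasso over main effects and treatment-covariate interactions and then imposes heredity post hoc, keeping an interaction only when its main effect survives; the paper instead builds the strong heredity property into the penalty itself. The weights w would come from the dynamic weighted least squares step.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_tailoring_covariates(X, A, Y, w, alpha=0.1):
    """Weighted lasso over [X, A*X]; return indices of covariates whose
    treatment interaction is selected under a post-hoc heredity rule."""
    Z = np.column_stack([X, A[:, None] * X])  # main effects, interactions
    fit = Lasso(alpha=alpha).fit(Z, Y, sample_weight=w)
    p = X.shape[1]
    main, inter = fit.coef_[:p], fit.coef_[p:]
    # heredity: keep an interaction only if its main effect is also kept
    return np.flatnonzero((inter != 0) & (main != 0))
```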