Weighting estimators based on propensity scores are widely used for causal estimation in a variety of contexts, such as observational studies, marginal structural models and interference. They enjoy appealing theoretical properties such as consistency and possible efficiency under correct model specification. However, this theoretical appeal may be diminished in practice by sensitivity to misspecification of the propensity score model. To improve on this, we borrow an idea from an alternative approach to causal effect estimation in observational studies, namely subclassification estimators. It is well known that compared to weighting estimators, subclassification methods are usually more robust to model misspecification. In this paper, we first discuss an intrinsic connection between the seemingly unrelated weighting and subclassification estimators, and then use this connection to construct robust propensity score weights via subclassification. We illustrate this idea by proposing so-called full-classification weights and accompanying estimators for causal effect estimation in observational studies. Our novel estimators are both consistent and robust to model misspecification, thereby combining the strengths of traditional weighting and subclassification estimators for causal effect estimation from observational studies. Numerical studies show that the proposed estimators perform favorably compared to existing methods.
In recent years, large-scale Bayesian learning draws a great deal of attention. However, in big-data era, the amount of data we face is growing much faster than our ability to deal with it. Fortunately, it is observed that large-scale datasets usually own rich internal structure and is somewhat redundant. In this paper, we attempt to simplify the Bayesian posterior via exploiting this structure. Specifically, we restrict our interest to the so-called well-clustered datasets and construct an \emph{approximate posterior} according to the clustering information. Fortunately, the clustering structure can be efficiently obtained via a particular clustering algorithm. When constructing the approximate posterior, the data points in the same cluster are all replaced by the centroid of the cluster. As a result, the posterior can be significantly simplified. Theoretically, we show that under certain conditions the approximate posterior we construct is close (measured by KL divergence) to the exact posterior. Furthermore, thorough experiments are conducted to validate the fact that the constructed posterior is a good approximation to the true posterior and much easier to sample from.
Censored data, where the event time is partially observed, are challenging for survival probability estimation. In this paper, we introduce a novel nonparametric fiducial approach to interval-censored data, including right-censored, current status, case II censored, and mixed case censored data. The proposed approach leveraging a simple Gibbs sampler has a useful property of being "one size fits all", i.e., the proposed approach automatically adapts to all types of non-informative censoring mechanisms. As shown in the extensive simulations, the proposed fiducial confidence intervals significantly outperform existing methods in terms of both coverage and length. In addition, the proposed fiducial point estimator has much smaller estimation errors than the nonparametric maximum likelihood estimator. Furthermore, we apply the proposed method to Austrian rubella data and a study of hemophiliacs infected with the human immunodeficiency virus. The strength of the proposed fiducial approach is not only estimation and uncertainty quantification but also its automatic adaptation to a variety of censoring mechanisms.
Survival outcomes are common in comparative effectiveness studies and require unique handling because they are usually incompletely observed due to right-censoring. A "once for all" approach for causal inference with survival outcomes constructs pseudo-observations and allows standard methods such as propensity score weighting to proceed as if the outcomes are completely observed. For a general class of model-free causal estimands with survival outcomes on user-specified target populations, we develop corresponding propensity score weighting estimators based on the pseudo-observations and establish their asymptotic properties. In particular, utilizing the functional delta-method and the von Mises expansion, we derive a new closed-form variance of the weighting estimator that takes into account the uncertainty due to both pseudo-observation calculation and propensity score estimation. This allows valid and computationally efficient inference without resampling. We also prove the optimal efficiency property of the overlap weights within the class of balancing weights for survival outcomes. The proposed methods are applicable to both binary and multiple treatments. Extensive simulations are conducted to explore the operating characteristics of the proposed method versus other commonly used alternatives. We apply the proposed method to compare the causal effects of three popular treatment approaches for prostate cancer patients.
Semiconductor device models are essential to understand the charge transport in thin film transistors (TFTs). Using these TFT models to draw inference involves estimating parameters used to fit to the experimental data. These experimental data can involve extracted charge carrier mobility or measured current. Estimating these parameters help us draw inferences about device performance. Fitting a TFT model for a given experimental data using the model parameters relies on manual fine tuning of multiple parameters by human experts. Several of these parameters may have confounding effects on the experimental data, making their individual effect extraction a non-intuitive process during manual tuning. To avoid this convoluted process, we propose a new method for automating the model parameter extraction process resulting in an accurate model fitting. In this work, model choice based approximate Bayesian computation (aBc) is used for generating the posterior distribution of the estimated parameters using observed mobility at various gate voltage values. Furthermore, it is shown that the extracted parameters can be accurately predicted from the mobility curves using gradient boosted trees. This work also provides a comparative analysis of the proposed framework with fine-tuned neural networks wherein the proposed framework is shown to perform better.
We study the problem of density estimation for a random vector ${\boldsymbol X}$ in $\mathbb R^d$ with probability density $f(\boldsymbol x)$. For a spanning tree $T$ defined on the vertex set $\{1,\dots ,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. The optimal spanning tree $T^*$ is the spanning tree $T$, for which the Kullback-Leibler divergence of $f$ and $f_{T}$ is the smallest. From i.i.d. data we identify the optimal tree $T^*$ and computationally efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has that $\lim_{n\to \infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x=0$ a.s. For Lipschitz continuous $f$ with bounded support, $\mathbb E\{ \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x\}=O(n^{-1/4})$.
Post-click conversion, as a strong signal indicating the user preference, is salutary for building recommender systems. However, accurately estimating the post-click conversion rate (CVR) is challenging due to the selection bias, i.e., the observed clicked events usually happen on users' preferred items. Currently, most existing methods utilize counterfactual learning to debias recommender systems. Among them, the doubly robust (DR) estimator has achieved competitive performance by combining the error imputation based (EIB) estimator and the inverse propensity score (IPS) estimator in a doubly robust way. However, inaccurate error imputation may result in its higher variance than the IPS estimator. Worse still, existing methods typically use simple model-agnostic methods to estimate the imputation error, which are not sufficient to approximate the dynamically changing model-correlated target (i.e., the gradient direction of the prediction model). To solve these problems, we first derive the bias and variance of the DR estimator. Based on it, a more robust doubly robust (MRDR) estimator has been proposed to further reduce its variance while retaining its double robustness. Moreover, we propose a novel double learning approach for the MRDR estimator, which can convert the error imputation into the general CVR estimation. Besides, we empirically verify that the proposed learning scheme can further eliminate the high variance problem of the imputation learning. To evaluate its effectiveness, extensive experiments are conducted on a semi-synthetic dataset and two real-world datasets. The results demonstrate the superiority of the proposed approach over the state-of-the-art methods. The code is available at //github.com/guosyjlu/MRDR-DL.
This paper studies distributed binary test of statistical independence under communication (information bits) constraints. While testing independence is very relevant in various applications, distributed independence test is particularly useful for event detection in sensor networks where data correlation often occurs among observations of devices in the presence of a signal of interest. By focusing on the case of two devices because of their tractability, we begin by investigating conditions on Type I error probability restrictions under which the minimum Type II error admits an exponential behavior with the sample size. Then, we study the finite sample-size regime of this problem. We derive new upper and lower bounds for the gap between the minimum Type II error and its exponential approximation under different setups, including restrictions imposed on the vanishing Type I error probability. Our theoretical results shed light on the sample-size regimes at which approximations of the Type II error probability via error exponents became informative enough in the sense of predicting well the actual error probability. We finally discuss an application of our results where the gap is evaluated numerically, and we show that exponential approximations are not only tractable but also a valuable proxy for the Type II probability of error in the finite-length regime.
Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable, thus causal inference tries to leverage any exposed information. Graph neural networks (GNN) as universal approximators on structured input pose a viable candidate for causal learning, suggesting a tighter integration with SCM. To this effect we present a theoretical analysis from first principles that establishes a novel connection between GNN and SCM while providing an extended view on general neural-causal models. We then establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustration on simulations and standard benchmarks validate our theoretical proofs.
Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.
This paper addresses the problem of viewpoint estimation of an object in a given image. It presents five key insights that should be taken into consideration when designing a CNN that solves the problem. Based on these insights, the paper proposes a network in which (i) The architecture jointly solves detection, classification, and viewpoint estimation. (ii) New types of data are added and trained on. (iii) A novel loss function, which takes into account both the geometry of the problem and the new types of data, is propose. Our network improves the state-of-the-art results for this problem by 9.8%.