In this article, we propose the outcome-adjusted balance measure to perform model selection for the generalized propensity score (GPS), which serves as an essential component in the estimation of pairwise average treatment effects (ATEs) in observational studies with more than two treatment levels. The primary goal of the balance measure is to identify the GPS model specification under which the resulting ATE estimator is consistent and efficient. Following recent empirical and theoretical evidence, we establish that the optimal GPS model should include only covariates related to the outcomes. Given a collection of candidate GPS models, the outcome-adjusted balance measure imputes all baseline covariates by matching on each candidate model, and selects the model that minimizes a weighted sum of absolute mean differences between the imputed and original values of the covariates. The weights are defined to leverage the covariate-outcome relationship, so that GPS models without optimal variable selection are penalized. Under appropriate assumptions, we show that the outcome-adjusted balance measure consistently selects the optimal GPS model, so that the resulting GPS matching estimator is asymptotically normal and efficient. We compare its finite-sample performance with existing measures in a simulation study, and illustrate an application of the proposed methodology in the analysis of the Tutoring data.
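As a concrete illustration of the selection rule described above, the following minimal Python sketch imputes covariates by nearest-neighbour matching on each candidate GPS and scores each model by a weighted sum of absolute mean differences; the matching rule and the use of absolute outcome correlations as weights are simplifying assumptions for illustration, not the paper's exact definitions.

    import numpy as np

    def outcome_adjusted_balance(X, y, treatment, gps_models):
        """X: (n, p) covariates, y: (n,) outcomes, treatment: (n,) labels,
        gps_models: dict mapping model name -> (n, K) matrix of estimated
        GPS values, one column per treatment level."""
        # Outcome-based weights: covariates weakly related to y get little weight.
        w = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
        levels = np.unique(treatment)
        scores = {}
        for name, gps in gps_models.items():
            total = 0.0
            for k, level in enumerate(levels):
                donors = np.where(treatment == level)[0]
                recipients = np.where(treatment != level)[0]
                # Impute each recipient's covariates from its nearest GPS match
                # among units that actually received this treatment level.
                idx = donors[np.argmin(
                    np.abs(gps[recipients, k][:, None] - gps[donors, k][None, :]),
                    axis=1)]
                total += np.sum(w * np.abs(X[idx].mean(0) - X[recipients].mean(0)))
            scores[name] = float(total)
        # Select the candidate GPS model with the smallest balance measure.
        return min(scores, key=scores.get), scores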
Sibling fixed effects (FE) models are useful for estimating causal treatment effects while offsetting unobserved sibling-invariant confounding. However, treatment estimates are biased if an individual's outcome affects their sibling's outcome. We propose a robustness test for assessing the presence of outcome-to-outcome interference in linear two-sibling FE models. We regress a gain score--the difference between siblings' continuous outcomes--on both siblings' treatments and on a pre-treatment observed FE. Under certain restrictions, the observed FE's partial regression coefficient signals the presence of outcome-to-outcome interference. Monte Carlo simulations demonstrated the robustness test under several models. We found that an observed FE signaled outcome-to-outcome spillover if it was directly associated with a sibling-invariant confounder of treatments and outcomes, directly associated with a sibling's treatment, or directly and equally associated with both siblings' outcomes. However, the robustness test collapsed if the observed FE was directly but differentially associated with siblings' outcomes or if outcomes affected siblings' treatments.
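The gain-score regression at the heart of the robustness test can be sketched in a few lines of Python; the simulated data-generating process below (a shared confounder, an observed FE correlated with it, and a one-directional outcome spillover) is an illustrative assumption, not one of the paper's simulation designs.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 5000
    u = rng.normal(size=n)                    # unobserved sibling-invariant confounder
    z = 0.8 * u + rng.normal(size=n)          # pre-treatment observed FE
    t1 = 0.5 * u + rng.normal(size=n)         # sibling 1 treatment
    t2 = 0.5 * u + rng.normal(size=n)         # sibling 2 treatment
    y1 = t1 + u + rng.normal(size=n)          # sibling 1 outcome
    y2 = t2 + u + 0.4 * y1 + rng.normal(size=n)  # outcome-to-outcome spillover

    gain = y1 - y2                            # gain score
    fit = sm.OLS(gain, sm.add_constant(np.column_stack([t1, t2, z]))).fit()
    # Under the stated restrictions, a non-zero partial coefficient on z
    # (the last slope) signals outcome-to-outcome interference; setting the
    # spillover coefficient to zero above drives it back towards zero.
    print(fit.params, fit.pvalues)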
Many frameworks exist to infer cause-and-effect relations in complex nonlinear systems, but a complete theory is lacking. A new framework is presented that is fully nonlinear, provides a complete information-theoretic disentanglement of causal processes, allows for nonlinear interactions between causes, identifies the causal strength of missing or unknown processes, and can analyze systems that cannot be represented on Directed Acyclic Graphs. The basic building blocks are information-theoretic measures such as (conditional) mutual information and a new concept called certainty, which monotonically increases with the information available about the target process. The framework is presented in detail and compared with other existing frameworks, and the treatment of confounders is discussed. While there are systems with structures that the framework cannot disentangle, it is argued that any causal framework based on integrated quantities will miss potentially important information about the underlying probability density functions. The framework is tested on several highly simplified stochastic processes to demonstrate how blocking and gateways are handled, and on the chaotic Lorenz 1963 system. We show that the framework not only provides information on the local dynamics, but also reveals information on the larger-scale structure of the underlying attractor. Furthermore, by applying it to real observations related to the El Niño-Southern Oscillation system, we demonstrate its power and advantage over other methodologies.
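Since (conditional) mutual information is one of the framework's basic building blocks, the following Python sketch shows a simple plug-in estimate of I(X; Y | Z) from samples using equal-width binning; the binning estimator is an assumption made for illustration, and the paper's certainty measure is not reproduced here.

    import numpy as np

    def conditional_mutual_information(x, y, z, bins=8):
        """Plug-in estimate of I(X; Y | Z) in nats from 1-D samples."""
        xb = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
        yb = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
        zb = np.digitize(z, np.histogram_bin_edges(z, bins)[1:-1])
        joint = np.zeros((bins, bins, bins))
        np.add.at(joint, (xb, yb, zb), 1.0)
        p = joint / joint.sum()
        p_xz = p.sum(axis=1, keepdims=True)
        p_yz = p.sum(axis=0, keepdims=True)
        p_z = p.sum(axis=(0, 1), keepdims=True)
        mask = p > 0
        return float(np.sum(p[mask] * np.log((p * p_z)[mask] / (p_xz * p_yz)[mask])))

    # Toy chain X -> Y -> W: conditioning on the intermediate variable Y
    # should remove most of the apparent dependence between X and W.
    rng = np.random.default_rng(1)
    x = rng.normal(size=20000)
    y = x + 0.5 * rng.normal(size=20000)
    w = y + 0.5 * rng.normal(size=20000)
    print(conditional_mutual_information(x, w, y))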
To draw real-world evidence about the comparative effectiveness of complex time-varying treatment regimens on patient survival, we develop a joint marginal structural proportional hazards model and novel weighting schemes in continuous time to account for time-varying confounding and censoring. Our methods formulate complex longitudinal treatments with multiple "start/stop" switches as recurrent events with discontinuous intervals of treatment eligibility. We derive the weights in continuous time to handle a complex longitudinal dataset on its own terms, without the need to discretize or artificially align the measurement times. We further propose using machine learning models designed for censored survival data with time-varying covariates, together with a kernel function estimator of the baseline intensity, to efficiently estimate the continuous-time weights. Our simulations demonstrate that the proposed methods provide better bias reduction and nominal coverage probability when analyzing observational longitudinal survival data with irregularly spaced time intervals, compared to conventional methods that require aligned measurement time points. We apply the proposed methods to a large-scale COVID-19 dataset to estimate the causal effects of several COVID-19 treatment strategies on in-hospital mortality or ICU admission, and provide new insights relative to findings from randomized trials.
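One ingredient of the continuous-time weights, a kernel function estimator of a baseline intensity, can be sketched as kernel smoothing of Nelson-Aalen increments; the Epanechnikov kernel and the bandwidth below are illustrative assumptions rather than the paper's exact estimator.

    import numpy as np

    def kernel_baseline_intensity(event_times, censor_times, grid, bandwidth):
        """event_times: times of subjects whose event was observed;
        censor_times: observation times of censored subjects;
        grid: evaluation points; returns an estimate of lambda_0(grid)."""
        all_times = np.concatenate([event_times, censor_times])
        lam = np.zeros_like(grid, dtype=float)
        for t_i in np.sort(event_times):
            at_risk = np.sum(all_times >= t_i)               # risk-set size Y(t_i)
            u = (grid - t_i) / bandwidth
            kern = 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)  # Epanechnikov kernel
            lam += kern / (bandwidth * at_risk)              # smooth the increment 1/Y(t_i)
        return lam

    # Toy illustration: exponential event times with uniform censoring.
    rng = np.random.default_rng(2)
    t_event = rng.exponential(2.0, size=300)
    t_cens = rng.uniform(0.0, 5.0, size=300)
    grid = np.linspace(0.1, 4.0, 50)
    print(kernel_baseline_intensity(t_event, t_cens, grid, bandwidth=0.5)[:5])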
In contrast to the typical assumption of independent and identically distributed (IID) matrix elements, this paper studies the estimation of a random matrix with IID rows for a generalized linear model constructed from a linear mixing space and a row-wise mapping channel. This inference problem arises in many engineering fields, such as wireless communications, compressed sensing, and phase retrieval. We apply the replica method from statistical mechanics to analyze the exact minimum mean square error (MMSE) under the Bayes-optimal setting, and obtain the explicit replica-symmetric solution of the exact MMSE estimator. In addition, we establish the input-output mutual information relation between the objective model and an equivalent single-vector system. To estimate the signal, we also propose a computationally efficient message-passing algorithm from the expectation propagation (EP) perspective and analyze its dynamics. We verify that the asymptotic MSE of the proposed algorithm, predicted by its state evolution (SE), matches the exact MMSE predicted by the replica method. This indicates that the optimal MSE can be attained by the proposed algorithm provided it has a unique fixed point.
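To give a flavour of how a state-evolution fixed point predicts the asymptotic MSE, the sketch below iterates the scalar SE recursion for the special case of a Gaussian-output channel with an IID Gaussian prior, where the scalar MMSE function is available in closed form; the paper's row-wise channels and EP-based algorithm are more general than this toy case.

    def state_evolution(delta, sigma_x2, sigma_w2, iters=200):
        """delta = M/N (measurement ratio); returns the SE fixed point tau^2,
        the variance of the effective scalar Gaussian noise."""
        def mmse(tau2):
            # MMSE for a Gaussian signal of variance sigma_x2 observed in
            # Gaussian noise of variance tau2.
            return sigma_x2 * tau2 / (sigma_x2 + tau2)
        tau2 = sigma_w2 + sigma_x2 / delta
        for _ in range(iters):
            tau2 = sigma_w2 + mmse(tau2) / delta
        return tau2

    # Effective-noise fixed point and the corresponding per-entry MSE prediction.
    tau2_star = state_evolution(delta=0.6, sigma_x2=1.0, sigma_w2=0.1)
    print(tau2_star, 1.0 * tau2_star / (1.0 + tau2_star))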
A recent UK Biobank study clustered 156 parameterised models associating risk factors with common diseases, to identify shared causes of disease. Parametric models are often more familiar and interpretable than clustered data, can build in prior knowledge, adjust for known confounders, and use marginalisation to emphasise parameters of interest. Estimates include a Maximum Likelihood Estimate (MLE) that is (approximately) normally distributed, and its covariance. Clustering models rarely consider the covariances of data points, which are usually unavailable. Here a clustering model is formulated that accounts for the covariances of the data, and assumes that all MLEs in a cluster are the same. The log-likelihood is exactly calculated in terms of the fitted parameters, with the unknown cluster means removed by marginalisation. The procedure is equivalent to calculating the Bayesian Information Criterion (BIC) without approximation, and can be used to assess the optimum number of clusters for a given clustering algorithm. The log-likelihood has terms to penalise poor fits and model complexity, and can be maximised to determine the number and composition of clusters. Results can be similar to using the ad hoc "elbow criterion", but are less subjective. The model is also formulated as a Dirichlet process mixture model (DPMM). The overall approach is equivalent to a multi-layer algorithm that characterises features through the normally distributed MLEs of a fitted model, and then clusters the normal distributions. Examples include simulated data, and clustering of diseases in UK Biobank data using estimated associations with risk factors. The results can be applied directly to measured data and their estimated covariances, to the output from clustering models, or the DPMM implementation can be used to cluster fitted models directly.
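The exactly calculated cluster log-likelihood can be sketched directly: assuming each fitted MLE in a cluster is normally distributed about a common (marginalised) cluster mean with its estimated covariance, the marginal likelihood has a closed form; the flat prior on the cluster mean used below is an illustrative assumption.

    import numpy as np

    def cluster_log_marginal(betas, covariances):
        """Log marginal likelihood of one cluster: betas is a list of (d,) MLEs,
        covariances a list of their (d, d) covariance matrices, and the common
        cluster mean is integrated out under a flat prior."""
        d = betas[0].shape[0]
        precisions = [np.linalg.inv(S) for S in covariances]
        P = sum(precisions)                                   # combined precision
        m = np.linalg.solve(P, sum(Pi @ b for Pi, b in zip(precisions, betas)))
        quad = sum(b @ Pi @ b for Pi, b in zip(precisions, betas)) - m @ P @ m
        log_dets = -0.5 * sum(np.linalg.slogdet(S)[1] for S in covariances)
        return (-(len(betas) - 1) * d / 2 * np.log(2 * np.pi)
                + log_dets
                - 0.5 * np.linalg.slogdet(P)[1]
                - 0.5 * quad)

    # Two noisy estimates of the same underlying parameter vector.
    b1, b2 = np.array([1.0, 0.5]), np.array([1.2, 0.4])
    S = 0.05 * np.eye(2)
    print(cluster_log_marginal([b1, b2], [S, S]))

Summing this quantity over clusters, and adding a penalty for model complexity, yields a BIC-style criterion that can be maximised over candidate partitions.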
Labelling data is a major practical bottleneck in training and testing classifiers. Given a collection of unlabelled data points, we address how to select which subset to label to best estimate test metrics such as accuracy, $F_1$ score or micro/macro $F_1$. We consider two sampling-based approaches, namely the well-known Importance Sampling and a novel application of Poisson Sampling that we introduce. For both approaches we derive the minimal-error sampling distributions and show how to approximate them and use them to form estimators and confidence intervals. We show that Poisson Sampling outperforms Importance Sampling both theoretically and experimentally.
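A minimal Python sketch of the Poisson Sampling approach for estimating accuracy is given below: each point is labelled independently with its own inclusion probability, and a Horvitz-Thompson estimator corrects for the unequal probabilities. The proxy used to set the probabilities (predicted-class confidence) is an illustrative choice, not the minimal-error distribution derived in the paper.

    import numpy as np

    rng = np.random.default_rng(3)
    N = 10_000
    confidence = rng.uniform(0.5, 1.0, size=N)       # classifier's predicted confidence
    true_correct = rng.uniform(size=N) < confidence  # hidden ground-truth correctness

    budget = 500
    # Spend more of the labelling budget on uncertain points (illustrative choice).
    uncertainty = 1.0 - confidence + 1e-3
    inclusion = np.minimum(1.0, budget * uncertainty / uncertainty.sum())
    sampled = rng.uniform(size=N) < inclusion        # Poisson sample: independent draws

    # Horvitz-Thompson estimate of accuracy from only the sampled labels.
    acc_hat = np.sum(true_correct[sampled] / inclusion[sampled]) / N
    print(acc_hat, true_correct.mean(), sampled.sum())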
Generalized Method of Moments (GMM) estimators in their various forms, including the popular Maximum Likelihood (ML) estimator, are frequently applied to evaluate complex econometric models whose moment or likelihood functions are not analytically computable. Since the objective functions of GMM and ML estimators themselves constitute the approximation of an integral, more precisely of the expected value over the real-world data space, the question arises whether the approximation of the moment function and the simulation of the entire objective function can be combined. Motivated by the popular Probit and Mixed Logit models, we consider double integrals with a linking function that stems from the considered estimator, e.g. the logarithm for Maximum Likelihood, and apply a sparse tensor product quadrature to reduce the computational effort of approximating the combined integral. Given H\"older continuity of the linking function, we prove that this approach can improve the order of the convergence rate of the classical GMM and ML estimators by a factor of two, even for integrands of low regularity or high dimensionality. This result is illustrated by numerical simulations of Mixed Logit and Multinomial Probit integrals, which are estimated by ML and GMM estimators, respectively.
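The nested-integral structure can be illustrated with the simulated log-likelihood of a mixed logit with a single normally distributed coefficient: an outer average over observations of the logarithm (the linking function) of an inner integral over the random coefficient. Plain Gauss-Hermite quadrature is used for the inner integral in this sketch; the sparse tensor product construction itself is not reproduced.

    import numpy as np

    rng = np.random.default_rng(4)
    n, mu_true, sd_true = 2000, 1.0, 0.5
    x = rng.normal(size=n)                           # attribute difference, two alternatives
    b = rng.normal(mu_true, sd_true, size=n)         # individual-specific coefficient
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-b * x))).astype(float)

    def simulated_loglik(mu, sd, nodes=32):
        z, w = np.polynomial.hermite_e.hermegauss(nodes)  # probabilists' Hermite rule
        coef = mu + sd * z                                 # (nodes,) coefficient values
        p1 = 1.0 / (1.0 + np.exp(-np.outer(x, coef)))      # (n, nodes) choice probabilities
        lik = p1 * y[:, None] + (1.0 - p1) * (1.0 - y[:, None])
        inner = lik @ (w / w.sum())                        # inner integral per observation
        return np.mean(np.log(inner))                      # outer average of the log

    print(simulated_loglik(1.0, 0.5), simulated_loglik(0.0, 0.5))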
Bayesian meta-analysis expressed through the Bayesian normal-normal hierarchical model (NNHM) synthesizes knowledge from several studies and is highly relevant in practice. Moreover, the NNHM is the simplest Bayesian hierarchical model (BHM), and it illustrates problems typical of more complex BHMs. Until now, it has been unclear to what extent the data determine the marginal posterior distributions of the parameters in the NNHM. To address this issue, we computed the second derivative of the Bhattacharyya coefficient with respect to the weighted likelihood, and defined the total empirical determinacy (TED), the proportion of the empirical determinacy of location to TED (pEDL), and the proportion of the empirical determinacy of spread to TED (pEDS). We implemented this method in the R package \texttt{ed4bhm} and considered two case studies and one simulation study. We quantified TED, pEDL and pEDS under different modeling conditions such as model parametrization, the primary outcome, and the prior. This clarified to what extent the location and spread of the marginal posterior distributions of the parameters are determined by the data. Although these investigations focused on the Bayesian NNHM, the proposed method is applicable more generally to complex BHMs.
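A toy sketch of the underlying quantity is given below: the second derivative, with respect to a likelihood weight w, of the Bhattacharyya coefficient between the baseline posterior (w = 1) and the posterior obtained with a w-weighted likelihood, computed by a central finite difference in a conjugate normal model. This is only an illustrative reading of the construction; the TED/pEDL/pEDS decomposition and the \texttt{ed4bhm} implementation are not reproduced here.

    import numpy as np

    mu0, tau0 = 0.0, 2.0        # prior mean and standard deviation
    ybar, sigma = 1.0, 0.5      # observed effect and its standard error

    def posterior(w):
        """Normal posterior when the likelihood is raised to the power w."""
        prec = 1.0 / tau0**2 + w / sigma**2
        mean = (mu0 / tau0**2 + w * ybar / sigma**2) / prec
        return mean, np.sqrt(1.0 / prec)

    def bhattacharyya(p, q):
        (m1, s1), (m2, s2) = p, q
        return np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * np.exp(
            -((m1 - m2) ** 2) / (4 * (s1**2 + s2**2)))

    def bc_second_derivative(h=1e-3):
        f = lambda w: bhattacharyya(posterior(1.0), posterior(w))
        return (f(1.0 + h) - 2.0 * f(1.0) + f(1.0 - h)) / h**2

    # Its magnitude reflects how strongly perturbing the likelihood weight
    # moves the posterior, i.e. how much the data determine it.
    print(bc_second_derivative())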
We consider the problem of inference for nonlinear, multivariate diffusion processes, satisfying It\^o stochastic differential equations (SDEs), using data at discrete times that may be incomplete and subject to measurement error. Our starting point is a state-of-the-art correlated pseudo-marginal Metropolis-Hastings algorithm, which uses correlated particle filters to induce strong and positive correlation between successive likelihood estimates. However, unless the measurement error or the dimension of the SDE is small, this correlation can be eroded by the resampling steps in the particle filter. We therefore propose a novel augmentation scheme that allows for conditioning on values of the latent process at the observation times, completely avoiding the need for resampling steps. We integrate over the uncertainty at the observation times with an additional Gibbs step. Connections between the resulting pseudo-marginal scheme and existing inference schemes for diffusion processes are made, giving a unified inference framework that encompasses Gibbs sampling and pseudo-marginal schemes. The methodology is applied in three examples of increasing complexity. We find that our approach offers substantial gains in overall efficiency compared to competing methods.
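The correlated pseudo-marginal mechanism that serves as our starting point can be sketched generically: the auxiliary random numbers driving the likelihood estimator are refreshed with a Crank-Nicolson move so that successive estimates remain positively correlated. The random-walk proposal, flat prior and toy estimator below are assumptions for illustration; the proposed augmentation scheme, which avoids particle-filter resampling, is not reproduced here.

    import numpy as np

    def correlated_pm_mh(loglik_hat, theta0, n_aux, iters=5000, rho=0.99,
                         step=0.1, seed=0):
        """loglik_hat(theta, u): log-likelihood estimate driven by the
        auxiliary standard normal draws u of length n_aux."""
        rng = np.random.default_rng(seed)
        theta = np.atleast_1d(np.asarray(theta0, dtype=float))
        u = rng.normal(size=n_aux)
        ll = loglik_hat(theta, u)
        chain = []
        for _ in range(iters):
            theta_p = theta + step * rng.normal(size=theta.shape)  # random-walk proposal
            u_p = rho * u + np.sqrt(1.0 - rho**2) * rng.normal(size=n_aux)  # CN refresh
            ll_p = loglik_hat(theta_p, u_p)
            if np.log(rng.uniform()) < ll_p - ll:                  # flat prior
                theta, u, ll = theta_p, u_p, ll_p
            chain.append(theta.copy())
        return np.array(chain)

    # Toy target: log-likelihood of N(theta, 1) data with estimator noise driven by u.
    data = np.random.default_rng(5).normal(0.7, 1.0, size=50)
    f = lambda th, u: np.sum(-0.5 * (data - th[0]) ** 2) + 0.05 * u.mean()
    print(correlated_pm_mh(f, [0.0], n_aux=100)[-5:].ravel())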
Determination of the posterior probability of success for go/no-go decisions and of the predictive power is becoming increasingly common for resource optimization in clinical investigation. There is a vast published literature on these topics; however, the terminology is not used consistently across the literature. Further, there is a lack of a consolidated presentation of the various concepts of the probability of success. We attempt to fill this gap. This paper first provides a detailed derivation of these probability-of-success measures under the frequentist and Bayesian paradigms in a general setting. Subsequently, we present the analytical formulae for these probability-of-success measures for continuous, binary, and time-to-event endpoints separately. This paper can be used as a single point of reference to determine the following measures: (a) the conditional power (CP) based on interim results, (b) the predictive power of success (PPoS) based on interim results with or without a prior distribution, and (c) the probability of success (PoS) for a prospective trial at the design stage. We discuss both clinical success and trial success. The discussion is mostly based on the normal approximation for the prior distribution and for the estimate of the parameter of interest. In addition, predictive power using a beta prior for the binomial case is also presented. Some examples are given for illustration. R functions to calculate CP and PPoS are available through the LongCART package. An R shiny app is also available at //ppos.herokuapp.com/.
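The two interim-analysis measures can be sketched under the normal (Brownian-motion) approximation used throughout the paper: conditional power for a given drift, and predictive power obtained by averaging CP over a flat-prior posterior of the drift. The flat prior is an illustrative choice, and these functions are not the LongCART package implementations.

    import numpy as np
    from scipy.stats import norm

    def conditional_power(z_t, t, theta, alpha=0.025):
        """CP for a one-sided test: P(Z_1 > z_{1-alpha} | interim Z_t = z_t at
        information fraction t), given drift theta = E[Z_1]."""
        z_alpha = norm.ppf(1 - alpha)
        return norm.sf((z_alpha - np.sqrt(t) * z_t - theta * (1 - t)) / np.sqrt(1 - t))

    def predictive_power(z_t, t, alpha=0.025):
        """PPoS: CP averaged over the flat-prior posterior of the drift,
        which given the interim data is N(z_t / sqrt(t), 1 / t)."""
        post_mean, post_sd = z_t / np.sqrt(t), 1.0 / np.sqrt(t)
        thetas = np.linspace(post_mean - 6 * post_sd, post_mean + 6 * post_sd, 2001)
        weights = norm.pdf(thetas, loc=post_mean, scale=post_sd)
        weights = weights / weights.sum()
        return float(np.sum(conditional_power(z_t, t, thetas, alpha) * weights))

    # Interim z = 1.5 at half of the planned information: CP under the current
    # trend, and the predictive power of success.
    print(conditional_power(1.5, 0.5, theta=1.5 / np.sqrt(0.5)),
          predictive_power(1.5, 0.5))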