The popular Bayesian meta-analysis expressed by Bayesian normal-normal hierarchical model (NNHM) synthesizes knowledge from several studies and is highly relevant in practice. Moreover, NNHM is the simplest Bayesian hierarchical model (BHM), which illustrates problems typical in more complex BHMs. Until now, it has been unclear to what extent the data determines the marginal posterior distributions of the parameters in NNHM. To address this issue we computed the second derivative of the Bhattacharyya coefficient with respect to the weighted likelihood, defined the total empirical determinacy (TED), the proportion of the empirical determinacy of location to TED (pEDL), and the proportion of the empirical determinacy of spread to TED (pEDS). We implemented this method in the R package \texttt{ed4bhm} and considered two case studies and one simulation study. We quantified TED, pEDL and pEDS under different modeling conditions such as model parametrization, the primary outcome, and the prior. This clarified to what extent the location and spread of the marginal posterior distributions of the parameters are determined by the data. Although these investigations focused on Bayesian NNHM, the method proposed is applicable more generally to complex BHMs.
In this study, we develop a novel estimation method for quantile treatment effects (QTE) under rank invariance and rank stationarity assumptions. Ishihara (2020) explores identification of the nonseparable panel data model under these assumptions and proposes a parametric estimation based on the minimum distance method. However, when the dimensionality of the covariates is large, the minimum distance estimation using this process is computationally demanding. To overcome this problem, we propose a two-step estimation method based on the quantile regression and minimum distance methods. We then show the uniform asymptotic properties of our estimator and the validity of the nonparametric bootstrap. The Monte Carlo studies indicate that our estimator performs well in finite samples. Finally, we present two empirical illustrations, to estimate the distributional effects of insurance provision on household production and TV watching on child cognitive development.
We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $\varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. This is achieved by observing that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be complementary and powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid tests or confidence intervals at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (like dimension or alphabet size if applicable). These results are also applicable to more general convex functionals, like the negative differential entropy, suprema of empirical processes, and V-Statistics.
We consider a parametric modelling approach for survival data where covariates are allowed to enter the model through multiple distributional parameters, i.e., scale and shape. This is in contrast with the standard convention of having a single covariate-dependent parameter, typically the scale. Taking what is referred to as a multi-parameter regression (MPR) approach to modelling has been shown to produce flexible and robust models with relatively low model complexity cost. However, it is very common to have clustered data arising from survival analysis studies, and this is something that is under developed in the MPR context. The purpose of this article is to extend MPR models to handle multivariate survival data by introducing random effects in both the scale and the shape regression components. We consider a variety of possible dependence structures for these random effects (independent, shared, and correlated), and estimation proceeds using a h-likelihood approach. The performance of our estimation procedure is investigated by a way of an extensive simulation study, and the merits of our modelling approach are illustrated through applications to two real data examples, a lung cancer dataset and a bladder cancer dataset.
Intercurrent (post-treatment) events occur frequently in randomized trials, and investigators often express interest in treatment effects that suitably take account of these events. A naive conditioning on intercurrent events does not have a straight-forward causal interpretation, and the practical relevance of other commonly used approaches is debated. In this work, we discuss how to formulate and choose an estimand, beyond the marginal intention to treat effect, from the point of view of a decision maker and drug developer. In particular, we argue that careful articulation of a practically useful research question should either reflect decision making at this point in time or future drug development. Indeed, a substantially interesting estimand is a formalization of the (plain English) description of a research question. A common feature of estimands that are practically useful is that they correspond to possibly hypothetical but well-defined interventions in identifiable (sub)populations. To illustrate our points, we consider five examples that were recently used to motivate consideration of principal stratum estimands in clinical trials. In all of these examples, we propose alternative causal estimands, such as conditional effects, sequential regime effects and separable effects, that correspond to explicit research questions of substantial interest. Certain questions require stronger assumptions for identification. However, we highlight that our proposed estimands require less stringent assumptions than estimands commonly targeted in these settings, including principal stratum effects.
This paper develops a sparsity-inducing version of Bayesian Causal Forests, a recently proposed nonparametric causal regression model that employs Bayesian Additive Regression Trees and is specifically designed to estimate heterogeneous treatment effects using observational data. The sparsity-inducing component we introduce is motivated by empirical studies where not all the available covariates are relevant, leading to different degrees of sparsity underlying the surfaces of interest in the estimation of individual treatment effects. The extended version presented in this work, which we name Shrinkage Bayesian Causal Forest, is equipped with an additional pair of priors allowing the model to adjust the weight of each covariate through the corresponding number of splits in the tree ensemble. These priors improve the model's adaptability to sparse data generating processes and allow to perform fully Bayesian feature shrinkage in a framework for treatment effects estimation, and thus to uncover the moderating factors driving heterogeneity. In addition, the method allows prior knowledge about the relevant confounding covariates and the relative magnitude of their impact on the outcome to be incorporated in the model. We illustrate the performance of our method in simulated studies, in comparison to Bayesian Causal Forest and other state-of-the-art models, to demonstrate how it scales up with an increasing number of covariates and how it handles strongly confounded scenarios. Finally, we also provide an example of application using real-world data.
Large health care data repositories such as electronic health records (EHR) opens new opportunities to derive individualized treatment strategies to improve disease outcomes. We study the problem of estimating sequential treatment rules tailored to patient's individual characteristics, often referred to as dynamic treatment regimes (DTRs). We seek to find the optimal DTR which maximizes the discontinuous value function through direct maximization of a fisher consistent surrogate loss function. We show that a large class of concave surrogates fails to be Fisher consistent, which differs from the classic setting for binary classification. We further characterize a non-concave family of Fisher consistent smooth surrogate functions, which can be optimized with gradient descent using off-the-shelf machine learning algorithms. Compared to the existing direct search approach under the support vector machine framework (Zhao et al., 2015), our proposed DTR estimation via surrogate loss optimization (DTRESLO) method is more computationally scalable to large sample size and allows for a broader functional class for the predictor effects. We establish theoretical properties for our proposed DTR estimator and obtain a sharp upper bound on the regret corresponding to our DTRESLO method. Finite sample performance of our proposed estimator is evaluated through extensive simulations and an application on deriving an optimal DTR for treatment sepsis using EHR data from patients admitted to intensive care units.
In the present paper we initiate the challenging task of building a mathematically sound theory for Adaptive Virtual Element Methods (AVEMs). Among the realm of polygonal meshes, we restrict our analysis to triangular meshes with hanging nodes in 2d -- the simplest meshes with a systematic refinement procedure that preserves shape regularity and optimal complexity. A major challenge in the a posteriori error analysis of AVEMs is the presence of the stabilization term, which is of the same order as the residual-type error estimator but prevents the equivalence of the latter with the energy error. Under the assumption that any chain of recursively created hanging nodes has uniformly bounded length, we show that the stabilization term can be made arbitrarily small relative to the error estimator provided the stabilization parameter of the scheme is sufficiently large. This quantitative estimate leads to stabilization-free upper and lower a posteriori bounds for the energy error. This novel and crucial property of VEMs hinges on the largest subspace of continuous piecewise linear functions and the delicate interplay between its coarser scales and the finer ones of the VEM space. Our results apply to $H^1$-conforming (lowest order) VEMs of any kind, including the classical and enhanced VEMs.
We present a novel method for reducing the computational complexity of rigorously estimating the partition functions (normalizing constants) of Gibbs (Boltzmann) distributions, which arise ubiquitously in probabilistic graphical models. A major obstacle to practical applications of Gibbs distributions is the need to estimate their partition functions. The state of the art in addressing this problem is multi-stage algorithms, which consist of a cooling schedule, and a mean estimator in each step of the schedule. While the cooling schedule in these algorithms is adaptive, the mean estimation computations use MCMC as a black-box to draw approximate samples. We develop a doubly adaptive approach, combining the adaptive cooling schedule with an adaptive MCMC mean estimator, whose number of Markov chain steps adapts dynamically to the underlying chain. Through rigorous theoretical analysis, we prove that our method outperforms the state of the art algorithms in several factors: (1) The computational complexity of our method is smaller; (2) Our method is less sensitive to loose bounds on mixing times, an inherent component in these algorithms; and (3) The improvement obtained by our method is particularly significant in the most challenging regime of high-precision estimation. We demonstrate the advantage of our method in experiments run on classic factor graphs, such as voting models and Ising models.
A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data, which is widely applicable for parametric models, has asymptotically efficient summary statistics, and is both easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data which satisfy the strong guarantee of differential privacy (DP), both with the same asymptotic guarantees. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform approximate hypothesis tests in the presence of intractable likelihood functions.
Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induces a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.