In many applications, estimating a parameter or quantity of interest psi requires first estimating a finite-dimensional nuisance parameter theta. For example, many estimators in causal inference depend on the propensity score: the probability of (possibly time-dependent) treatment given the past. Estimating theta in a first step can affect the variance of the resulting estimator for psi; theta is typically estimated by maximum (partial) likelihood. Inverse Probability Weighting, Marginal Structural Models and Structural Nested Models are well-known causal inference examples, where one often posits a (pooled) logistic regression model for the treatment (initiation) and/or censoring probabilities and estimates it with standard software, that is, by maximum partial likelihood. These three methods have something else in common: they can all be shown to be based on unbiased estimating equations. This paper has four main results for estimators psi-hat based on unbiased estimating equations that include theta. First, it shows that the true limiting variance of psi-hat is smaller than, or the same as, the limiting variance with the true theta plugged in, when theta is estimated by solving (partial) score equations. Second, it shows that if estimation of theta by (partial) score equations is ignored, the resulting sandwich estimator for the variance of psi-hat is conservative. Third, it provides a variance correction. Fourth, it shows that if the estimator psi-hat with the true theta plugged in is efficient, the true limiting variance of psi-hat does not depend on whether or not theta is estimated, and the sandwich estimator that ignores estimation of theta is consistent. These findings hold in both semiparametric and parametric settings where the parameters of interest psi are estimated from unbiased estimating equations.
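The first result can be illustrated with a small simulation, a minimal sketch rather than the paper's general setup: an IPW estimator of a treatment-arm mean typically has *smaller* sampling variance when the propensity score theta is estimated by maximum likelihood than when the true propensity is plugged in. The data-generating model, sample size, and the simple Newton-Raphson logistic fit below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitted_propensity(x, a, iters=15):
    """Maximum likelihood fit of a logistic propensity model via Newton-Raphson;
    returns the fitted propensity scores."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1 - p))[:, None])      # Fisher information
        beta = beta + np.linalg.solve(hess, X.T @ (a - p))
    return 1.0 / (1.0 + np.exp(-X @ beta))

est_true, est_fit = [], []
for _ in range(1000):
    n = 500
    x = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * x)))         # true propensity score
    a = rng.binomial(1, e)                             # treatment indicator
    y = 1.0 + x + rng.normal(size=n)                   # outcome
    est_true.append(np.mean(a * y / e))                # IPW, true theta plugged in
    est_fit.append(np.mean(a * y / fitted_propensity(x, a)))  # IPW, estimated theta

print("var with true propensity:     ", np.var(est_true))
print("var with estimated propensity:", np.var(est_fit))
```

Across the Monte Carlo replications, the estimator that uses the *fitted* propensities is less variable, matching the first result above.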
This paper derives confidence intervals (CIs) and time-uniform confidence sequences (CSs) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds that can be seen as a generalization (and improvement) of the celebrated Chernoff method. At its heart lies a new class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform existing approaches based on Hoeffding or empirical-Bernstein inequalities and their recent supermartingale generalizations. In short, we establish a new state of the art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
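The betting idea can be sketched in a toy form (a fixed bet size and a simple hedge over bet signs, not the adaptive, variance-adaptive bets developed in the paper): for each candidate mean m, the capital process K_n(m) = prod_i (1 + lambda (X_i - m)) is a nonnegative martingale when m is the true mean, so by Ville's inequality the set of m whose capital stays below 1/alpha is a (1 - alpha) confidence set. The grid resolution and the heuristic bet size below are illustrative choices.

```python
import numpy as np

def betting_ci(x, alpha=0.05, grid_size=1001):
    """Confidence interval for a mean in [0, 1] via a hedged betting process.

    For each candidate mean m, K_n(m) = prod_i (1 + lam*(x_i - m)) is a
    nonnegative martingale if m is the true mean, so P(K_n(m) >= 1/alpha)
    <= alpha; we keep every m whose capital stays below 1/alpha."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = np.linspace(0.0, 1.0, grid_size)
    # heuristic fixed bet; clipped so 1 + lam*(x - m) stays positive on [0, 1]
    lam = min(float(np.sqrt(2 * np.log(2 / alpha) / (0.25 * n))), 0.5)
    # hedge over both bet signs so the resulting interval is two-sided
    log_up = np.log1p(lam * (x[:, None] - m[None, :])).sum(axis=0)
    log_dn = np.log1p(-lam * (x[:, None] - m[None, :])).sum(axis=0)
    capital = 0.5 * (np.exp(log_up) + np.exp(log_dn))
    kept = m[capital < 1.0 / alpha]
    return kept.min(), kept.max()

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1000)   # bounded observations, true mean 0.3
lo, hi = betting_ci(x)
print(lo, hi)
```

Replacing the fixed lambda with a predictable, data-driven sequence of bets is what makes the paper's bounds adaptive to the unknown variance.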
We study approximation methods for a large class of mixed models with a probit link function, including mixed versions of the binomial model, the multinomial model, and generalized survival models. The class of models is special because the marginal likelihood can be expressed either as Gaussian weighted integrals or as multivariate Gaussian cumulative distribution functions. The latter form is unique to probit link models and has been proposed for parameter estimation in complex mixed effects models. However, it has not been investigated in which scenarios each form is preferable. Our simulations and data example show that neither form is preferable in general, and we give guidance on when to approximate the cumulative distribution functions and when to approximate the Gaussian weighted integrals and, in the latter case, which general purpose method to use among a large list of candidates.
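The "Gaussian weighted integral" form can be sketched for the simplest case, a binomial probit model with a single random intercept (the two-observation cluster, linear predictors and variance below are hypothetical): the cluster's marginal likelihood is an integral of a product of probit CDFs against a standard normal density, which Gauss-Hermite quadrature approximates by a weighted sum over nodes.

```python
import numpy as np
from math import erf, sqrt, pi

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(z, dtype=float) / sqrt(2.0)))

def cluster_marglik_gh(y, eta, sigma, n_nodes=30):
    """Gauss-Hermite approximation of a probit random-intercept likelihood:
    L = int prod_j Phi(s_j * (eta_j + sigma * u)) phi(u) du,  s_j = 2*y_j - 1."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = sqrt(2.0) * nodes                       # change of variables for N(0, 1)
    s = 2.0 * np.asarray(y) - 1.0
    f = np.prod(Phi(s[None, :] * (eta[None, :] + sigma * u[:, None])), axis=1)
    return float(np.sum(weights * f) / sqrt(pi))

# illustrative cluster: two binary responses sharing one random intercept
y = np.array([1.0, 0.0])
eta = np.array([0.4, -0.2])
lik = cluster_marglik_gh(y, eta, sigma=1.0)

# sanity check against brute-force integration on a dense grid
u = np.linspace(-8.0, 8.0, 20001)
phi = np.exp(-0.5 * u**2) / sqrt(2 * pi)
s = 2 * y - 1
vals = np.prod(Phi(s[None, :] * (eta[None, :] + 1.0 * u[:, None])), axis=1)
brute = float(np.sum(vals * phi) * (u[1] - u[0]))
print(lik, brute)
```

The equivalent multivariate-CDF form would instead evaluate a bivariate Gaussian orthant probability with equicorrelation sigma^2 / (1 + sigma^2); which of the two is cheaper or more accurate is exactly the question the paper investigates.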
Consider the task of matrix estimation in which a dataset $X \in \mathbb{R}^{n\times m}$ is observed with sparsity $p$, and we would like to estimate $\mathbb{E}[X]$, where $\mathbb{E}[X_{ui}] = f(\alpha_u, \beta_i)$ for some Hölder smooth function $f$. We consider the setting where the row covariates $\alpha$ are unobserved yet the column covariates $\beta$ are observed. We provide an algorithm and an accompanying analysis showing that it improves upon naively estimating each row separately when the number of rows is not too small. Furthermore, when the matrix is moderately proportioned, our algorithm achieves the minimax optimal nonparametric rate of an oracle algorithm that knows the row covariates. In simulated experiments, we show that our algorithm outperforms other baselines in low-data regimes.
Consider two independent exponential populations having different unknown location parameters and a common unknown scale parameter. Call the population associated with the larger location parameter the "best" population and the population associated with the smaller location parameter the "worst" population. For the goal of selecting the best (worst) population, a natural selection rule, which has many optimality properties, is the one that selects the population corresponding to the larger (smaller) minimal sufficient statistic. In this article, we consider the problem of estimating the location parameter of the population selected using this natural selection rule. For estimating the location parameter of the selected best population, we derive the uniformly minimum variance unbiased estimator (UMVUE) and show that the analogue of the best affine equivariant estimators (BAEEs) of location parameters is a generalized Bayes estimator. We provide some admissibility and minimaxity results for estimators in the class of linear, affine and permutation equivariant estimators, under the criterion of scaled mean squared error. We also derive a sufficient condition for inadmissibility of an arbitrary affine and permutation equivariant estimator. We provide similar results for the problem of estimating the location parameter of the selected population when the selection goal is that of selecting the worst exponential population. Finally, we provide a simulation study to numerically compare the performances of some of the proposed estimators.
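Why post-selection estimation needs special care can be seen in a small simulation (illustrative parameters, not taken from the article): for an exponential sample of size n with location theta and scale sigma, the sample minimum M satisfies E[M] = theta + sigma/n, so M - sigma/n is unbiased for a fixed population; yet the same estimator applied to the population selected for having the larger minimum is biased upward.

```python
import numpy as np

rng = np.random.default_rng(0)
n, scale = 10, 1.0
theta1 = theta2 = 0.0          # equal locations: a worst case for selection bias
reps = 20000

x1 = theta1 + rng.exponential(scale, size=(reps, n))
x2 = theta2 + rng.exponential(scale, size=(reps, n))
m1, m2 = x1.min(axis=1), x2.min(axis=1)    # minimal sufficient statistics

# unbiased for a *fixed* population, since E[min] = theta + scale/n
fixed_est = m1 - scale / n
# the same estimator applied to the *selected* ("best") population
selected_est = np.maximum(m1, m2) - scale / n

print("bias, fixed population:   ", fixed_est.mean() - theta1)
print("bias, selected population:", selected_est.mean() - max(theta1, theta2))
```

The positive bias of the naive estimator after selection is what motivates the UMVUE and equivariant estimators derived in the article.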
Causal inference aims to estimate the causal effect in a causal relationship when an intervention is applied. Precisely, in a causal model with binary interventions, i.e., control and treatment, the causal effect is simply the difference between the factual and counterfactual outcomes. The difficulty is that the counterfactual can never be observed, so it has to be estimated, and the causal effect can therefore only be an estimate. The key challenge in estimating the counterfactual is to identify confounders, which affect both outcomes and treatments. A typical approach is to formulate causal inference as a supervised learning problem, so that the counterfactual can be predicted. Recent machine learning methods, including linear regression and deep learning models, have been adapted to causal inference. In this paper, we propose a method to estimate Causal Effects using a Variational Information Bottleneck (CEVIB). The promising point is that VIB is able to naturally distill confounding variables from the data, which enables causal effect estimation from observational data. We compared CEVIB to other methods by applying them to three data sets, showing that our approach achieved the best performance. We also experimentally demonstrated the robustness of our method.
Given an (optimal) dynamic treatment rule, it may be of interest to evaluate that rule -- that is, to ask the causal question: what is the expected outcome had every subject received treatment according to that rule? In this paper, we study the performance of estimators that approximate the true value of: 1) an a priori known dynamic treatment rule; 2) the true, unknown optimal dynamic treatment rule (ODTR); 3) an estimated ODTR, a so-called "data-adaptive parameter," whose true value depends on the sample. Using simulations of point-treatment data, we specifically investigate: 1) the impact of increasingly data-adaptive estimation of nuisance parameters and/or of the ODTR on performance; 2) the potential for improved efficiency and bias reduction through the use of semiparametric efficient estimators; and 3) the importance of sample splitting based on CV-TMLE for accurate inference. In the simulations considered, there was very little cost and many benefits to using the cross-validated targeted maximum likelihood estimator (CV-TMLE) to estimate the value of the true and estimated ODTR; importantly, and in contrast to non-cross-validated estimators, the performance of CV-TMLE was maintained even when highly data-adaptive algorithms were used to estimate both the nuisance parameters and the ODTR. In addition, we apply these estimators of the value of a rule to the "Interventions" Study, an ongoing randomized controlled trial, to identify whether assigning cognitive behavioral therapy (CBT) to criminal justice-involved adults with mental illness using an ODTR significantly reduces the probability of recidivism, compared to assigning CBT in a non-individualized way.
In the first part of this work, we develop a novel scheme for solving nonparametric regression problems, that is, the approximation of possibly low-regularity and noisy functions from their approximate values at some random points. Our proposed scheme is based on the pseudo-inverse of a random projection matrix, combined with specific properties of the Jacobi polynomial system as well as properties of positive definite random matrices. This scheme has the advantages of being stable, robust, accurate and fairly fast in terms of execution time. In particular, we provide an $L_2$ as well as an $L_2$-risk error bound for our proposed nonparametric regression estimator. Moreover, unlike most existing nonparametric regression estimators, no extra regularization step is required by our proposed estimator. Although this estimator is initially designed to work with random samples of univariate i.i.d. random variables following a Beta distribution, we show that it still works for a wide range of sampling distribution laws. Moreover, we briefly describe how our estimator can be adapted to handle the multivariate case of random sampling sets. In the second part of this work, we extend the random pseudo-inverse scheme to build a stable and accurate estimator for solving linear functional regression (LFR) problems. A dyadic decomposition approach is used to construct this stable estimator for the LFR problem. Also, we give an $L_2$-risk error bound for our proposed LFR estimator. Finally, the performance of the two proposed estimators is illustrated by various numerical simulations. In particular, a real dataset is used to illustrate the performance of our nonparametric regression estimator.
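The core idea can be sketched in a minimal, hypothetical form (Legendre polynomials, a special case of the Jacobi family, with uniform rather than Beta sampling; the target function, degree and noise level are illustrative): evaluate the first basis polynomials at the random sample points to obtain a random projection matrix, then recover the expansion coefficients by applying its Moore-Penrose pseudo-inverse to the noisy observations, with no explicit regularization step.

```python
import numpy as np

rng = np.random.default_rng(0)

def pinv_regression(x, y, degree=8):
    """Fit in a Legendre basis via the pseudo-inverse of the random
    design (projection) matrix; returns the expansion coefficients."""
    A = np.polynomial.legendre.legvander(x, degree)   # n x (degree+1) random matrix
    return np.linalg.pinv(A) @ y

# noisy samples of a smooth target on [-1, 1] at random points
f = lambda t: np.sin(np.pi * t)
n = 400
x = rng.uniform(-1.0, 1.0, size=n)
y = f(x) + 0.05 * rng.normal(size=n)

coef = pinv_regression(x, y)
t = np.linspace(-1.0, 1.0, 200)
fhat = np.polynomial.legendre.legval(t, coef)
rmse = np.sqrt(np.mean((fhat - f(t)) ** 2))
print("RMSE:", rmse)
```

The stability properties the paper proves concern precisely the conditioning of this random matrix and the resulting risk of the pseudo-inverse solution.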
We study optimal variance reduction solutions for online controlled experiments, applying flexible machine learning tools to incorporate covariates that are independent of the treatment but have predictive power for the outcomes. Employing cross-fitting, we propose variance reduction procedures for both count metrics and ratio metrics in online experiments, under which inference on the estimands is valid under mild convergence conditions. We also establish the asymptotic optimality of all these procedures under a consistency condition on the machine learning estimators. As a complement to the proposed nonlinear optimal procedure, a linear adjustment method for ratio metrics is also derived as a special case that is computationally efficient and can flexibly incorporate any pre-treatment covariates. Comprehensive simulation studies are performed and practical suggestions are given. When tested on real online experiment data from LinkedIn, the proposed optimal procedure for ratio metrics reduces variance by up to $80\%$ compared to the standard difference-in-mean estimator, and by a further up to $30\%$ compared to the CUPED approach, by going beyond linearity and incorporating a large number of extra covariates.
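The linear-adjustment baseline (CUPED-style) can be sketched as follows, on an illustrative data-generating process rather than real experiment data: replace each outcome Y by Y - theta*(X - mean(X)) with theta = cov(X, Y)/var(X) for a pre-treatment covariate X, then apply the usual difference-in-mean estimator. Randomization keeps the estimate unbiased, while the outcome variance shrinks roughly by the factor 1 - corr(X, Y)^2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)                      # pre-treatment covariate
t = rng.binomial(1, 0.5, size=n)            # randomized assignment
y = 0.2 * t + x + rng.normal(size=n)        # outcome; true lift = 0.2

def diff_in_mean(y, t):
    return y[t == 1].mean() - y[t == 0].mean()

theta = np.cov(x, y)[0, 1] / np.var(x)      # linear adjustment coefficient
y_cv = y - theta * (x - x.mean())           # CUPED-adjusted outcome

print("unadjusted estimate:", diff_in_mean(y, t), "  var(Y):   ", np.var(y))
print("adjusted estimate:  ", diff_in_mean(y_cv, t), "  var(Y_cv):", np.var(y_cv))
```

The paper's optimal procedures generalize this step by replacing the linear projection theta*X with a cross-fitted machine learning prediction of the outcome from many covariates.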
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, which is inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
In this paper we introduce a covariance framework for the analysis of EEG and MEG data that takes into account observed temporal stationarity on small time scales and trial-to-trial variations. We formulate a model for the covariance matrix as a Kronecker product of three components that correspond to space, time and epochs/trials, and consider maximum likelihood estimation of the unknown parameter values. An iterative algorithm that finds approximations of the maximum likelihood estimates is proposed. We perform a simulation study to assess the performance of the estimator and to investigate the influence of different assumptions about the covariance factors on the estimated covariance matrix and on its components. In addition, we illustrate our method on real EEG and MEG data sets. The proposed covariance model is applicable in a variety of cases where spontaneous EEG or MEG acts as a source of noise and realistic noise covariance estimates are needed for accurate dipole localization, such as in evoked activity studies, or where the properties of spontaneous EEG or MEG are themselves the topic of interest, such as in combined EEG/fMRI experiments in which the correlation between EEG and fMRI signals is investigated.