When evaluating and comparing models using leave-one-out cross-validation (LOO-CV), the uncertainty of the estimate is typically assessed using the variance of the sampling distribution. Considering the uncertainty is important, as the variability of the estimate can be high in some cases. An important result by Bengio and Grandvalet (2004) states that no general unbiased variance estimator can be constructed, that would apply for any utility or loss measure and any model. We show that it is possible to construct an unbiased estimator considering a specific predictive performance measure and model. We demonstrate an unbiased sampling distribution variance estimator for the Bayesian normal model with fixed model variance using the expected log pointwise predictive density (elpd) utility score. This example demonstrates that it is possible to obtain improved, problem-specific, unbiased estimators for assessing the uncertainty in LOO-CV estimation.
The naive importance sampling (IS) estimator generally does not work well in examples involving simultaneous inference on several targets, as the importance weights can take arbitrarily large values, making the estimator highly unstable. In such situations, alternative multiple IS estimators involving samples from multiple proposal distributions are preferred. Just like the naive IS, the success of these multiple IS estimators crucially depends on the choice of the proposal distributions. The selection of these proposal distributions is the focus of this article. We propose three methods: (i) a geometric space filling approach, (ii) a minimax variance approach, and (iii) a maximum entropy approach. The first two methods are applicable to any IS estimator, whereas the third approach is described in the context of Doss's (2010) two-stage IS estimator. For the first method, we propose a suitable measure of 'closeness' based on the symmetric Kullback-Leibler divergence, while the second and third approaches use estimates of asymptotic variances of Doss's (2010) IS estimator and Geyer's (1994) reverse logistic regression estimator, respectively. Thus, when samples from the proposal distributions are obtained by running Markov chains, we provide consistent spectral variance estimators for these asymptotic variances. The proposed methods for selecting proposal densities are illustrated using various detailed examples.
This paper presents an approach to trajectory-centric learning control based on contraction metrics and disturbance estimation for nonlinear systems subject to matched uncertainties. The proposed approach allows for the use of deep neural networks to learn uncertain dynamics while still providing guarantees of transient tracking performance throughout the learning phase. Within the proposed approach, a disturbance estimation law is adopted to estimate the pointwise value of the uncertainty, with pre-computable estimation error bounds (EEBs). The learned dynamics, the estimated disturbances, and the EEBs are then incorporated in a robust Riemannian energy condition to compute the control law that guarantees exponential convergence of actual trajectories to desired ones throughout the learning phase, even when the learned model is poor. On the other hand, with improved accuracy, the learned model can be incorporated into a high-level planner to plan better trajectories with improved performance, e.g., lower energy consumption and shorter travel time. The proposed framework is validated on a planar quadrotor navigation example.
Existing inferential methods for small area data involve a trade-off between maintaining area-level frequentist coverage rates and improving inferential precision via the incorporation of indirect information. In this article, we propose a method to obtain an area-level prediction region for a future observation which mitigates this trade-off. The proposed method takes a conformal prediction approach in which the conformity measure is the posterior predictive density of a working model that incorporates indirect information. The resulting prediction region has guaranteed frequentist coverage regardless of the working model, and, if the working model assumptions are accurate, the region has minimum expected volume compared to other regions with the same coverage rate. When constructed under a normal working model, we prove such a prediction region is an interval and construct an efficient algorithm to obtain the exact interval. We illustrate the performance of our method through simulation studies and an application to EPA radon survey data.
The instrumental variable method is widely used in the health and social sciences for identification and estimation of causal effects in the presence of potentially unmeasured confounding. In order to improve efficiency, multiple instruments are routinely used, leading to concerns about bias due to possible violation of the instrumental variable assumptions. To address this concern, we introduce a new class of g-estimators that are guaranteed to remain consistent and asymptotically normal for the causal effect of interest provided that a set of at least $\gamma$ out of $K$ candidate instruments are valid, for $\gamma\leq K$ set by the analyst ex ante, without necessarily knowing the identities of the valid and invalid instruments. We provide formal semiparametric efficiency theory supporting our results. Both simulation studies and applications to the UK Biobank data demonstrate the superior empirical performance of our estimators compared to competing methods.
Let $X^{(n)}$ be an observation sampled from a distribution $P_{\theta}^{(n)}$ with an unknown parameter $\theta,$ $\theta$ being a vector in a Banach space $E$ (most often, a high-dimensional space of dimension $d$). We study the problem of estimation of $f(\theta)$ for a functional $f:E\mapsto {\mathbb R}$ of some smoothness $s>0$ based on an observation $X^{(n)}\sim P_{\theta}^{(n)}.$ Assuming that there exists an estimator $\hat \theta_n=\hat \theta_n(X^{(n)})$ of parameter $\theta$ such that $\sqrt{n}(\hat \theta_n-\theta)$ is sufficiently close in distribution to a mean zero Gaussian random vector in $E,$ we construct a functional $g:E\mapsto {\mathbb R}$ such that $g(\hat \theta_n)$ is an asymptotically normal estimator of $f(\theta)$ with $\sqrt{n}$ rate provided that $s>\frac{1}{1-\alpha}$ and $d\leq n^{\alpha}$ for some $\alpha\in (0,1).$ We also derive general upper bounds on Orlicz norm error rates for estimator $g(\hat \theta)$ depending on smoothness $s,$ dimension $d,$ sample size $n$ and the accuracy of normal approximation of $\sqrt{n}(\hat \theta_n-\theta).$ In particular, this approach yields asymptotically efficient estimators in some high-dimensional exponential models.
We introduce a novel methodology for particle filtering in dynamical systems where the evolution of the signal of interest is described by a SDE and observations are collected instantaneously at prescribed time instants. The new approach includes the discretisation of the SDE and the design of efficient particle filters for the resulting discrete-time state-space model. The discretisation scheme converges with weak order 1 and it is devised to create a sequential dependence structure along the coordinates of the discrete-time state vector. We introduce a class of space-sequential particle filters that exploits this structure to improve performance when the system dimension is large. This is numerically illustrated by a set of computer simulations for a stochastic Lorenz 96 system with additive noise. The new space-sequential particle filters attain approximately constant estimation errors as the dimension of the Lorenz 96 system is increased, with a computational cost that increases polynomially, rather than exponentially, with the system dimension. Besides the new numerical scheme and particle filters, we provide in this paper a general framework for discrete-time filtering in continuous-time dynamical systems described by a SDE and instantaneous observations. Provided that the SDE is discretised using a weakly-convergent scheme, we prove that the marginal posterior laws of the resulting discrete-time state-space model converge to the posterior marginal posterior laws of the original continuous-time state-space model under a suitably defined metric. This result is general and not restricted to the numerical scheme or particle filters specifically studied in this manuscript.
In this paper we study the finite sample and asymptotic properties of various weighting estimators of the local average treatment effect (LATE), several of which are based on Abadie (2003)'s kappa theorem. Our framework presumes a binary endogenous explanatory variable ("treatment") and a binary instrumental variable, which may only be valid after conditioning on additional covariates. We argue that one of the Abadie estimators, which we show is weight normalized, is likely to dominate the others in many contexts. A notable exception is in settings with one-sided noncompliance, where certain unnormalized estimators have the advantage of being based on a denominator that is bounded away from zero. We use a simulation study and three empirical applications to illustrate our findings. In applications to causal effects of college education using the college proximity instrument (Card, 1995) and causal effects of childbearing using the sibling sex composition instrument (Angrist and Evans, 1998), the unnormalized estimates are clearly unreasonable, with "incorrect" signs, magnitudes, or both. Overall, our results suggest that (i) the relative performance of different kappa weighting estimators varies with features of the data-generating process; and that (ii) the normalized version of Tan (2006)'s estimator may be an attractive alternative in many contexts. Applied researchers with access to a binary instrumental variable should also consider covariate balancing or doubly robust estimators of the LATE.
Bayesian model selection provides a powerful framework for objectively comparing models directly from observed data, without reference to ground truth data. However, Bayesian model selection requires the computation of the marginal likelihood (model evidence), which is computationally challenging, prohibiting its use in many high-dimensional Bayesian inverse problems. With Bayesian imaging applications in mind, in this work we present the proximal nested sampling methodology to objectively compare alternative Bayesian imaging models for applications that use images to inform decisions under uncertainty. The methodology is based on nested sampling, a Monte Carlo approach specialised for model comparison, and exploits proximal Markov chain Monte Carlo techniques to scale efficiently to large problems and to tackle models that are log-concave and not necessarily smooth (e.g., involving l_1 or total-variation priors). The proposed approach can be applied computationally to problems of dimension O(10^6) and beyond, making it suitable for high-dimensional inverse imaging problems. It is validated on large Gaussian models, for which the likelihood is available analytically, and subsequently illustrated on a range of imaging problems where it is used to analyse different choices of dictionary and measurement model.
One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM) can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.
Models for dependent data are distinguished by their targets of inference. Marginal models are useful when interest lies in quantifying associations averaged across a population of clusters. When the functional form of a covariate-outcome association is unknown, flexible regression methods are needed to allow for potentially non-linear relationships. We propose a novel marginal additive model (MAM) for modelling cluster-correlated data with non-linear population-averaged associations. The proposed MAM is a unified framework for estimation and uncertainty quantification of a marginal mean model, combined with inference for between-cluster variability and cluster-specific prediction. We propose a fitting algorithm that enables efficient computation of standard errors and corrects for estimation of penalty terms. We demonstrate the proposed methods in simulations and in application to (i) a longitudinal study of beaver foraging behaviour, and (ii) a spatial analysis of Loaloa infection in West Africa. R code for implementing the proposed methodology is available at //github.com/awstringer1/mam.