When one observes a sequence of variables $(x_1, y_1), ..., (x_n, y_n)$, conformal prediction is a methodology that allows to estimate a confidence set for $y_{n+1}$ given $x_{n+1}$ by merely assuming that the distribution of the data is exchangeable. While appealing, the computation of such set turns out to be infeasible in general, e.g. when the unknown variable $y_{n+1}$ is continuous. In this paper, we combine conformal prediction techniques with algorithmic stability bounds to derive a prediction set computable with a single model fit. We perform some numerical experiments that illustrate the tightness of our estimation when the sample size is sufficiently large.
Renewal equations are a popular approach used in modelling the number of new infections, i.e., incidence, in an outbreak. We develop a stochastic model of an outbreak based on a time-varying variant of the Crump-Mode-Jagers branching process. This model accommodates a time-varying reproduction number and a time-varying distribution for the generation interval. We then derive renewal-like integral equations for incidence, cumulative incidence and prevalence under this model. We show that the equations for incidence and prevalence are consistent with the so-called back-calculation relationship. We analyse two particular cases of these integral equations, one that arises from a Bellman-Harris process and one that arises from an inhomogeneous Poisson process model of transmission. We outline an argument to show that the incidence integral equations that arise from both of these specific models agree with the renewal equation used ubiquitously in infectious disease modelling. We present a numerical discretisation scheme to solve these equations, and use this scheme to estimate rates of transmission from serological prevalence of SARS-CoV-2 in the UK and historical incidence data on Influenza, Measles, SARS and Smallpox.
Quantifying the data uncertainty in learning tasks is often done by learning a prediction interval or prediction set of the label given the input. Two commonly desired properties for learned prediction sets are \emph{valid coverage} and \emph{good efficiency} (such as low length or low cardinality). Conformal prediction is a powerful technique for learning prediction sets with valid coverage, yet by default its conformalization step only learns a single parameter, and does not optimize the efficiency over more expressive function classes. In this paper, we propose a generalization of conformal prediction to multiple learnable parameters, by considering the constrained empirical risk minimization (ERM) problem of finding the most efficient prediction set subject to valid empirical coverage. This meta-algorithm generalizes existing conformal prediction algorithms, and we show that it achieves approximate valid population coverage and near-optimal efficiency within class, whenever the function class in the conformalization step is low-capacity in a certain sense. Next, this ERM problem is challenging to optimize as it involves a non-differentiable coverage constraint. We develop a gradient-based algorithm for it by approximating the original constrained ERM using differentiable surrogate losses and Lagrangians. Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly over existing approaches in several applications such as prediction intervals with improved length, minimum-volume prediction sets for multi-output regression, and label prediction sets for image classification.
This work studies an experimental design problem where {the values of a predictor variable, denoted by $x$}, are to be determined with the goal of estimating a function $m(x)$, which is observed with noise. A linear model is fitted to $m(x)$ but it is not assumed that the model is correctly specified. It follows that the quantity of interest is the best linear approximation of $m(x)$, which is denoted by $\ell(x)$. It is shown that in this framework the ordinary least squares estimator typically leads to an inconsistent estimation of $\ell(x)$, and rather weighted least squares should be considered. An asymptotic minimax criterion is formulated for this estimator, and a design that minimizes the criterion is constructed. An important feature of this problem is that the $x$'s should be random, rather than fixed. Otherwise, the minimax risk is infinite. It is shown that the optimal random minimax design is different from its deterministic counterpart, which was studied previously, and a simulation study indicates that it generally performs better when $m(x)$ is a quadratic or a cubic function. Another finding is that when the variance of the noise goes to infinity, the random and deterministic minimax designs coincide. The results are illustrated for polynomial regression models and the general case is also discussed.
In this paper, we study the sample complexity of {\em noisy Bayesian quadrature} (BQ), in which we seek to approximate an integral based on noisy black-box queries to the underlying function. We consider functions in a {\em Reproducing Kernel Hilbert Space} (RKHS) with the Mat\'ern-$\nu$ kernel, focusing on combinations of the parameter $\nu$ and dimension $d$ such that the RKHS is equivalent to a Sobolev class. In this setting, we provide near-matching upper and lower bounds on the best possible average error. Specifically, we find that when the black-box queries are subject to Gaussian noise having variance $\sigma^2$, any algorithm making at most $T$ queries (even with adaptive sampling) must incur a mean absolute error of $\Omega(T^{-\frac{\nu}{d}-1} + \sigma T^{-\frac{1}{2}})$, and there exists a non-adaptive algorithm attaining an error of at most $O(T^{-\frac{\nu}{d}-1} + \sigma T^{-\frac{1}{2}})$. Hence, the bounds are order-optimal, and establish that there is no adaptivity gap in terms of scaling laws.
Bayesian nonparametric methods are a popular choice for analysing survival data due to their ability to flexibly model the distribution of survival times. These methods typically employ a nonparametric prior on the survival function that is conjugate with respect to right-censored data. Eliciting these priors, particularly in the presence of covariates, can be challenging and inference typically relies on computationally intensive Markov chain Monte Carlo schemes. In this paper, we build on recent work that recasts Bayesian inference as assigning a predictive distribution on the unseen values of a population conditional on the observed samples, thus avoiding the need to specify a complex prior. We describe a copula-based predictive update which admits a scalable sequential importance sampling algorithm to perform inference that properly accounts for right-censoring. We provide theoretical justification through an extension of Doob's consistency theorem and illustrate the method on a number of simulated and real data sets, including an example with covariates. Our approach enables analysts to perform Bayesian nonparametric inference through only the specification of a predictive distribution.
Bayesian likelihood-free inference, which is used to perform Bayesian inference when the likelihood is intractable, enjoys an increasing number of important scientific applications. However, many aspects of a Bayesian analysis become more challenging in the likelihood-free setting. One example of this is prior-data conflict checking, where the goal is to assess whether the information in the data and the prior are inconsistent. Conflicts of this kind are important to detect, since they may reveal problems in an investigator's understanding of what are relevant values of the parameters, and can result in sensitivity of Bayesian inferences to the prior. Here we consider methods for prior-data conflict checking which are applicable regardless of whether the likelihood is tractable or not. In constructing our checks, we consider checking statistics based on prior-to-posterior Kullback-Leibler divergences. The checks are implemented using mixture approximations to the posterior distribution and closed-form approximations to Kullback-Leibler divergences for mixtures, which make Monte Carlo approximation of reference distributions for calibration computationally feasible. When prior-data conflicts occur, it is useful to consider weakly informative prior specifications in alternative analyses as part of a sensitivity analysis. As a main application of our methodology, we develop a technique for searching for weakly informative priors in likelihood-free inference, where the notion of a weakly informative prior is formalized using prior-data conflict checks. The methods are demonstrated in three examples.
Deep learning has the potential to augment several clinically useful aspects of the radiologist's workflow such as medical imaging interpretation. However, the translation of deep learning algorithms into clinical practice has been hindered by relative lack of transparency in these algorithms compared to more traditional statistical methods. Specifically, common deep learning models lack intuitive and rigorous methods of conveying prediction confidence in a calibrated manner, which ultimately restricts widespread use of these "black box" systems for critical decision-making. Furthermore, numerous demonstrations of algorithmic bias in clinical machine learning have caused considerable hesitancy towards the deployment of these models for clinical application. To this end, we explore how conformal predictions can complement existing deep learning approaches by providing an intuitive way of expressing model uncertainty to facilitate greater transparency to clinical users. In this paper, we conduct field interviews with radiologists to assess potential use-cases of conformal predictors. Using insights collected from these interviews, we devise two use-cases and empirically evaluate several conformal methods on a dermatology photography dataset for skin lesion classification. Additionally, we show how group conformal predictors are more adaptive to differences between patient skin tones for malignant skin lesions. We find our conformal predictors to be a promising and generally applicable approach to increasing clinical usability and trustworthiness -- hopefully facilitating better modes of collaboration between medical AI tools and their clinical users.
Ensemble methods based on subsampling, such as random forests, are popular in applications due to their high predictive accuracy. Existing literature views a random forest prediction as an infinite-order incomplete U-statistic to quantify its uncertainty. However, these methods focus on a small subsampling size of each tree, which is theoretically valid but practically limited. This paper develops an unbiased variance estimator based on incomplete U-statistics, which allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate confidence interval coverage without additional computational costs. We also propose a local smoothing procedure to reduce the variation of our estimator, which shows improved numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of our proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.
Dyadic data is often encountered when quantities of interest are associated with the edges of a network. As such it plays an important role in statistics, econometrics and many other data science disciplines. We consider the problem of uniformly estimating a dyadic Lebesgue density function, focusing on nonparametric kernel-based estimators taking the form of dyadic empirical processes. Our main contributions include the minimax-optimal uniform convergence rate of the dyadic kernel density estimator, along with strong approximation results for the associated standardized and Studentized $t$-processes. A consistent variance estimator enables the construction of valid and feasible uniform confidence bands for the unknown density function. A crucial feature of dyadic distributions is that they may be "degenerate" at certain points in the support of the data, a property making our analysis somewhat delicate. Nonetheless our methods for uniform inference remain robust to the potential presence of such points. For implementation purposes, we discuss procedures based on positive semi-definite covariance estimators, mean squared error optimal bandwidth selectors and robust bias-correction techniques. We illustrate the empirical finite-sample performance of our methods both in simulations and with real-world data. Our technical results concerning strong approximations and maximal inequalities are of potential independent interest.
We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the problem of intrusion prevention as an (optimal) multiple stopping problem. This formulation gives us insight into the structure of optimal policies, which we show to have threshold properties. For most practical cases, it is not feasible to obtain an optimal defender policy using dynamic programming. We therefore develop a reinforcement learning approach to approximate an optimal policy. Our method for learning and validating policies includes two systems: a simulation system where defender policies are incrementally learned and an emulation system where statistics are produced that drive simulation runs and where learned policies are evaluated. We show that our approach can produce effective defender policies for a practical IT infrastructure of limited size. Inspection of the learned policies confirms that they exhibit threshold properties.