The aim of this paper is to study the asymptotic behavior of a particular multivariate risk measure, the Covariate-Conditional-Tail-Expectation (CCTE), based on a multivariate statistical depth function. Depth functions have become increasingly powerful tools in nonparametric inference for multivariate data, as they measure a degree of centrality of a point with respect to a distribution. A multivariate risks scenario is then represented by a depth-based lower level set of the risk factors, meaning that we consider a non-compact setting. More precisely, given a multivariate depth function D associated to a fixed probability measure, we are interested in the lower level set based on D. First, we present a plug-in approach in order to estimate the depth-based level set. In a second part, we provide a consistent estimator of our CCTE for a general depth function with a rate of convergence, and we consider the particular case of the Mahalanobis depth. A simulation study complements the performances of our estimator.
We consider neural network approximation spaces that classify functions according to the rate at which they can be approximated (with error measured in $L^p$) by ReLU neural networks with an increasing number of coefficients, subject to bounds on the magnitude of the coefficients and the number of hidden layers. We prove embedding theorems between these spaces for different values of $p$. Furthermore, we derive sharp embeddings of these approximation spaces into H\"older spaces. We find that, analogous to the case of classical function spaces (such as Sobolev spaces, or Besov spaces) it is possible to trade "smoothness" (i.e., approximation rate) for increased integrability. Combined with our earlier results in [arXiv:2104.02746], our embedding theorems imply a somewhat surprising fact related to "learning" functions from a given neural network space based on point samples: if accuracy is measured with respect to the uniform norm, then an optimal "learning" algorithm for reconstructing functions that are well approximable by ReLU neural networks is simply given by piecewise constant interpolation on a tensor product grid.
We consider the problem of approximating the arboricity of a graph $G= (V,E)$, which we denote by $\mathsf{arb}(G)$, in sublinear time, where the arboricity of a graph is the minimal number of forests required to cover its edges. An algorithm for this problem may perform degree and neighbor queries, and is allowed a small error probability. We design an algorithm that outputs an estimate $\hat{\alpha}$, such that with probability $1-1/\textrm{poly}(n)$, $\mathsf{arb}(G)/c\log^2 n \leq \hat{\alpha} \leq \mathsf{arb}(G)$, where $n=|V|$ and $c$ is a constant. The expected query complexity and running time of the algorithm are $O(n/\mathsf{arb}(G))\cdot \textrm{poly}(\log n)$, and this upper bound also holds with high probability. %($\widetilde{O}(\cdot)$ is used to suppress $\textrm{poly}(\log n)$ dependencies). This bound is optimal for such an approximation up to a $\textrm{poly}(\log n)$ factor.
This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds, that can be seen as a generalization (and improvement) of the celebrated Chernoff method. At its heart, it is based on deriving a new class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations. In short, we establish a new state-of-the-art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in Machine Learning. Focusing on continuous probability distributions on the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametric statistical tool that measures the centrality of any element $x\in\mathbb{R}^d$ with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Thus, its upper-level sets -- the depth-trimmed regions -- give rise to a definition of multivariate quantiles. The new pseudo-metric relies on the average of the Hausdorff distance between the depth-based quantile regions w.r.t. each distribution. Its good behavior w.r.t. major transformation groups, as well as its ability to factor out translations, are depicted. Robustness, an appealing feature of this pseudo-metric, is studied through the finite sample breakdown point. Moreover, we propose an efficient approximation method with linear time complexity w.r.t. the size of the data set and its dimension. The quality of this approximation as well as the performance of the proposed approach are illustrated in numerical experiments.
We study approximation methods for a large class of mixed models with a probit link function that includes mixed versions of the binomial model, the multinomial model, and generalized survival models. The class of models is special because the marginal likelihood can be expressed as Gaussian weighted integrals or as multivariate Gaussian cumulative density functions. The latter approach is unique to the probit link function models and has been proposed for parameter estimation in complex, mixed effects models. However, it has not been investigated in which scenarios either form is preferable. Our simulations and data example show that neither form is preferable in general and give guidance on when to approximate the cumulative density functions and when to approximate the Gaussian weighted integrals and, in the case of the latter, which general purpose method to use among a large list of methods.
In this paper we study properties of the Laplace approximation of the posterior distribution arising in nonlinear Bayesian inverse problems. Our work is motivated by Schillings et al. (2020), where it is shown that in such a setting the Laplace approximation error in Hellinger distance converges to zero in the order of the noise level. Here, we prove novel error estimates for a given noise level that also quantify the effect due to the nonlinearity of the forward mapping and the dimension of the problem. In particular, we are interested in settings in which a linear forward mapping is perturbed by a small nonlinear mapping. Our results indicate that in this case, the Laplace approximation error is of the size of the perturbation. The paper provides insight into Bayesian inference in nonlinear inverse problems, where linearization of the forward mapping has suitable approximation properties.
We focus on the problem of manifold estimation: given a set of observations sampled close to some unknown submanifold $M$, one wants to recover information about the geometry of $M$. Minimax estimators which have been proposed so far all depend crucially on the a priori knowledge of some parameters quantifying the underlying distribution generating the sample (such as bounds on its density), whereas those quantities will be unknown in practice. Our contribution to the matter is twofold: first, we introduce a one-parameter family of manifold estimators $(\hat{M}_t)_{t\geq 0}$ based on a localized version of convex hulls, and show that for some choice of $t$, the corresponding estimator is minimax on the class of models of $C^2$ manifolds introduced in [Genovese et al., Manifold estimation and singular deconvolution under Hausdorff loss]. Second, we propose a completely data-driven selection procedure for the parameter $t$, leading to a minimax adaptive manifold estimator on this class of models. This selection procedure actually allows us to recover the Hausdorff distance between the set of observations and $M$, and can therefore be used as a scale parameter in other settings, such as tangent space estimation.
The goal of many scientific experiments including A/B testing is to estimate the average treatment effect (ATE), which is defined as the difference between the expected outcomes of two or more treatments. In this paper, we consider a situation where an experimenter can assign a treatment to research subjects sequentially. In adaptive experimental design, the experimenter is allowed to change the probability of assigning a treatment using past observations for estimating the ATE efficiently. However, with this approach, it is difficult to apply a standard statistical method to construct an estimator because the observations are not independent and identically distributed. We thus propose an algorithm for efficient experiments with estimators constructed from dependent samples. We also introduce a sequential testing framework using the proposed estimator. To justify our proposed approach, we provide finite and infinite sample analyses. Finally, we experimentally show that the proposed algorithm exhibits preferable performance.
We consider the Bayesian analysis of models in which the unknown distribution of the outcomes is specified up to a set of conditional moment restrictions. The nonparametric exponentially tilted empirical likelihood function is constructed to satisfy a sequence of unconditional moments based on an increasing (in sample size) vector of approximating functions (such as tensor splines based on the splines of each conditioning variable). For any given sample size, results are robust to the number of expanded moments. We derive Bernstein-von Mises theorems for the behavior of the posterior distribution under both correct and incorrect specification of the conditional moments, subject to growth rate conditions (slower under misspecification) on the number of approximating functions. A large-sample theory for comparing different conditional moment models is also developed. The central result is that the marginal likelihood criterion selects the model that is less misspecified. We also introduce sparsity-based model search for high-dimensional conditioning variables, and provide efficient MCMC computations for high-dimensional parameters. Along with clarifying examples, the framework is illustrated with real-data applications to risk-factor determination in finance, and causal inference under conditional ignorability.
In this paper several related estimation problems are addressed from a Bayesian point of view and optimal estimators are obtained for each of them when some natural loss functions are considered. Namely, we are interested in estimating a regression curve. Simultaneously, the estimation problems of a conditional distribution function, or a conditional density, or even the conditional distribution itself, are considered. All these problems are posed in a sufficiently general framework to cover continuous and discrete, univariate and multivariate, parametric and non-parametric cases, without the need to use a specific prior distribution. The loss functions considered come naturally from the quadratic error loss function comonly used in estimating a real function of the unknown parameter. The cornerstone of the mentioned Bayes estimators is the posterior predictive distribution. Some examples are provided to illustrate these results.