We show that Pfaffians or contiguity relations of hypergeometric functions of several variables give a direct sampling algorithm from toric models in statistics, which is a Markov chain on a lattice generated by a matrix $A$. A correspondence among graphical models and $A$-hypergeometric system is discussed and we give a sum formula of special values of $A$-hypergeometric polynomials. Some hypergeometric series which are interesting in view of statistics are presented.
We present a numerical iterative optimization algorithm for the minimization of a cost function consisting of a linear combination of three convex terms, one of which is differentiable, a second one is prox-simple and the third one is the composition of a linear map and a prox-simple function. The algorithm's special feature lies in its ability to approximate, in a single iteration run, the minimizers of the cost function for many different values of the parameters determining the relative weight of the three terms in the cost function. A proof of convergence of the algorithm, based on an inexact variable metric approach, is also provided. As a special case, one recovers a generalization of the primal-dual algorithm of Chambolle and Pock, and also of the proximal-gradient algorithm. Finally, we show how it is related to a primal-dual iterative algorithm based on inexact proximal evaluations of the non-smooth terms of the cost function.
Approximating a univariate function on the interval $[-1,1]$ with a polynomial is among the most classical problems in numerical analysis. When the function evaluations come with noise, a least-squares fit is known to reduce the effect of noise as more samples are taken. The generic algorithm for the least-squares problem requires $O(Nn^2)$ operations, where $N+1$ is the number of sample points and $n$ is the degree of the polynomial approximant. This algorithm is unstable when $n$ is large, for example $n\gg \sqrt{N}$ for equispaced sample points. In this study, we blend numerical analysis and statistics to introduce a stable and fast $O(N\log N)$ algorithm called NoisyChebtrunc based on the Chebyshev interpolation. It has the same error reduction effect as least-squares and the convergence is spectral until the error reaches $O(\sigma \sqrt{{n}/{N}})$, where $\sigma$ is the noise level, after which the error continues to decrease at the Monte-Carlo $O(1/\sqrt{N})$ rate. To determine the polynomial degree, NoisyChebtrunc employs a statistical criterion, namely Mallows' $C_p$. We analyze NoisyChebtrunc in terms of the variance and concentration in the infinity norm to the underlying noiseless function. These results show that with high probability the infinity-norm error is bounded by a small constant times $\sigma \sqrt{{n}/{N}}$, when the noise {is} independent and follows a subgaussian or subexponential distribution. We illustrate the performance of NoisyChebtrunc with numerical experiments.
We explore a linear inhomogeneous elasticity equation with random Lam\'e parameters. The latter are parameterized by a countably infinite number of terms in separated expansions. The main aim of this work is to estimate expected values (considered as an infinite dimensional integral on the parametric space corresponding to the random coefficients) of linear functionals acting on the solution of the elasticity equation. To achieve this, the expansions of the random parameters are truncated, a high-order quasi-Monte Carlo (QMC) is combined with a sparse grid approach to approximate the high dimensional integral, and a Galerkin finite element method (FEM) is introduced to approximate the solution of the elasticity equation over the physical domain. The error estimates from (1) truncating the infinite expansion, (2) the Galerkin FEM, and (3) the QMC sparse grid quadrature rule are all studied. For this purpose, we show certain required regularity properties of the continuous solution with respect to both the parametric and physical variables. To achieve our theoretical regularity and convergence results, some reasonable assumptions on the expansions of the random coefficients are imposed. Finally, some numerical results are delivered.
Randomized matrix algorithms have become workhorse tools in scientific computing and machine learning. To use these algorithms safely in applications, they should be coupled with posterior error estimates to assess the quality of the output. To meet this need, this paper proposes two diagnostics: a leave-one-out error estimator for randomized low-rank approximations and a jackknife resampling method to estimate the variance of the output of a randomized matrix computation. Both of these diagnostics are rapid to compute for randomized low-rank approximation algorithms such as the randomized SVD and randomized Nystr\"om approximation, and they provide useful information that can be used to assess the quality of the computed output and guide algorithmic parameter choices.
We study high-dimensional, ridge-regularized logistic regression in a setting in which the covariates may be missing or corrupted by additive noise. When both the covariates and the additive corruptions are independent and normally distributed, we provide exact characterizations of both the prediction error as well as the estimation error. Moreover, we show that these characterizations are universal: as long as the entries of the data matrix satisfy a set of independence and moment conditions, our guarantees continue to hold. Universality, in turn, enables the detailed study of several imputation-based strategies when the covariates are missing completely at random. We ground our study by comparing the performance of these strategies with the conjectured performance -- stemming from replica theory in statistical physics -- of the Bayes optimal procedure. Our analysis yields several insights including: (i) a distinction between single imputation and a simple variant of multiple imputation and (ii) that adding a simple ridge regularization term to single-imputed logistic regression can yield an estimator whose prediction error is nearly indistinguishable from the Bayes optimal prediction error. We supplement our findings with extensive numerical experiments.
The Gibbs sampler (a.k.a. Glauber dynamics and heat-bath algorithm) is a popular Markov Chain Monte Carlo algorithm which iteratively samples from the conditional distributions of a probability measure $\pi$ of interest. Under the assumption that $\pi$ is strongly log-concave, we show that the random scan Gibbs sampler contracts in relative entropy and provide a sharp characterization of the associated contraction rate. Assuming that evaluating conditionals is cheap compared to evaluating the joint density, our results imply that the number of full evaluations of $\pi$ needed for the Gibbs sampler to mix grows linearly with the condition number and is independent of the dimension. If $\pi$ is non-strongly log-concave, the convergence rate in entropy degrades from exponential to polynomial. Our techniques are versatile and extend to Metropolis-within-Gibbs schemes and the Hit-and-Run algorithm. A comparison with gradient-based schemes and the connection with the optimization literature are also discussed.
We study combinatorial properties of plateaued functions. All quadratic functions, bent functions and most known APN functions are plateaued, so many cryptographic primitives rely on plateaued functions as building blocks. The main focus of our study is the interplay of the Walsh transform and linearity of a plateaued function, its differential properties, and their value distributions, i.e., the sizes of image and preimage sets. In particular, we study the special case of ``almost balanced'' plateaued functions, which only have two nonzero preimage set sizes, generalizing for instance all monomial functions. We achieve several direct connections and (non)existence conditions for these functions, showing for instance that plateaued $d$-to-$1$ functions (and thus plateaued monomials) only exist for a very select choice of $d$, and we derive for all these functions their linearity as well as bounds on their differential uniformity. We also specifically study the Walsh transform of plateaued APN functions and their relation to their value distribution.
We discuss the second-order differential uniformity of vectorial Boolean functions. The closely related notion of second-order zero differential uniformity has recently been studied in connection to resistance to the boomerang attack. We prove that monomial functions with univariate form $x^d$ where $d=2^{2k}+2^k+1$ and $\gcd(k,n)=1$ have optimal second-order differential uniformity. Computational results suggest that, up to affine equivalence, these might be the only optimal cubic power functions. We begin work towards generalising such conditions to all monomial functions of algebraic degree 3. We also discuss further questions arising from computational results.
Finding eigenvalue distributions for a number of sparse random matrix ensembles can be reduced to solving nonlinear integral equations of the Hammerstein type. While a systematic mathematical theory of such equations exists, it has not been previously applied to sparse matrix problems. We close this gap in the literature by showing how one can employ numerical solutions of Hammerstein equations to accurately recover the spectra of adjacency matrices and Laplacians of random graphs. While our treatment focuses on random graphs for concreteness, the methodology has broad applications to more general sparse random matrices.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.