We quantify the parameter stability of spherical Gaussian mixture models (sGMMs) under small perturbations in distribution space. Namely, we derive the first explicit bound showing that, for a mixture of spherical Gaussians $P$ (sGMM) in a pre-defined model class, every other sGMM in this model class that is close to $P$ in total variation distance also has a small parameter distance to $P$. Further, this upper bound depends only on $P$. The motivation for this work lies in providing guarantees for fitting Gaussian mixtures; with this aim in mind, all the constants involved are well defined and the conditions imposed for fitting mixtures of spherical Gaussians are distribution free. Our results considerably tighten the existing computable bounds and asymptotically match the known sharp thresholds for this problem.
Empirical studies of the loss landscape of deep networks have revealed that many local minima are connected through low-loss valleys. Yet, little is known about the theoretical origin of such valleys. We present a general framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries. These symmetries can transform a trained model such that it performs similarly on new samples, which allows ensemble building that improves robustness under certain adversarial attacks. We then show that conserved quantities associated with linear symmetries can be used to define coordinates along low-loss valleys. The conserved quantities help reveal that, with common initialization methods, gradient flow explores only a small part of the global minimum. By relating conserved quantities to the convergence rate and sharpness of the minimum, we provide insights into how initialization impacts convergence and generalizability.
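As a concrete illustration of the kind of continuous parameter-space symmetry such a framework builds on (a minimal sketch based on the classical rescaling symmetry of positive-homogeneous activations such as ReLU, not on the paper's specific constructions), the following NumPy snippet checks that rescaling the hidden layer's incoming weights and bias by $\lambda$ and its outgoing weights by $1/\lambda$ leaves the network function, and hence the loss, unchanged; moving along $\lambda$ therefore traces out a zero-loss direction, and the quantity printed at the end is the conserved quantity usually associated with this symmetry under gradient flow.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2 = rng.normal(size=(4, 16))
X = rng.normal(size=(8, 100))                 # a batch of inputs

def forward(W1, b1, W2, X):
    return W2 @ relu(W1 @ X + b1[:, None])

# Continuous rescaling symmetry: for any lam > 0,
# (W1, b1, W2) -> (lam*W1, lam*b1, W2/lam) leaves f unchanged,
# because relu is positive homogeneous: relu(lam*z) = lam*relu(z).
lam = 3.7
assert np.allclose(forward(W1, b1, W2, X),
                   forward(lam * W1, lam * b1, W2 / lam, X))

# The conserved quantity associated with this symmetry under gradient
# flow is the imbalance between incoming and outgoing weight norms;
# its value labels points along the corresponding low-loss valley.
print(np.sum(W1**2) + np.sum(b1**2) - np.sum(W2**2))
\end{verbatim}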
When factorized approximations are used for variational inference (VI), they tend to underestimate the uncertainty -- as measured in various ways -- of the distributions they are meant to approximate. We consider two popular ways to measure the uncertainty deficit of VI: (i) the degree to which it underestimates the componentwise variance, and (ii) the degree to which it underestimates the entropy. To better understand these effects, and the relationship between them, we examine an informative setting where they can be explicitly (and elegantly) analyzed: the approximation of a Gaussian,~$p$, with a dense covariance matrix, by a Gaussian,~$q$, with a diagonal covariance matrix. We prove that $q$ always underestimates both the componentwise variance and the entropy of $p$, \textit{though not necessarily to the same degree}. Moreover we demonstrate that the entropy of $q$ is determined by the trade-off of two competing forces: it is decreased by the shrinkage of its componentwise variances (our first measure of uncertainty) but it is increased by the factorized approximation which delinks the nodes in the graphical model of $p$. We study various manifestations of this trade-off, notably one where, as the dimension of the problem grows, the per-component entropy gap between $p$ and $q$ becomes vanishingly small even though $q$ underestimates every componentwise variance by a constant multiplicative factor. We also use the shrinkage-delinkage trade-off to bound the entropy gap in terms of the problem dimension and the condition number of the correlation matrix of $p$. Finally we present empirical results on both Gaussian and non-Gaussian targets, the former to validate our analysis and the latter to explore its limitations.
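The Gaussian setting can be checked numerically in a few lines (a sketch relying on the standard fact that the factorized Gaussian $q$ minimizing $\mathrm{KL}(q\,\|\,p)$ has componentwise variances equal to the reciprocal diagonal of the precision matrix of $p$): for a random dense covariance, every variance of $q$ is no larger than the corresponding variance of $p$, and the entropy of $q$ is no larger than that of $p$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)        # dense covariance of p
Lambda = np.linalg.inv(Sigma)          # precision matrix of p

# Optimal mean-field Gaussian q = N(mu, diag(v)) for KL(q || p):
# v_i = 1 / Lambda_ii (reciprocal of the precision diagonal).
v_q = 1.0 / np.diag(Lambda)

# (i) variance shrinkage: v_q[i] <= Sigma[i, i] for every component.
assert np.all(v_q <= np.diag(Sigma) + 1e-12)

# (ii) entropy gap: H(q) <= H(p), by Hadamard's inequality for Lambda.
H = lambda logdet: 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)
H_p = H(np.linalg.slogdet(Sigma)[1])
H_q = H(np.sum(np.log(v_q)))
print("variance ratios:", v_q / np.diag(Sigma))
print("H(p) - H(q) =", H_p - H_q)
\end{verbatim}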
This paper focuses on parameter estimation and introduces a new method for lower bounding the Bayesian risk. The method allows for the use of virtually \emph{any} information measure, including R\'enyi's $\alpha$, $\varphi$-Divergences, and Sibson's $\alpha$-Mutual Information. The approach considers divergences as functionals of measures and exploits the duality between spaces of measures and spaces of functions. In particular, we show that one can lower bound the risk with any information measure by upper bounding its dual via Markov's inequality. We are thus able to provide estimator-independent impossibility results thanks to the Data-Processing Inequalities that divergences satisfy. The results are then applied to settings of interest involving both discrete and continuous parameters, including the ``Hide-and-Seek'' problem, and compared to the state-of-the-art techniques. An important observation is that the behaviour of the lower bound in the number of samples is influenced by the choice of the information measure. We leverage this by introducing a new divergence inspired by the ``Hockey-Stick'' Divergence, which is demonstrated empirically to provide the largest lower-bound across all considered settings. If the observations are subject to privatisation, stronger impossibility results can be obtained via Strong Data-Processing Inequalities. The paper also discusses some generalisations and alternative directions.
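To convey the flavour of the recipe (a schematic sketch of the standard first step that bounds of this family build on, not the paper's general dual statement), Markov's inequality reduces the Bayesian risk of any estimator $\hat\theta$ with loss $\ell$ to a small-loss probability,
\[
R \;=\; \mathbb{E}_{P_{\Theta X}}\!\big[\ell(\Theta,\hat\theta(X))\big]
\;\ge\; \rho\,\Big(1 - P_{\Theta X}\big(\ell(\Theta,\hat\theta(X)) < \rho\big)\Big)
\quad \text{for every } \rho > 0,
\]
after which the probability of the small-loss event under the joint law $P_{\Theta X}$ is controlled, uniformly over estimators, by comparing it with the product law $P_{\Theta} \otimes P_{X}$ through the chosen information measure and its data-processing inequality.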
Motivated by applications in polymer-based data storage, we introduce the new problem of characterizing the code rate of, and designing, constant-weight binary $B_2$-sequences. Binary $B_2$-sequences are collections of binary strings of length $n$ with the property that the real-valued sums of all distinct pairs of strings are distinct. In addition to this defining property, constant-weight binary $B_2$-sequences also satisfy the constraint that each string has a fixed, relatively small weight $\omega$ that scales linearly with $n$. The constant-weight constraint ensures low-cost synthesis and uniform processing of the data readout via tandem mass spectrometers. Our main results include upper bounds on the size of the codes, formulated as entropy-optimization problems, and constructive lower bounds based on Sidon sequences.
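To make the defining property concrete (an illustrative check only; the codeword set used below is a hypothetical toy example, not a construction from the paper), the following snippet verifies that a set of constant-weight binary strings forms a $B_2$-sequence by testing that the real-valued sums of all distinct pairs are distinct.

\begin{verbatim}
from itertools import combinations

def is_constant_weight_B2(codewords, weight):
    """Check that every codeword has the given weight and that the
    real-valued sums of all distinct pairs of codewords are distinct."""
    if any(sum(c) != weight for c in codewords):
        return False
    pair_sums = set()
    for a, b in combinations(codewords, 2):
        s = tuple(x + y for x, y in zip(a, b))   # entrywise sum in {0,1,2}^n
        if s in pair_sums:
            return False
        pair_sums.add(s)
    return True

# Toy example with n = 6 and weight w = 2.
C = [(1,1,0,0,0,0), (1,0,1,0,0,0), (0,1,0,0,1,0), (0,0,0,1,0,1)]
print(is_constant_weight_B2(C, weight=2))        # True
\end{verbatim}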
Three asymptotic limits exist for the Euler equations at low Mach number - purely convective, purely acoustic, and mixed convective-acoustic. Standard collocated density-based numerical schemes for compressible flow are known to fail at low Mach number due to the incorrect asymptotic scaling of the artificial diffusion. Previous studies of this class of schemes have shown a variety of behaviours across the different limits and proposed guidelines for the design of low-Mach schemes. However, these studies have primarily focused on specific discretisations and/or only the convective limit. In this paper, we review the low-Mach behaviour using the modified equations - the continuous Euler equations augmented with artificial diffusion terms - which are representative of a wide range of schemes in this class. By considering both convective and acoustic effects, we show that three diffusion scalings naturally arise. Single- and multiple-scale asymptotic analysis of these scalings shows that many of the important low-Mach features of this class of schemes can be reproduced in a straightforward manner in the continuous setting. As an example, we show that many existing low-Mach Roe-type finite-volume schemes match one of these three scalings. Our analysis corroborates previous analysis of these schemes, and we are able to refine previous guidelines on the design of low-Mach schemes by including both convective and acoustic effects. Discrete analysis and numerical examples demonstrate the behaviour of minimal Roe-type schemes with each of the three scalings for convective, acoustic, and mixed flows.
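For orientation (the generic single-scale expansion used throughout this literature, written here without reference to any particular scheme), each non-dimensional flow variable is expanded in powers of the reference Mach number $M$, e.g.
\[
p(\mathbf{x},t) \;=\; p^{(0)}(\mathbf{x},t) \;+\; M\,p^{(1)}(\mathbf{x},t) \;+\; M^{2}\,p^{(2)}(\mathbf{x},t) \;+\; \mathcal{O}(M^{3}),
\]
and substituting into the non-dimensional Euler equations shows that in the purely convective limit $\nabla p^{(0)} = \nabla p^{(1)} = 0$, so pressure fluctuations appear only at order $M^{2}$, whereas the acoustic and mixed limits admit pressure fluctuations at order $M$; an artificial diffusion whose scaling is inconsistent with the relevant limit is what produces the well-known low-Mach accuracy loss.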
Numerical simulations with rigid particles, drops or vesicles constitute some examples that involve 3D objects with spherical topology. When the numerical method is based on boundary integral equations, the error in using a regular quadrature rule to approximate the layer potentials that appear in the formulation will increase rapidly as the evaluation point approaches the surface and the integrand becomes sharply peaked. To determine when the accuracy becomes insufficient, and a more costly special quadrature method should be used, error estimates are needed. In this paper we present quadrature error estimates for layer potentials evaluated near surfaces of genus 0, parametrized using a polar and an azimuthal angle, discretized by a combination of the Gauss-Legendre and the trapezoidal quadrature rules. The error estimates involve no unknown coefficients, only complex-valued roots of a specified distance function. The evaluation of the error estimates in general requires a one-dimensional local root-finding procedure, but for specific geometries we obtain analytical results. Based on these explicit solutions, we derive simplified error estimates for layer potentials evaluated near spheres; these simple formulas depend only on the distance from the surface, the radius of the sphere and the number of discretization points. The usefulness of these error estimates is illustrated with numerical examples.
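The phenomenon that the estimates quantify is easy to reproduce (a sketch for the single-layer Laplace potential with unit density on the unit sphere, whose exact exterior value $1/r$ follows from the shell theorem; this is an assumed test case chosen for its closed form, not one of the paper's examples): with Gauss-Legendre nodes in the polar direction and the trapezoidal rule in azimuth, the quadrature error grows rapidly as the target point approaches the surface.

\begin{verbatim}
import numpy as np

def single_layer_unit_sphere(r, n_theta=24):
    """Approximate (1/4pi) * int_{S^2} 1/|x - y| dS(y) at x = (0, 0, r), r > 1,
    with Gauss-Legendre in u = cos(theta); for this axisymmetric target the
    periodic trapezoidal sum in the azimuthal angle reduces to a factor 2*pi."""
    u, wu = np.polynomial.legendre.leggauss(n_theta)   # nodes/weights on [-1, 1]
    dist = np.sqrt(1.0 + r**2 - 2.0 * r * u)           # |x - y|, y on the sphere
    return 2.0 * np.pi * np.sum(wu / (4.0 * np.pi * dist))

for r in [2.0, 1.5, 1.2, 1.1, 1.05, 1.02]:
    exact = 1.0 / r                  # shell theorem for the unit sphere, r >= 1
    err = abs(single_layer_unit_sphere(r) - exact)
    print(f"r = {r:5.2f}   quadrature error = {err:.2e}")
\end{verbatim}

The error deteriorates because the integrand develops a sharp peak of width comparable to $r-1$ near the closest point on the surface, which is precisely the regime the error estimates are designed to flag.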
We review Quasi Maximum Likelihood estimation of factor models for high-dimensional panels of time series. We consider two cases: (1) estimation when no dynamic model for the factors is specified (Bai and Li, 2016); (2) estimation based on the Kalman smoother and the Expectation Maximization algorithm, thus allowing us to model the factor dynamics explicitly (Doz et al., 2012). Our interest is in approximate factor models, i.e., when we allow the idiosyncratic components to be mildly cross-sectionally, as well as serially, correlated. Although such a setting apparently makes estimation harder, we show that, in fact, factor models do not suffer from the curse of dimensionality, but instead enjoy a blessing of dimensionality property. In particular, we show that if the cross-sectional dimension of the data, $N$, grows to infinity, then: (i) identification of the model is still possible, and (ii) the mis-specification error due to the use of an exact factor model log-likelihood vanishes. Moreover, if we also let the sample size, $T$, grow to infinity, we can consistently estimate all parameters of the model and make inference. The same is true for estimation of the latent factors, which can be carried out by weighted least-squares, linear projection, or Kalman filtering/smoothing. We also compare the presented approaches with Principal Component analysis and the classical, fixed-$N$, exact Maximum Likelihood approach. We conclude with a discussion of the efficiency of the considered estimators.
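For concreteness (written in generic notation that may differ from the cited papers), the approximate dynamic factor model underlying both cases can be put in state-space form as
\[
x_{it} \;=\; \boldsymbol{\lambda}_i' \mathbf{F}_t + \xi_{it}, \qquad i=1,\dots,N, \qquad\qquad
\mathbf{F}_t \;=\; \sum_{k=1}^{s} \mathbf{A}_k \mathbf{F}_{t-k} + \mathbf{u}_t,
\]
where the $r$-dimensional latent factors $\mathbf{F}_t$ are common to all series and the idiosyncratic components $\xi_{it}$ are allowed to be mildly cross-sectionally and serially correlated; the transition equation for the factors is specified only in case (2), where it enables the Kalman smoother and Expectation Maximization steps, while case (1) maximizes the likelihood without modelling the factor dynamics.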
In this article, we propose a new class of consistent tests for $p$-variate normality. These tests are based on the characterization of the standard multivariate normal distribution that the Hessian of its cumulant generating function is identical to the $p\times p$ identity matrix, combined with the idea of decomposing the information in the joint distribution into the dependence copula and all marginal distributions. Under the null hypothesis of multivariate normality, our proposed test statistic is independent of the unknown mean vector and covariance matrix, so the distribution-free critical value of the test can be obtained by Monte Carlo simulation. We also derive the asymptotic null distribution of the proposed test statistic and establish the consistency of the test against different fixed alternatives. Last but not least, a comprehensive Monte Carlo study illustrates that our test is a powerful yet computationally convenient competitor to many well-known existing tests.
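The characterization invoked here is simple to state in the standardized case: if $\mathbf{Z} \sim N_p(\mathbf{0}, \mathbf{I}_p)$, its cumulant generating function is
\[
K_{\mathbf{Z}}(\mathbf{t}) \;=\; \log \mathbb{E}\big[e^{\mathbf{t}'\mathbf{Z}}\big] \;=\; \tfrac{1}{2}\,\mathbf{t}'\mathbf{t},
\qquad\text{so that}\qquad
\nabla^{2} K_{\mathbf{Z}}(\mathbf{t}) \;=\; \mathbf{I}_p \quad\text{for all } \mathbf{t}\in\mathbb{R}^{p},
\]
and the tests proposed here are built around empirical departures from this identity.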
Liou-Steffen splitting (AUSM) schemes are popular for low Mach number simulations; however, like many numerical schemes for compressible flow, they require careful modification to accurately resolve convective features in this regime. Previous analyses of these schemes usually focus on a single discrete scheme in the convective limit, considering flows with acoustic effects only empirically, if at all. In our recent paper, Hope-Collins & di Mare (2023), we derived constraints on the artificial diffusion scaling of low Mach number schemes for flows both with and without acoustic effects, and applied this analysis to Roe-type finite-volume schemes. In this paper we form approximate diffusion matrices for the Liou-Steffen splitting, as well as for the closely related Zha-Bilgen and Toro-Vasquez splittings. We use the constraints found in Hope-Collins & di Mare (2023) to derive and analyse the required scaling of each splitting at low Mach number. By transforming the diffusion matrices to the entropy variables, we can identify erroneous diffusion terms compared to the ideal form used in Hope-Collins & di Mare (2023). These terms vanish asymptotically for the Liou-Steffen splitting, but result in spurious entropy generation for the Zha-Bilgen and Toro-Vasquez splittings unless a particular form of the interface pressure is used. Numerical examples for acoustic and convective flow verify the results of the analysis, and show the importance of considering the resolution of the entropy field when assessing schemes of this type.
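For reference (standard textbook forms quoted for orientation, not reproduced from the paper), the Liou-Steffen and Zha-Bilgen splittings separate the one-dimensional Euler flux into convective and pressure parts as
\[
\mathbf{F} \;=\; \begin{pmatrix} \rho u \\ \rho u^{2} + p \\ u(\rho E + p) \end{pmatrix}
\;=\; \underbrace{\,u\begin{pmatrix} \rho \\ \rho u \\ \rho H \end{pmatrix} + \begin{pmatrix} 0 \\ p \\ 0 \end{pmatrix}\,}_{\text{Liou-Steffen}}
\;=\; \underbrace{\,u\begin{pmatrix} \rho \\ \rho u \\ \rho E \end{pmatrix} + \begin{pmatrix} 0 \\ p \\ p\,u \end{pmatrix}\,}_{\text{Zha-Bilgen}},
\]
with $H = E + p/\rho$ the total enthalpy; the numerical flux upwinds the convective and pressure parts separately, and it is the artificial diffusion implicit in this upwinding whose low Mach number scaling is analysed via the approximate diffusion matrices.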