Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance defined as the Riemannian geodesic distance induced by the Fisher information metric is such a principled metric distance which however is not known in closed-form excepts for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold and we pullback that cone distance with its associated straight line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires to compute only the extreme minimal and maximal eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
This paper explores the generalization characteristics of iterative learning algorithms with bounded updates for non-convex loss functions, employing information-theoretic techniques. Our key contribution is a novel bound for the generalization error of these algorithms with bounded updates, extending beyond the scope of previous works that only focused on Stochastic Gradient Descent (SGD). Our approach introduces two main novelties: 1) we reformulate the mutual information as the uncertainty of updates, providing a new perspective, and 2) instead of using the chaining rule of mutual information, we employ a variance decomposition technique to decompose information across iterations, allowing for a simpler surrogate process. We analyze our generalization bound under various settings and demonstrate improved bounds when the model dimension increases at the same rate as the number of training data samples. To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models. Ultimately, our work takes a further step for developing practical generalization theories.
High-order structures have been recognised as suitable models for systems going beyond the binary relationships for which graph models are appropriate. Despite their importance and surge in research on these structures, their random cases have been only recently become subjects of interest. One of these high-order structures is the oriented hypergraph, which relates couples of subsets of an arbitrary number of vertices. Here we develop the Erd\H{o}s-R\'enyi model for oriented hypergraphs, which corresponds to the random realisation of oriented hyperedges of the complete oriented hypergraph. A particular feature of random oriented hypergraphs is that the ratio between their expected number of oriented hyperedges and their expected degree or size is 3/2 for large number of vertices. We highlight the suitability of oriented hypergraphs for modelling large collections of chemical reactions and the importance of random oriented hypergraphs to analyse the unfolding of chemistry.
The efficacy of numerical methods like integral estimates via Gaussian quadrature formulas depends on the localization of the zeros of the associated family of orthogonal polynomials. In this regard, following the renewed interest in quadrature formulas on the unit circle, and $R_{II}$-type polynomials, which include the complementary Romanovski-Routh polynomials, in this work we present a collection of properties of their zeros. Our results include extreme bounds, convexity, and density, alongside the connection of such polynomials to classical orthogonal polynomials via asymptotic formulas.
We analyse a numerical scheme for a system arising from a novel description of the standard elastic--perfectly plastic response. The elastic--perfectly plastic response is described via rate-type equations that do not make use of the standard elastic-plastic decomposition, and the model does not require the use of variational inequalities. Furthermore, the model naturally includes the evolution equation for temperature. We present a low order discretisation based on the finite element method. Under certain restrictions on the mesh we subsequently prove the existence of discrete solutions, and we discuss the stability properties of the numerical scheme. The analysis is supplemented with computational examples.
Developing an efficient computational scheme for high-dimensional Bayesian variable selection in generalised linear models and survival models has always been a challenging problem due to the absence of closed-form solutions for the marginal likelihood. The RJMCMC approach can be employed to samples model and coefficients jointly, but effective design of the transdimensional jumps of RJMCMC can be challenge, making it hard to implement. Alternatively, the marginal likelihood can be derived using data-augmentation scheme e.g. Polya-gamma data argumentation for logistic regression) or through other estimation methods. However, suitable data-augmentation schemes are not available for every generalised linear and survival models, and using estimations such as Laplace approximation or correlated pseudo-marginal to derive marginal likelihood within a locally informed proposal can be computationally expensive in the "large n, large p" settings. In this paper, three main contributions are presented. Firstly, we present an extended Point-wise implementation of Adaptive Random Neighbourhood Informed proposal (PARNI) to efficiently sample models directly from the marginal posterior distribution in both generalised linear models and survival models. Secondly, in the light of the approximate Laplace approximation, we also describe an efficient and accurate estimation method for the marginal likelihood which involves adaptive parameters. Additionally, we describe a new method to adapt the algorithmic tuning parameters of the PARNI proposal by replacing the Rao-Blackwellised estimates with the combination of a warm-start estimate and an ergodic average. We present numerous numerical results from simulated data and 8 high-dimensional gene fine mapping data-sets to showcase the efficiency of the novel PARNI proposal compared to the baseline add-delete-swap proposal.
We give a short survey of recent results on sparse-grid linear algorithms of approximate recovery and integration of functions possessing a unweighted or weighted Sobolev mixed smoothness based on their sampled values at a certain finite set. Some of them are extended to more general cases.
In this paper, we present a discontinuity and cusp capturing physics-informed neural network (PINN) to solve Stokes equations with a piecewise-constant viscosity and singular force along an interface. We first reformulate the governing equations in each fluid domain separately and replace the singular force effect with the traction balance equation between solutions in two sides along the interface. Since the pressure is discontinuous and the velocity has discontinuous derivatives across the interface, we hereby use a network consisting of two fully-connected sub-networks that approximate the pressure and velocity, respectively. The two sub-networks share the same primary coordinate input arguments but with different augmented feature inputs. These two augmented inputs provide the interface information, so we assume that a level set function is given and its zero level set indicates the position of the interface. The pressure sub-network uses an indicator function as an augmented input to capture the function discontinuity, while the velocity sub-network uses a cusp-enforced level set function to capture the derivative discontinuities via the traction balance equation. We perform a series of numerical experiments to solve two- and three-dimensional Stokes interface problems and perform an accuracy comparison with the augmented immersed interface methods in literature. Our results indicate that even a shallow network with a moderate number of neurons and sufficient training data points can achieve prediction accuracy comparable to that of immersed interface methods.
The problem of computing topological distance between two scalar fields based on Reeb graphs or contour trees has been studied and applied successfully to various problems in topological shape matching, data analysis, and visualization. However, generalizing such results for computing distance measures between two multi-fields based on their Reeb spaces is still in its infancy. Towards this, in the current paper we propose a technique to compute an effective distance measure between two multi-fields by computing a novel \emph{multi-dimensional persistence diagram} (MDPD) corresponding to each of the (quantized) Reeb spaces. First, we construct a multi-dimensional Reeb graph (MDRG), which is a hierarchical decomposition of the Reeb space into a collection of Reeb graphs. The MDPD corresponding to each MDRG is then computed based on the persistence diagrams of the component Reeb graphs of the MDRG. Our distance measure extends the Wasserstein distance between two persistence diagrams of Reeb graphs to MDPDs of MDRGs. We prove that the proposed measure is a pseudo-metric and satisfies a stability property. Effectiveness of the proposed distance measure has been demonstrated in (i) shape retrieval contest data - SHREC $2010$ and (ii) Pt-CO bond detection data from computational chemistry. Experimental results show that the proposed distance measure based on the Reeb spaces has more discriminating power in clustering the shapes and detecting the formation of a stable Pt-CO bond as compared to the similar measures between Reeb graphs.
Besov priors are nonparametric priors that can model spatially inhomogeneous functions. They are routinely used in inverse problems and imaging, where they exhibit attractive sparsity-promoting and edge-preserving features. A recent line of work has initiated the study of their asymptotic frequentist convergence properties. In the present paper, we consider the theoretical recovery performance of the posterior distributions associated to Besov-Laplace priors in the density estimation model, under the assumption that the observations are generated by a possibly spatially inhomogeneous true density belonging to a Besov space. We improve on existing results and show that carefully tuned Besov-Laplace priors attain optimal posterior contraction rates. Furthermore, we show that hierarchical procedures involving a hyper-prior on the regularity parameter lead to adaptation to any smoothness level.
The semi-empirical nature of best-estimate models closing the balance equations of thermal-hydraulic (TH) system codes is well-known as a significant source of uncertainty for accuracy of output predictions. This uncertainty, called model uncertainty, is usually represented by multiplicative (log-)Gaussian variables whose estimation requires solving an inverse problem based on a set of adequately chosen real experiments. One method from the TH field, called CIRCE, addresses it. We present in the paper a generalization of this method to several groups of experiments each having their own properties, including different ranges for input conditions and different geometries. An individual (log-)Gaussian distribution is therefore estimated for each group in order to investigate whether the model uncertainty is homogeneous between the groups, or should depend on the group. To this end, a multi-group CIRCE is proposed where a variance parameter is estimated for each group jointly to a mean parameter common to all the groups to preserve the uniqueness of the best-estimate model. The ECME algorithm for Maximum Likelihood Estimation is adapted to the latter context, then applied to relevant demonstration cases. Finally, it is tested on a practical case to assess the uncertainty of critical mass flow assuming two groups due to the difference of geometry between the experimental setups.