We describe a measure quantization procedure, i.e., an algorithm that finds the best approximation of a target probability law (and, more generally, a signed measure of finite variation) by a sum of $Q$ Dirac masses ($Q$ being the quantization parameter). The procedure is implemented by minimizing the statistical distance between the original measure and its quantized version; the distance is built from a negative definite kernel and, if necessary, can be computed on the fly and fed to a stochastic optimization algorithm (such as SGD or Adam). We investigate theoretically the fundamental question of the existence of an optimal measure quantizer and identify the kernel properties that guarantee suitable behavior. We propose two best linear unbiased (BLUE) estimators for the squared statistical distance and use them in an unbiased procedure, called HEMQ, to find the optimal quantization. We test HEMQ on several databases: multi-dimensional Gaussian mixtures, Wiener space cubature, Italian wine cultivars, and the MNIST image database. The results indicate that the HEMQ algorithm is robust and versatile and, for the class of Huber-energy kernels, matches the expected intuitive behavior.
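To make the quantization idea concrete, here is a minimal sketch (not the paper's HEMQ procedure or its BLUE estimators): $Q$ Dirac locations are fitted to an empirical target by minimizing a plug-in energy-type distance with Adam. The Huber-energy kernel form and the parameter `delta` below are assumptions.

```python
# Minimal sketch, assuming a Huber-energy-type kernel; not the paper's HEMQ implementation.
import torch

def huber_energy(x, y, delta=1.0):
    # negative-definite kernel h(x, y) = sqrt(delta^2 + ||x - y||^2)
    return torch.sqrt(delta**2 + torch.cdist(x, y) ** 2)

def quantization_energy(atoms, sample, delta=1.0):
    # energy-type distance between the uniform measure on `atoms` (Q Dirac masses)
    # and the empirical measure of `sample`
    return (2 * huber_energy(atoms, sample, delta).mean()
            - huber_energy(atoms, atoms, delta).mean()
            - huber_energy(sample, sample, delta).mean())

target = torch.randn(2000, 2)                    # samples from the target law
atoms = torch.randn(10, 2, requires_grad=True)   # Q = 10 Dirac locations
opt = torch.optim.Adam([atoms], lr=1e-2)
for step in range(2000):
    batch = target[torch.randint(0, len(target), (256,))]  # stochastic mini-batch
    opt.zero_grad()
    quantization_energy(atoms, batch).backward()
    opt.step()
```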
One-bit quantization with time-varying sampling thresholds (also known as random dithering) has recently shown significant potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as compared to their conventional multi-bit counterparts. This characteristic endows one-bit signal processing frameworks with what one may refer to as sample abundance. We show that sample abundance plays a pivotal role in many signal recovery and optimization problems that are formulated as (possibly non-convex) quadratic programs with linear feasibility constraints. Of particular interest to our work are low-rank matrix recovery and compressed sensing applications that take advantage of one-bit quantization. We demonstrate that the sample abundance paradigm allows such problems to be transformed into merely linear feasibility problems by forming large-scale overdetermined linear systems, thus removing the need for handling costly optimization constraints and objectives. To make the proposed computational cost savings achievable, we offer enhanced randomized Kaczmarz algorithms to solve these highly overdetermined feasibility problems and provide theoretical guarantees in terms of their convergence, sample size requirements, and overall performance. Several numerical results are presented to illustrate the effectiveness of the proposed methodologies.
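As an illustration of the underlying feasibility idea (not the paper's enhanced algorithms), a basic randomized Kaczmarz sweep over the half-space constraints induced by dithered one-bit measurements might look as follows; the measurement model and the constants are assumptions.

```python
# Sketch: dithered one-bit measurements s_i = sign(a_i^T x - tau_i) give the linear
# feasibility constraints s_i (a_i^T x - tau_i) >= 0; randomized Kaczmarz projects
# onto one violated half-space at a time.
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 20000                      # sample abundance: m >> n
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
tau = rng.uniform(-3, 3, size=m)      # time-varying (dithered) thresholds
s = np.sign(A @ x_true - tau)         # one-bit observations

x = np.zeros(n)
for _ in range(100000):
    i = rng.integers(m)
    margin = s[i] * (A[i] @ x - tau[i])          # feasibility margin of constraint i
    if margin < 0:                               # project onto the violated half-space
        x += (-margin) * s[i] * A[i] / (A[i] @ A[i])
```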
Many neurodegenerative diseases are connected to the spreading of misfolded prionic proteins. In this paper, we analyse the process of misfolding and spreading of both $\alpha$-synuclein and Amyloid-$\beta$, related to Parkinson's and Alzheimer's diseases, respectively. We introduce and analyse a positivity-preserving numerical method for the discretization of the Fisher-Kolmogorov equation, modelling the accumulation and spreading of prionic proteins. The proposed approximation method is based on the discontinuous Galerkin method on polygonal and polyhedral grids for space discretization and on the $\vartheta$-method time integration scheme. We prove the existence of the discrete solution and a convergence result where the Implicit Euler scheme is employed for time integration. We show that the proposed approach is structure-preserving, in the sense that it guarantees that the discrete solution is non-negative, a feature of paramount importance in practical applications. The numerical verification of our method is performed using both a manufactured solution and wavefront propagation on two-dimensional polygonal grids. Next, we present a simulation of $\alpha$-synuclein spreading in a two-dimensional brain slice in the sagittal plane. The polygonal mesh for this simulation is agglomerated while maintaining the distinction between white and grey matter, taking advantage of the flexibility of PolyDG methods in the mesh construction. Finally, we simulate the spreading of Amyloid-$\beta$ in a patient-specific setting by using a three-dimensional geometry reconstructed from magnetic resonance images and an initial condition reconstructed from positron emission tomography. Our numerical simulations confirm that the proposed method is able to capture the evolution of Parkinson's and Alzheimer's diseases.
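For context, the Fisher-Kolmogorov model referred to above is typically written, in a generic form (the paper's precise coefficients and boundary terms may differ), as
$$\partial_t c - \nabla\cdot\bigl(D\,\nabla c\bigr) = \alpha\, c\,(1-c),$$
and, writing the space-discretized system as $\dot{c}_h = F_h(c_h)$, the $\vartheta$-method step reads
$$\frac{c_h^{n+1}-c_h^{n}}{\Delta t} = \vartheta\, F_h\!\left(c_h^{n+1}\right) + (1-\vartheta)\, F_h\!\left(c_h^{n}\right), \qquad \vartheta\in[0,1],$$
with $\vartheta = 1$ recovering the Implicit Euler scheme used in the convergence analysis.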
In this paper, the authors study a non-linear elliptic-parabolic system motivated by mathematical models for lithium-ion batteries. One state satisfies a parabolic reaction-diffusion equation and the other an elliptic equation. The goal is to determine several scalar parameters in the coupled model in an optimal manner by utilizing a reliable reduced-order approach based on the reduced basis (RB) method. However, the states are coupled through a strongly non-linear function, which makes the evaluation of online-efficient error estimates difficult. First, the well-posedness of the system is proved. Then, a Galerkin finite element and an RB discretization are described for the coupled system. To certify the RB scheme, hierarchical a-posteriori error estimators are utilized in an adaptive trust-region optimization method. Numerical experiments illustrate the good approximation properties and the efficiency obtained by using only a relatively small number of reduced basis functions.
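The reduced-order idea can be sketched on a toy linear parametric problem (the paper's system is nonlinear and coupled, and the hierarchical error estimators and trust-region loop are not reproduced here):

```python
# Minimal reduced-basis sketch: offline snapshots build a basis V, online solves
# use the small Galerkin-projected system. Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
N = 500                                   # high-fidelity (finite element) dimension
A0 = np.diag(2 + rng.random(N)); A1 = np.diag(rng.random(N))
b = rng.random(N)

def full_solve(mu):                       # "truth" solve: (A0 + mu*A1) u = b
    return np.linalg.solve(A0 + mu * A1, b)

# offline: snapshots at a few parameters -> orthonormal reduced basis V (N x r)
snapshots = np.column_stack([full_solve(mu) for mu in (0.1, 0.5, 1.0, 2.0)])
V, _ = np.linalg.qr(snapshots)

def rb_solve(mu):                         # online: solve the r x r projected system
    Ar = V.T @ (A0 + mu * A1) @ V
    return V @ np.linalg.solve(Ar, V.T @ b)

print(np.linalg.norm(full_solve(0.7) - rb_solve(0.7)))
```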
Given samples from two non-negative random variables, we propose a new class of nonparametric tests for the null hypothesis that one random variable dominates the other with respect to second-order stochastic dominance. These tests are based on the Lorenz P-P plot (LPP), which is the composition of the inverse unscaled Lorenz curve of one distribution with the unscaled Lorenz curve of the other. The LPP exceeds the identity function if and only if the dominance condition is violated, which provides a rather simple method to construct test statistics as functionals of the difference between the identity and the LPP. We determine a stochastic upper bound for such test statistics under the null hypothesis and derive its limit distribution, which can be approximated via bootstrap procedures. We also establish the asymptotic validity of the tests under relatively mild conditions, allowing for both dependent and independent samples. Finally, finite-sample properties are investigated through simulation studies.
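A plug-in illustration of the LPP construction is sketched below; the composition order, the sup-type functional, and the empirical Lorenz-curve estimator are assumptions and do not reproduce the paper's test statistics or bootstrap.

```python
# Sketch: empirical unscaled Lorenz curves, their composition (LPP), and a
# sup-type exceedance over the identity as an illustrative statistic.
import numpy as np

def unscaled_lorenz(sample, p):
    # empirical unscaled Lorenz curve: cumulative sums of sorted data divided by n,
    # evaluated at p by linear interpolation
    x = np.sort(sample)
    grid = np.arange(len(x) + 1) / len(x)
    values = np.concatenate(([0.0], np.cumsum(x) / len(x)))
    return np.interp(p, grid, values)

def lpp(sample_x, sample_y, p):
    # LPP(p) = L_X^{-1}(L_Y(p)): Y's Lorenz curve composed with X's inverse Lorenz curve
    grid = np.linspace(0, 1, 2001)
    lx = unscaled_lorenz(sample_x, grid)
    return np.interp(unscaled_lorenz(sample_y, p), lx, grid)

rng = np.random.default_rng(2)
x, y = rng.gamma(3.0, 1.0, 400), rng.gamma(2.0, 1.0, 300)
p = np.linspace(0, 1, 2001)
stat = np.max(lpp(x, y, p) - p)   # with this convention, a large value signals a violation
print(stat)
```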
Understanding the geometric properties of natural language processing models' latent spaces allows these properties to be manipulated for improved performance on downstream tasks. One such property is the amount of data spread in a model's latent space, i.e., how fully the available latent space is used. In this work, we define data spread and demonstrate that the commonly used measures of data spread, Average Cosine Similarity and a partition function min/max ratio I(V), do not provide reliable metrics for comparing the use of latent space across models. We propose and examine eight alternative measures of data spread, all but one of which improve over these current metrics when applied to seven synthetic data distributions. Of our proposed measures, we recommend one principal-component-based measure and one entropy-based measure that provide reliable, relative measures of spread and can be used to compare models of different sizes and dimensionalities.
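As a hypothetical illustration of what an entropy-based spread measure can look like (not necessarily the measure proposed in the paper), one can normalize the eigenvalues of the embedding covariance and compute their entropy:

```python
# Hypothetical spread measure: normalized eigenvalue entropy of the embedding
# covariance, i.e. an "effective fraction of used dimensions" in [0, 1].
import numpy as np

def eigenvalue_entropy_spread(embeddings):
    centered = embeddings - embeddings.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
    p = np.clip(eigvals, 0, None)
    p = p / p.sum()
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
    return entropy / np.log(len(p))      # 1 means isotropic spread, small values mean collapse

rng = np.random.default_rng(3)
isotropic = rng.standard_normal((1000, 64))
collapsed = isotropic @ np.diag(np.linspace(1, 0.01, 64))
print(eigenvalue_entropy_spread(isotropic), eigenvalue_entropy_spread(collapsed))
```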
In this article, we will construct an overconstrained closed-loop linkage consisting of four revolute joints and one cylindrical joint. It is obtained by factorization of a prescribed vertical Darboux motion. We will investigate the kinematic behaviour of the obtained mechanism, which turns out to have multiple operation modes. Under certain conditions on the design parameters, two of the operation modes will correspond to vertical Darboux motions. It turns out that, for these design parameters, there also exists a second assembly mode.
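For reference, a common parametrization of the vertical Darboux motion (notation ours; the amplitude $a$ and phase $\varphi_0$ depend on the design) is
$$
(x,y,z) \;\mapsto\; \bigl(x\cos t - y\sin t,\; x\sin t + y\cos t,\; z + a\sin(t+\varphi_0)\bigr),
$$
i.e., a rotation about a fixed axis coupled with a harmonic translation along that same axis.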
We study parametric inference for hypo-elliptic Stochastic Differential Equations (SDEs). Existing research focuses on a particular class of hypo-elliptic SDEs, with components split into `rough'/`smooth' parts and with noise from the rough components propagating directly onto the smooth ones, but some critical model classes arising in applications have yet to be explored. We aim to cover this gap and thus analyse the highly degenerate class of SDEs whose components split into further sub-groups. Such models include, e.g., the notable case of generalised Langevin equations. We propose a tailored time-discretisation scheme and provide asymptotic results supporting our scheme in the context of high-frequency, full observations. The proposed discretisation scheme is applicable in much more general data regimes and is shown, via simulation studies, to overcome biases also in the practical case where only a smooth component is observed. Joint consideration of our study for highly degenerate SDEs and of existing research provides a general `recipe' for the development of time-discretisation schemes to be used within statistical methods for general classes of hypo-elliptic SDEs.
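An illustrative example of the highly degenerate structure (notation ours, not the paper's) is
$$
\mathrm{d}X_t = V_t\,\mathrm{d}t,\qquad
\mathrm{d}V_t = a(X_t,V_t,Z_t)\,\mathrm{d}t,\qquad
\mathrm{d}Z_t = b(X_t,V_t,Z_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,
$$
where the Brownian noise enters only the rough component $Z$ and reaches $X$ only through $V$, so that the smooth components split into two sub-groups; quasi-Markovian generalised Langevin equations have this form.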
In this paper, we consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints that require the minimizers across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus or single-task optimization as special cases, and allows for more general task relatedness models such as multitask smoothness and coupled optimization. In order to cope with communication constraints, we propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates before communicating with their neighbors. The analysis shows that, under some general conditions on the quantization noise, and for sufficiently small step-sizes $\mu$, the strategy is stable in terms of both mean-square error and average bit rate: by reducing $\mu$, it is possible to keep the estimation errors small (on the order of $\mu$) without the bit rate increasing indefinitely as $\mu\rightarrow 0$. Simulations illustrate the theoretical findings and the effectiveness of the proposed approach, revealing that decentralized learning is achievable at the expense of only a few bits.
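A conceptual sketch of differential randomized quantization inside a decentralized (diffusion-type) update is given below; the combination weights, the dithered quantizer, and the step ordering are simplifications rather than the paper's exact strategy.

```python
# Sketch: agents run local LMS-style adaptation, quantize only the innovation
# (difference to the last reconstructed estimate) with a randomized quantizer,
# and combine their neighbors' reconstructed estimates.
import numpy as np

rng = np.random.default_rng(4)
K, M, step = 10, 5, 0.02                       # agents, model dimension, step-size mu
A = np.full((K, K), 1.0 / K)                   # doubly-stochastic combination weights
w_true = rng.standard_normal(M)
w = np.zeros((K, M))                           # local estimates
q_state = np.zeros((K, M))                     # last reconstructed (quantized) estimates

def dithered_quantize(x, delta=0.05):
    # randomized (dithered) uniform quantizer with step `delta`; unbiased on average
    return delta * np.floor(x / delta + rng.random(x.shape))

for _ in range(3000):
    for k in range(K):                         # adaptation with streaming regression data
        h = rng.standard_normal(M)
        d = h @ w_true + 0.1 * rng.standard_normal()
        w[k] += step * (d - h @ w[k]) * h
    q_state += dithered_quantize(w - q_state)  # differential quantization of the innovation
    w = A @ q_state                            # combination of reconstructed estimates

print(np.linalg.norm(w - w_true, axis=1).mean())
```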
We propose an unconditionally energy-stable, orthonormality-preserving, component-wise splitting iterative scheme for the Kohn-Sham gradient-flow-based model in electronic structure calculations. We first study the scheme discretized in time but still continuous in space. The component-wise splitting iterative scheme changes one wave function at a time, similarly to the Gauss-Seidel iteration for solving a linear system of equations. Rigorous mathematical derivations are presented to show that our proposed scheme indeed satisfies the desired properties. We then study the fully discretized scheme, where the space is further approximated by a conforming finite element subspace. For the fully discretized scheme, not only can the preservation of orthogonality and normalization (together referred to as orthonormality) be shown quickly using the same idea as for the semi-discretized scheme, but the highlight of the scheme, namely its unconditional energy stability, can also be rigorously proven. The scheme allows us to use large time step sizes and to deal with small systems involving only a single wave function during each iteration step. Several numerical experiments are performed to verify the theoretical analysis, where the number of iterations is indeed greatly reduced compared to similar examples solved by the Kohn-Sham gradient-flow-based model in the literature.
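In one common formulation of the gradient-flow-based model (the paper's exact formulation may differ), the orbitals evolve as
$$
\partial_t \Psi = -\bigl(I - \Psi\Psi^{\top}\bigr)\,H(\Psi)\,\Psi,
\qquad \Psi = (\psi_1,\dots,\psi_N),\quad \Psi^{\top}\Psi = I,
$$
which keeps $\Psi^{\top}\Psi = I$ and decreases the Kohn-Sham energy along the continuous flow; the component-wise splitting scheme described above discretizes such a flow one wave function at a time while retaining both properties unconditionally.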
We introduce an information-theoretic quantity with similar properties to mutual information that can be estimated from data without making explicit assumptions on the underlying distribution. This quantity is based on a recently proposed matrix-based entropy that uses the eigenvalues of a normalized Gram matrix to compute an estimate of the eigenvalues of an uncentered covariance operator in a reproducing kernel Hilbert space. We show that a difference of matrix-based entropies (DiME) is well suited for problems involving the maximization of mutual information between random variables. While many methods for such tasks can lead to trivial solutions, DiME naturally penalizes such outcomes. We compare DiME to several baseline estimators of mutual information on a toy Gaussian dataset. We provide examples of use cases for DiME, such as latent factor disentanglement and a multiview representation learning problem where DiME is used to learn a shared representation among views with high mutual information.
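The building blocks DiME is described as using can be sketched as follows; the kernel choice, the value of $\alpha$, and the mutual-information-style combination below are assumptions, not the paper's exact objective.

```python
# Sketch: matrix-based (Renyi alpha-order) entropy from the eigenvalues of a
# normalized Gram matrix, plus a mutual-information-like combination of entropies.
import numpy as np

def normalized_gram(x, sigma=1.0):
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / (2 * sigma**2))                     # Gaussian kernel Gram matrix
    k = k / np.sqrt(np.outer(np.diag(k), np.diag(k)))    # unit diagonal (no-op here, kept for generality)
    return k / len(x)                                    # unit trace

def matrix_entropy(a, alpha=1.01):
    lam = np.clip(np.linalg.eigvalsh(a), 0, None)
    return np.log2(np.sum(lam ** alpha)) / (1 - alpha)

def joint_entropy(a, b, alpha=1.01):
    ab = a * b                                           # Hadamard product of Gram matrices
    return matrix_entropy(ab / np.trace(ab), alpha)

rng = np.random.default_rng(5)
x = rng.standard_normal((300, 2))
y = x @ np.array([[1.0, 0.3], [0.0, 1.0]]) + 0.1 * rng.standard_normal((300, 2))
a, b = normalized_gram(x), normalized_gram(y)
mi_like = matrix_entropy(a) + matrix_entropy(b) - joint_entropy(a, b)
print(mi_like)
```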