We consider here the discrete-time dynamics described by a transformation $T:M \to M$, where $T$ is either the shift $T=\sigma$ acting on the symbolic space $M=\{1,2,...,d\}^\mathbb{N}$, or a $d$-to-$1$ expanding transformation $T:S^1 \to S^1$ of class $C^{1+\alpha}$ (for example, $x \mapsto T(x) = d\,x \pmod{1}$), where $M=S^1$ is the unit circle. It is known that the infinite-dimensional manifold $\mathcal{N}$ of equilibrium probabilities for H\"older potentials $A:M \to \mathbb{R}$ is an analytic manifold and carries a natural Riemannian metric associated with the asymptotic variance. We show here that, under the assumption that a Fourier-like Hilbert basis for the kernel of the Ruelle operator exists, geodesic paths exist. When $T=\sigma$ and $M=\{0,1\}^\mathbb{N}$, such a basis exists. In a different direction, we also consider the KL-divergence $D_{KL}(\mu_1,\mu_2)$ for a pair of equilibrium probabilities. If $D_{KL}(\mu_1,\mu_2)=0$, then $\mu_1=\mu_2$. Although $D_{KL}$ is not a metric on $\mathcal{N}$, it describes the proximity between $\mu_1$ and $\mu_2$. A natural problem is: for a fixed probability $\mu_1\in \mathcal{N}$, find the probability $\mu_2$ within a given convex set of probabilities in $\mathcal{N}$ that minimizes $D_{KL}(\mu_1,\mu_2)$. This minimization problem is a dynamical version of the main issues considered in the theory of information projections. We consider this problem in $\mathcal{N}$, a setting where all probabilities are dynamically invariant, and obtain explicit equations for the solution sought. Triangle and Pythagorean inequalities are also investigated.
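For orientation, a standard expression for the KL-divergence in this thermodynamic-formalism setting is sketched below; the normalization convention for the potentials is an assumption of this sketch, not a claim about the paper's exact conventions. For equilibrium probabilities $\mu_1,\mu_2$ associated with normalized H\"older potentials $A_1,A_2$,
\[
D_{KL}(\mu_1,\mu_2)\;=\;\int_M (A_1-A_2)\,d\mu_1\;=\;-\,h(\mu_1)-\int_M A_2\,d\mu_1\;\ge\;0,
\]
where $h(\mu_1)$ is the Kolmogorov--Sinai entropy; nonnegativity follows from the variational principle, with equality precisely when $\mu_1=\mu_2$.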
We describe a `discretize-then-relax' strategy to globally minimize integral functionals over functions $u$ in a Sobolev space satisfying prescribed Dirichlet boundary conditions. The strategy applies whenever the integral functional depends polynomially on $u$ and its derivatives, even if it is nonconvex. The `discretize' step uses a bounded finite-element scheme to approximate the integral minimization problem with a convergent hierarchy of polynomial optimization problems over compact feasible sets, indexed by the decreasing size $h$ of the finite-element mesh. The `relax' step employs sparse moment-SOS relaxations to approximate each polynomial optimization problem with a hierarchy of convex semidefinite programs, indexed by an increasing relaxation order $\omega$. We prove that, as $\omega\to\infty$ and $h\to 0$, solutions of these semidefinite programs provide approximate minimizers that converge in $L^p$ to the global minimizer of the original integral functional, provided it is unique. We also report computational experiments showing that our numerical strategy works well even when the technical conditions required by our theoretical analysis are not satisfied.
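In schematic form (generic notation assumed here, not the paper's), the two-level strategy reads
\[
\min_{u}\ \int_\Omega f\bigl(x,u(x),\nabla u(x)\bigr)\,dx
\;\xrightarrow{\ \text{discretize, mesh size } h\ }\;
\min_{z\in K_h}\ p_h(z)
\;\xrightarrow{\ \text{relax, order } \omega\ }\;
\min_{y}\ \bigl\{\,L_y(p_h)\ :\ M_\omega(y)\succeq 0\,\bigr\},
\]
where $p_h$ is a polynomial in the finite-element coefficients $z$, $K_h$ is a compact semialgebraic set, $L_y$ is the Riesz linear functional associated with the moment sequence $y$, and $M_\omega(y)$ is the order-$\omega$ moment matrix (localizing constraints omitted for brevity).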
Recently established directed dependence measures for pairs $(X,Y)$ of random variables build upon the natural idea of comparing the conditional distributions of $Y$ given $X=x$ with the marginal distribution of $Y$. They assign each pair $(X,Y)$ a value in $[0,1]$ that is $0$ if and only if $X$ and $Y$ are independent, and $1$ exclusively when $Y$ is a function of $X$. Here we show that comparing randomly drawn conditional distributions with each other instead, or, equivalently, analyzing how sensitively the conditional distribution of $Y$ given $X=x$ depends on $x$, opens the door to constructing novel families of dependence measures $\Lambda_\varphi$ induced by general convex functions $\varphi: \mathbb{R} \rightarrow \mathbb{R}$, containing, e.g., Chatterjee's coefficient of correlation as a special case. After establishing additional useful properties of $\Lambda_\varphi$, we focus on continuous $(X,Y)$, translate $\Lambda_\varphi$ to the copula setting, consider the $L^p$-version, and establish an estimator which is strongly consistent in full generality. A real data example and a simulation study illustrate the chosen approach and the performance of the estimator. Complementing the aforementioned results, we show how a slight modification of the construction underlying $\Lambda_\varphi$ can be used to define new measures of explainability generalizing the fraction of explained variance.
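As a concrete instance of the special case mentioned above, the following minimal Python sketch computes Chatterjee's coefficient of correlation in the no-ties case (the function name is ours; this illustrates the special case only, not the paper's estimator for $\Lambda_\varphi$):
```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n, assuming no ties in x or y."""
    n = len(x)
    order = np.argsort(x)                          # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1   # ranks of y along that order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(chatterjee_xi(x, x**2))                    # close to 1: y is a function of x
print(chatterjee_xi(x, rng.normal(size=1000)))   # close to 0: independence
```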
Blumer et al. (1987, 1989) showed that any concept class that is learnable by Occam algorithms is PAC learnable. Board and Pitt (1990) showed a partial converse of this theorem: for concept classes that are closed under exception lists, any class that is PAC learnable is learnable by an Occam algorithm. However, their Occam algorithm outputs a hypothesis whose complexity is $\delta$-dependent, which is an important limitation. In this paper, we show that their partial converse also applies to Occam algorithms with $\delta$-independent complexities. Thus, we provide an a posteriori justification of various theoretical results and algorithm design methods that use the partial converse as a basis for their work.
We study the problem of reconstructing the Faber--Schauder coefficients of a continuous function $f$ from discrete observations of its antiderivative $F$. Our approach starts by formulating this problem in terms of piecewise quadratic spline interpolation. We then provide a closed-form solution and an in-depth error analysis. These results lead to some surprising observations, which also shed new light on the classical topic of quadratic spline interpolation itself: they show that the well-known instabilities of this method can be located exclusively within the final generation of estimated Faber--Schauder coefficients, which suffer from non-locality and strong dependence on the initial value and the given data. By contrast, all other Faber--Schauder coefficients depend only locally on the data, are independent of the initial value, and admit uniform error bounds. We thus conclude that a robust and well-behaved estimator for our problem can be obtained by simply dropping the final-generation coefficients from the estimated Faber--Schauder coefficients.
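For context, with one common normalization (assumed here, and possibly differing from the paper's) the Faber--Schauder coefficients of $f$ are the localized second differences
\[
\theta_{m,k}\;=\;2^{m/2}\Bigl(2\,f\bigl(\tfrac{2k+1}{2^{m+1}}\bigr)-f\bigl(\tfrac{k}{2^{m}}\bigr)-f\bigl(\tfrac{k+1}{2^{m}}\bigr)\Bigr),\qquad 0\le k<2^m,
\]
so each coefficient depends on only three nearby values of $f$; the difficulty addressed here is that only the antiderivative $F$, not $f$ itself, is observed.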
L. Klebanov proved the following theorem. Let $\xi_1, \dots, \xi_n$ be independent random variables. Consider the linear forms $L_1=a_1\xi_1+\cdots+a_n\xi_n$, $L_2=b_1\xi_1+\cdots+b_n\xi_n$, $L_3=c_1\xi_1+\cdots+c_n\xi_n$, $L_4=d_1\xi_1+\cdots+d_n\xi_n$, where the coefficients $a_j, b_j, c_j, d_j$ are real numbers. If the random vectors $(L_1,L_2)$ and $(L_3,L_4)$ are identically distributed, then all $\xi_i$ for which $a_id_j-b_ic_j\neq 0$ for all $j=1,\dots,n$ are Gaussian random variables. The present article is devoted to an analogue of Klebanov's theorem in the case where the random variables take values in a locally compact Abelian group and the coefficients of the linear forms are integers.
By combining a logarithm transformation with a corrected Milstein-type method, the present article proposes an explicit, unconditionally boundary- and dynamics-preserving scheme for the stochastic susceptible-infected-susceptible (SIS) epidemic model, which takes values in $(0,N)$. The scheme applied to the model is first proved to have a strong convergence rate of order one. Further, the dynamical behavior of the numerical approximations is analyzed, and it is shown that the scheme unconditionally preserves both the domain and the dynamics of the model. More precisely, the proposed scheme produces numerical approximations that live in the domain $(0,N)$ and reproduce the extinction and persistence properties of the original model for any time step-size $h > 0$, without any additional requirements on the model parameters. Numerical experiments are presented to verify our theoretical results.
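As a rough illustration of why a logarithmic change of variables preserves the domain, the Python sketch below simulates the stochastic SIS model in the form of Gray et al. (2011), $dI = I(\beta N-\mu-\gamma-\beta I)\,dt + \sigma I(N-I)\,dW$ (an assumption about the model form; all parameter values are hypothetical). It uses a plain logit-transformed Euler--Maruyama step obtained from It\^o's formula, not the authors' corrected Milstein method:
```python
import numpy as np

def simulate_sis_logit(I0, N, beta, mu, gamma, sigma, h, n_steps, rng):
    # The logit transform y = log(I/(N-I)) maps (0, N) onto the real line, so
    # every iterate of y corresponds to a state I strictly inside (0, N).
    y = np.log(I0 / (N - I0))
    for _ in range(n_steps):
        I = N / (1.0 + np.exp(-y))
        # Ito drift of y; the diffusion coefficient of y is the constant
        # sigma*N, so Euler-Maruyama and Milstein coincide after the transform.
        drift = N * (beta * N - mu - gamma - beta * I) / (N - I) \
                + sigma**2 * N * (I - N / 2.0)
        y += drift * h + sigma * N * np.sqrt(h) * rng.standard_normal()
    return N / (1.0 + np.exp(-y))

# Example run with hypothetical parameter values:
rng = np.random.default_rng(1)
print(simulate_sis_logit(I0=10.0, N=100.0, beta=0.002, mu=0.05, gamma=0.05,
                         sigma=0.0001, h=0.01, n_steps=10_000, rng=rng))
```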
Neural networks have revolutionized the field of machine learning with increased predictive capability. In addition to improving the predictions of neural networks, there is a simultaneous demand for reliable uncertainty quantification on estimates made by machine learning methods such as neural networks. Bayesian neural networks (BNNs) are an important type of neural network with built-in capability for quantifying uncertainty. This paper discusses aleatoric and epistemic uncertainty in BNNs and how they can be calculated. Using an example dataset of images in which the goal is to identify the amplitude of an event in the image, it is shown that epistemic uncertainty tends to be low in images that are well represented in the training dataset and high in images that are not. An algorithm for out-of-distribution (OoD) detection with BNN epistemic uncertainty is introduced, along with various experiments demonstrating factors influencing the OoD detection capability in a BNN. The OoD detection capability with epistemic uncertainty is shown to be comparable to the OoD detection in the discriminator network of a generative adversarial network (GAN) of similar architecture.
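A minimal sketch of the standard variance decomposition behind this kind of calculation follows; here `sample_nets` is a hypothetical routine drawing networks from the approximate weight posterior, and each network is assumed to return a predictive mean and an aleatoric variance (this is a generic illustration, not the paper's algorithm):
```python
import numpy as np

def uncertainty_decomposition(sample_nets, x, n_samples=100):
    """Decompose a BNN's predictive variance at input x."""
    means, variances = [], []
    for net in sample_nets(n_samples):   # draws from the weight posterior
        mu, var = net(x)                 # per-draw predictive mean / variance
        means.append(mu)
        variances.append(var)
    aleatoric = float(np.mean(variances))  # expected data noise
    epistemic = float(np.var(means))       # disagreement across posterior draws
    return aleatoric, epistemic
```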
This work puts forth low-complexity Riemannian subspace descent algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold. Different from the existing Riemannian gradient descent variants, the proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix. The resulting updates avoid costly matrix operations such as matrix exponentiation and dense matrix multiplication, which are required by almost all other Riemannian optimization algorithms on the SPD manifold. We further identify a broad class of functions, arising in diverse applications such as kernel matrix learning, covariance estimation of Gaussian distributions, maximum likelihood parameter estimation of elliptically contoured distributions, and parameter estimation in Gaussian mixture models, over which the Riemannian gradients can be calculated efficiently. The proposed uni-directional and multi-directional Riemannian subspace descent variants incur per-iteration complexities of $\mathcal{O}(n)$ and $\mathcal{O}(n^2)$, respectively, as compared to the $\mathcal{O}(n^3)$ or higher complexity incurred by all existing Riemannian gradient descent variants. The superior runtime and low per-iteration complexity of the proposed algorithms are also demonstrated via numerical tests on large-scale covariance estimation problems.
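For contrast, the standard baseline being improved upon is Riemannian gradient descent under the affine-invariant metric, whose update (generic notation assumed here) reads
\[
X_{k+1}\;=\;X_k^{1/2}\,\exp\!\bigl(-\eta\,X_k^{1/2}\,\nabla f(X_k)\,X_k^{1/2}\bigr)\,X_k^{1/2},
\]
with $\nabla f$ the (symmetrized) Euclidean gradient and $\eta>0$ a step size; it requires a matrix exponential and dense products, hence $\mathcal{O}(n^3)$ or higher cost per iteration, which the sparse Cholesky-factor updates above avoid.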
A new information-theoretic condition is presented for reconstructing a discrete random variable $X$ based on the knowledge of a set of discrete functions of $X$. The reconstruction condition is derived from Shannon's 1953 lattice theory, together with the two entropic metrics of Shannon and Rajski. Because this theoretical material is relatively little known and appears quite dispersed across different references, we first provide a synthetic description (with complete proofs) of its concepts, such as total, common, and complementary information. Definitions and properties of the two entropic metrics are also fully detailed and shown to be compatible with the lattice structure. A new geometric interpretation of this lattice structure is then developed, leading to a necessary (and sometimes sufficient) condition for reconstructing the discrete random variable $X$ given a set $\{ X_1,\ldots,X_{n} \}$ of elements of the lattice generated by $X$. Finally, this condition is illustrated in five specific examples of perfect reconstruction problems: reconstruction of a symmetric random variable from the knowledge of its sign and absolute value, reconstruction of a word from a set of linear combinations, reconstruction of an integer from its prime signature (fundamental theorem of arithmetic) and from its remainders modulo a set of coprime integers (Chinese remainder theorem), and reconstruction of the sorting permutation of a list from a minimal set of pairwise comparisons.
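To make the Chinese-remainder example concrete, here is a minimal Python sketch (function name ours) reconstructing an integer from its remainders modulo pairwise coprime moduli:
```python
from math import prod

def crt_reconstruct(remainders, moduli):
    """Recover x in range(prod(moduli)) from x mod m for pairwise coprime m."""
    M = prod(moduli)
    x = 0
    for r, m in zip(remainders, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m): modular inverse of Mi
    return x % M

assert crt_reconstruct([2, 3, 2], [3, 5, 7]) == 23   # 23 mod (3,5,7) = (2,3,2)
```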
We revisit the relation between the gradient-flow equations and Hamilton's equations in information geometry. By regarding the gradient-flow equations as Huygens' equations in geometric optics, we relate the gradient flows to the geodesic flows induced by the geodesic Hamiltonian in an appropriate Riemannian geometry. The original evolution parameter $t$ in the gradient-flow equations is related to the arc-length parameter of the associated Riemannian manifold by the Jacobi--Maupertuis transformation. As a by-product, we find a relation between the gradient-flow equations and the replicator equations.
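In generic coordinates (notation assumed here, not necessarily the paper's), the two objects being related are the gradient flow of a potential $\Phi$ with respect to a Riemannian (e.g., Fisher) metric $g$ and the geodesic Hamiltonian on the same manifold:
\[
\frac{d\theta^i}{dt}\;=\;-\,g^{ij}(\theta)\,\frac{\partial \Phi(\theta)}{\partial \theta^j},
\qquad
H(\theta,p)\;=\;\tfrac12\,g^{ij}(\theta)\,p_i\,p_j .
\]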