We introduce an information-theoretic quantity with similar properties to mutual information that can be estimated from data without making explicit assumptions on the underlying distribution. This quantity is based on a recently proposed matrix-based entropy that uses the eigenvalues of a normalized Gram matrix to compute an estimate of the eigenvalues of an uncentered covariance operator in a reproducing kernel Hilbert space. We show that a difference of matrix-based entropies (DiME) is well suited for problems involving the maximization of mutual information between random variables. While many methods for such tasks can lead to trivial solutions, DiME naturally penalizes such outcomes. We compare DiME to several baseline estimators of mutual information on a toy Gaussian dataset. We provide examples of use cases for DiME, such as latent factor disentanglement and a multiview representation learning problem where DiME is used to learn a shared representation among views with high mutual information.
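To make the underlying machinery concrete, here is a minimal sketch of the matrix-based entropy computed from the eigenvalues of a trace-normalized Gram matrix; the Gaussian kernel, bandwidth sigma, order alpha, and the simple S(X) + S(Y) - S(X,Y) combination are illustrative assumptions, not the paper's exact DiME objective.

```python
import numpy as np

def gram(X, sigma=1.0):
    # Gaussian-kernel Gram matrix; rows of X are samples.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def matrix_entropy(K, alpha=1.01):
    # Matrix-based Renyi entropy of order alpha from the eigenvalues
    # of the trace-normalized Gram matrix.
    A = K / np.trace(K)
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def joint_entropy(Kx, Ky, alpha=1.01):
    # Joint entropy via the Hadamard (element-wise) product of Gram matrices.
    return matrix_entropy(Kx * Ky, alpha)

# Mutual-information-like quantity: S(X) + S(Y) - S(X, Y).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = X + 0.1 * rng.normal(size=(200, 2))
Kx, Ky = gram(X), gram(Y)
mi_like = matrix_entropy(Kx) + matrix_entropy(Ky) - joint_entropy(Kx, Ky)
```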
In recent years, word embeddings have been widely used to measure biases in texts. Although they have proven effective in detecting a wide variety of biases, metrics based on word embeddings lack transparency and interpretability. We analyze an alternative PMI-based metric to quantify biases in texts. It can be expressed as a function of conditional probabilities, which provides a simple interpretation in terms of word co-occurrences. We also prove that it can be approximated by an odds ratio, which allows estimating confidence intervals and statistical significance of textual biases. This approach produces results similar to those of metrics based on word embeddings when capturing real-world gender gaps embedded in large corpora.
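As an illustration of the odds-ratio approximation, a minimal sketch follows; the co-occurrence counting scheme, the Wald-type confidence interval, and the function name pmi_bias are illustrative assumptions rather than the paper's exact estimator.

```python
import numpy as np
from scipy.stats import norm

def pmi_bias(n_wA, n_A, n_wB, n_B, alpha=0.05):
    """Bias of context word w toward group A vs. group B, approximated by a
    log odds ratio with a Wald confidence interval (assumes nonzero cells).

    n_wA: co-occurrences of w within group-A context windows
    n_A : total size of group-A context windows (analogously for B)
    """
    # 2x2 contingency table: rows (w, not w), columns (A, B).
    a, b = n_wA, n_A - n_wA
    c, d = n_wB, n_B - n_wB
    log_or = np.log((a * d) / (b * c))
    se = np.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    z = norm.ppf(1.0 - alpha / 2.0)
    return log_or, (log_or - z * se, log_or + z * se)

# Example: w co-occurs 120 times in 10,000 group-A context tokens and
# 80 times in 10,000 group-B context tokens.
bias, ci = pmi_bias(120, 10_000, 80, 10_000)
```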
Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schr\"odinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.
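For intuition (our notation, under the natural reading that the score difference is the gap between log-density gradients), the SD flow can be sketched as transporting source samples along the vector field $$ v(x) \;=\; \nabla_x \log p_{\mathrm{target}}(x) \;-\; \nabla_x \log p_{\mathrm{source}}(x), \qquad \dot{x}_t = v(x_t), $$ so that samples move in the direction that increases the target log-density relative to the source log-density; in practice both scores would be replaced by learned or proxy estimates.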
We give systematic ways of defining monotone quantum relative entropies and (multi-variate) quantum R\'enyi divergences starting from a set of monotone quantum relative entropies. Despite its central importance in information theory, only two additive and monotone quantum extensions of the classical relative entropy have been known so far, the Umegaki and the Belavkin-Staszewski relative entropies. Here we give a general procedure to construct monotone and additive quantum relative entropies from a given one with the same properties; in particular, when starting from the Umegaki relative entropy, this gives a new one-parameter family of monotone and additive quantum relative entropies interpolating between the Umegaki and the Belavkin-Staszewski ones on full-rank states. In a different direction, we use a generalization of a classical variational formula to define multi-variate quantum R\'enyi quantities corresponding to any finite set of quantum relative entropies $(D^{q_x})_{x\in X}$ and signed probability measure $P$, as $$ Q_P^{b,q}((\rho_x)_{x\in X}):=\sup_{\tau\ge 0}\left\{\operatorname{Tr}\tau-\sum_xP(x)D^{q_x}(\tau\|\rho_x)\right\}. $$ We show that monotone quantum relative entropies define monotone R\'enyi quantities whenever $P$ is a probability measure. With the proper normalization, the negative logarithm of the above quantity gives a quantum extension of the classical R\'enyi $\alpha$-divergence in the 2-variable case ($X=\{0,1\}$, $P(0)=\alpha$). We show that if both $D^{q_0}$ and $D^{q_1}$ are monotone and additive quantum relative entropies, and at least one of them is strictly larger than the Umegaki relative entropy, then the resulting barycentric R\'enyi divergences are strictly between the log-Euclidean and the maximal R\'enyi divergences, and hence they are different from any previously studied quantum R\'enyi divergence.
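For concreteness, in the 2-variable case ($X=\{0,1\}$, $P(0)=\alpha$, $P(1)=1-\alpha$) one natural normalization, written here in our notation rather than quoted from the paper, is $$ D_{\alpha}^{\mathrm{b},(q_0,q_1)}(\rho\|\sigma) := \frac{1}{\alpha-1}\log \sup_{\tau\ge 0}\left\{\operatorname{Tr}\tau-\alpha D^{q_0}(\tau\|\rho)-(1-\alpha)D^{q_1}(\tau\|\sigma)\right\}; $$ indeed, when $\rho$ and $\sigma$ commute and both $D^{q_x}$ are the classical relative entropy, the supremum is attained at $\tau=\rho^{\alpha}\sigma^{1-\alpha}$, and the expression reduces to the classical R\'enyi $\alpha$-divergence $\frac{1}{\alpha-1}\log\operatorname{Tr}\rho^{\alpha}\sigma^{1-\alpha}$.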
We prove a variety of new and refined uniform continuity bounds for entropies both of classical random variables on an infinite state space and of quantum states of infinite-dimensional systems. We obtain the first tight continuity estimate on the Shannon entropy of random variables with a countably infinite alphabet. The proof relies on a new mean-constrained Fano-type inequality and the notion of maximal coupling of random variables. We then employ this classical result to derive the first tight energy-constrained continuity bound for the von Neumann entropy of states of infinite-dimensional quantum systems, when the Hamiltonian is the number operator, which is arguably the most relevant Hamiltonian in the study of infinite-dimensional quantum systems in the context of quantum information theory. The above scheme works only for the Shannon and von Neumann entropies. Hence, to deal with more general entropies, e.g. $\alpha$-R\'enyi and $\alpha$-Tsallis entropies, with $\alpha \in (0,1)$, for which continuity bounds are known only for finite-dimensional systems, we develop a novel approximation scheme which relies on recent results on operator H\"older continuous functions and the equivalence of all Schatten norms in special spectral subspaces of the Hamiltonian. This approach is, as we show, motivated by continuity bounds for $\alpha$-R\'enyi and $\alpha$-Tsallis entropies of random variables that follow from the H\"older continuity of the entropy functionals. Bounds for $\alpha>1$ are also provided. Finally, we settle an open problem on related approximation questions posed in the recent works by Shirokov on the so-called Finite-dimensional Approximation (FA) property.
Variational inference (VI) is a method to approximate the computationally intractable posterior distributions that arise in Bayesian statistics. Typically, VI fits a simple parametric distribution to the target posterior by minimizing an appropriate objective such as the evidence lower bound (ELBO). In this work, we present a new approach to VI based on the principle of score matching: if two distributions are equal, then their score functions (i.e., gradients of the log density) are equal at every point of their support. With this, we develop score matching VI, an iterative algorithm that seeks to match the scores of the variational approximation and the exact posterior. At each iteration, score matching VI solves an inner optimization, one that minimally adjusts the current variational estimate to match the scores at a newly sampled value of the latent variables. We show that when the variational family is Gaussian, this inner optimization enjoys a closed-form solution, and we call the resulting method Gaussian score matching VI (GSM-VI). GSM-VI is also a ``black box'' variational algorithm in that it only requires a differentiable joint distribution, and as such it can be applied to a wide class of models. We compare GSM-VI to black box variational inference (BBVI), which has similar requirements but instead optimizes the ELBO. We study how GSM-VI behaves as a function of the problem dimensionality, the condition number of the target covariance matrix (when the target is Gaussian), and the degree of mismatch between the approximating and exact posterior distributions. We also study GSM-VI on a collection of real-world Bayesian inference problems from the posteriorDB database of datasets and models. In all of our studies we find that GSM-VI is faster than BBVI without sacrificing accuracy: it requires 10-100x fewer gradient evaluations to obtain a comparable quality of approximation.
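A minimal sketch of the score-matching idea follows: it fits a mean-field Gaussian to a toy Gaussian target by plain gradient steps on the squared score mismatch at sampled points. This illustrates only the objective, not the closed-form GSM-VI update described above; the target, step size, and parameterization are assumptions.

```python
import numpy as np

# Toy target: p(z) = N(m, S), whose score is -S^{-1} (z - m).
m = np.array([1.0, -2.0])
S_inv = np.linalg.inv(np.array([[2.0, 0.3], [0.3, 0.5]]))
def target_score(z):
    return -S_inv @ (z - m)

# Variational family: mean-field Gaussian q(z) = N(mu, diag(sig**2)),
# whose score is -(z - mu) / sig**2.
mu, sig = np.zeros(2), np.ones(2)
rng = np.random.default_rng(0)
lr = 0.05

for _ in range(2000):
    z = mu + sig * rng.normal(size=2)            # sample from current q
    diff = -(z - mu) / sig**2 - target_score(z)  # score mismatch at z
    # Gradients of 0.5 * ||diff||^2 w.r.t. mu and sig, treating the
    # sampled z as fixed (a crude illustration, not the paper's update).
    grad_mu = diff / sig**2
    grad_sig = diff * 2.0 * (z - mu) / sig**3
    mu -= lr * grad_mu
    sig -= lr * grad_sig
    sig = np.clip(sig, 1e-3, None)
```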
In this paper we derive tight lower bounds resolving the hardness status of several fundamental weighted matroid problems. One notable example is budgeted matroid independent set, for which we show there is no fully polynomial-time approximation scheme (FPTAS), implying the Efficient PTAS of [Doron-Arad, Kulik and Shachnai, SOSA 2023] is the best possible. Furthermore, we show that there is no pseudo-polynomial time algorithm for exact weight matroid independent set, implying the algorithm of [Camerini, Galbiati and Maffioli, J. Algorithms 1992] for representable matroids cannot be generalized to arbitrary matroids. Similarly, we show there is no FPTAS for constrained minimum basis of a matroid or for knapsack cover with a matroid, implying the existing Efficient PTAS for the former is the best possible. For all of the above problems, we obtain unconditional lower bounds in the oracle model, where the independent sets of the matroid can be accessed only via a membership oracle. We complement these results by showing that the same lower bounds hold under standard complexity assumptions, even if the matroid is encoded as part of the instance. All of our bounds are based on a specifically structured family of paving matroids.
In this paper, we extend the Generalized Finite Difference Method (GFDM) to unknown compact submanifolds of Euclidean space, identified by randomly sampled data that (almost surely) lie in the interior of the manifolds. Theoretically, we formalize GFDM by exploiting a representation of smooth functions on the manifolds through Taylor expansions of polynomials defined on the tangent bundles. We illustrate the approach by approximating the Laplace-Beltrami operator, where a stable approximation is achieved by combining the Generalized Moving Least-Squares (GMLS) algorithm with a novel linear program that relaxes the diagonal-dominance constraint on the estimator, allowing for a feasible solution even when higher-order polynomials are employed. We establish the theoretical convergence of GFDM in solving Poisson PDEs and numerically demonstrate its accuracy on simple smooth manifolds of low and moderately high codimension as well as on unknown 2D surfaces. For the Dirichlet Poisson problem where no data points on the boundary are available, we employ GFDM with a volume-constraint approach that imposes the boundary conditions on data points close to the boundary. When the location of the boundary is unknown, we introduce a novel technique to detect points close to the boundary without needing to estimate the distance of the sampled data points to the boundary. We demonstrate the effectiveness of imposing the boundary conditions on the data points detected by this new technique, as compared to imposing them on all points within a certain distance of the boundary; the latter is sensitive to the choice of truncation distance and requires knowledge of the boundary location.
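Schematically (in our notation, as an illustration of the local construction rather than the paper's exact formulation), near a sample point $x_0$ with estimated tangent basis $T=[t_1,\dots,t_d]$ of $T_{x_0}M$, a smooth function $f$ is represented by a Taylor polynomial in tangent coordinates, $$ f(x_j) \approx \sum_{|\beta|\le \ell} \frac{c_\beta}{\beta!}\,\big(T^{\top}(x_j - x_0)\big)^{\beta}, $$ where the coefficients $c_\beta$ (estimates of tangential derivatives at $x_0$) are obtained by weighted least squares over nearby sample points $x_j$; finite-difference weights for operators such as the Laplace-Beltrami operator are then read off from this local fit.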
We present a combination technique based on mixed differences of both spatial approximations and quadrature formulae for the stochastic variables to efficiently solve a class of Optimal Control Problems (OCPs) constrained by random partial differential equations. The method requires solving the OCP on several low-fidelity spatial grids and with several quadrature formulae for the objective functional. All the computed solutions are then linearly combined to obtain a final approximation which, under suitable regularity assumptions, preserves the accuracy of fine tensor-product approximations while drastically reducing the computational cost. The combination technique involves only tensor-product quadrature formulae, so the discretized OCPs preserve the convexity of the continuous OCP. Hence, the combination technique avoids the inconveniences of Multilevel Monte Carlo and sparse-grid approaches while remaining suitable for high-dimensional problems. The manuscript presents an a priori procedure to choose the most important mixed differences and a complexity analysis, which shows that the asymptotic complexity is determined solely by the spatial solver. Numerical experiments validate the results.
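In a schematic two-index form (our notation; the paper's setting may involve more discretization parameters), let $u_{i,j}$ denote the OCP approximation on spatial grid level $i$ with quadrature level $j$, and set $u_{i,j}:=0$ whenever $i<0$ or $j<0$. The mixed differences and the combined approximation read $$ \Delta_{i,j} := \big(u_{i,j}-u_{i-1,j}\big)-\big(u_{i,j-1}-u_{i-1,j-1}\big), \qquad u_{\mathcal{I}} := \sum_{(i,j)\in\mathcal{I}} \Delta_{i,j}, $$ where $\mathcal{I}$ is a downward-closed index set selected a priori by balancing estimated error contributions against computational cost; taking $\mathcal{I}$ to be the full rectangle of indices recovers the fine tensor-product approximation.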
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces, i.e., expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have lower variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
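The sketch below contrasts a standard per-step IS weight with one built from per-sub-action probability ratios, assuming both the evaluation and behavior policies factorize over sub-actions; it illustrates the factorization only, not the paper's full decomposed estimators (which exploit additional structure to reduce variance), and the trajectory data format is hypothetical.

```python
import numpy as np

def joint_ratio(pi_prob, mu_prob):
    # Standard IS weight for one step with a joint (combinatorial) action.
    return pi_prob / mu_prob

def factored_ratio(pi_sub_probs, mu_sub_probs):
    # Weight as a product of per-sub-action ratios, assuming both the
    # evaluation and behavior policies factorize over sub-actions.
    return np.prod(np.asarray(pi_sub_probs) / np.asarray(mu_sub_probs))

def is_estimate(trajectories):
    """Trajectory-wise IS estimate of the evaluation policy's return.

    Each trajectory is a list of steps (pi_sub, mu_sub, reward), where
    pi_sub and mu_sub are per-sub-action probabilities under the
    evaluation and behavior policies, respectively (hypothetical format).
    """
    values = []
    for traj in trajectories:
        w, ret = 1.0, 0.0
        for pi_sub, mu_sub, reward in traj:
            w *= factored_ratio(pi_sub, mu_sub)
            ret += reward
        values.append(w * ret)
    return np.mean(values)
```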
Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., the dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework formulated as a novel saddle-point (min-max) optimization that aims to identify joint interactions between the source and target. Within this framework, the forward and backward networks iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations in the high-level semantic space to enhance long-term dependency in the generation process. Extensive experiments on different text generation tasks demonstrate that the proposed AMI framework significantly outperforms several strong baselines, and we also show that AMI has the potential to yield a tighter lower bound on the maximum mutual information for the variational information maximization problem.
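For reference, the variational information maximization lower bound at issue (the standard Barber-Agakov bound, written here in our notation with source $S$, target $T$, and backward network $q_\phi$) is $$ I(S;T) \;\ge\; H(S) + \mathbb{E}_{p(s,t)}\big[\log q_\phi(s\mid t)\big], $$ with equality when $q_\phi(s\mid t)$ matches the true posterior $p(s\mid t)$; this is why better modeling of the backward (target-to-source) network tightens the bound.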