We introduce a notion called entropic independence, an entropic analog of spectral notions of high-dimensional expansion. Informally, entropic independence of a background distribution $\mu$ on $k$-sized subsets of a ground set of elements says that for any (possibly randomly chosen) set $S$, the relative entropy of a single element of $S$ drawn uniformly at random carries at most an $O(1/k)$ fraction of the relative entropy of $S$. Entropic independence is the analog of spectral independence obtained by replacing variance with entropy. We use entropic independence to derive tight mixing time bounds, overcoming the lossy nature of spectral analysis of Markov chains on exponentially large state spaces. In our main technical result, we show a general way of deriving entropy contraction, a.k.a. modified log-Sobolev inequalities, for down-up random walks from spectral notions. We show that spectral independence of a distribution under arbitrary external fields automatically implies entropic independence. To derive our results, we relate entropic independence to properties of polynomials: $\mu$ is entropically independent exactly when a transformed version of the generating polynomial of $\mu$ is upper bounded by its linear tangent; this property is implied by concavity of this transformation, which prior work showed to be locally equivalent to spectral independence. We apply our results to obtain tight modified log-Sobolev inequalities and mixing times for multi-step down-up walks on fractionally log-concave distributions. As our flagship application, we establish the tight mixing time of $O(n\log n)$ for Glauber dynamics on Ising models whose interaction matrix has eigenspectrum lying within an interval of length smaller than $1$, improving upon the prior quadratic dependence on $n$.
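One way to make the informal definition precise (a sketch in notation of our own choosing, not a verbatim quote of the paper's definition): write $D_{k\to 1}$ for the operator that maps a distribution on $k$-sets to the law of a uniformly random element of the set. Entropic independence then asks that, for every test distribution $\nu$ on $k$-sets,
\[
  D_{\mathrm{KL}}\!\left(\nu D_{k\to 1} \,\middle\|\, \mu D_{k\to 1}\right)
  \;\le\; \frac{C}{k}\, D_{\mathrm{KL}}\!\left(\nu \,\middle\|\, \mu\right)
\]
for some constant $C$, matching the statement that a uniformly random element of $S$ carries at most an $O(1/k)$ fraction of the relative entropy of $S$.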
Most of the popular dependence measures for two random variables $X$ and $Y$ (such as Pearson's and Spearman's correlation, Kendall's $\tau$ and Gini's $\gamma$) vanish whenever $X$ and $Y$ are independent. However, a vanishing dependence measure does not necessarily imply independence, nor does a measure equal to $1$ imply that one variable is a measurable function of the other. Yet both properties are natural desiderata for a convincing dependence measure. In this paper, we present a general approach to transforming a given dependence measure into a new one which exactly characterizes independence as well as functional dependence. Our approach uses the concept of monotone rearrangements as introduced by Hardy and Littlewood and is applicable to a broad class of measures. In particular, we are able to define a rearranged Spearman's $\rho$ and a rearranged Kendall's $\tau$ which do attain the value $1$ if, and only if, one variable is a measurable function of the other. We also present simple estimators for the rearranged dependence measures, prove their consistency and illustrate their finite-sample properties by means of a simulation study.
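A quick numerical illustration of the failure mode that motivates the rearranged measures (an example of our own, not from the paper): for $X$ uniform on $[-1,1]$ and $Y = X^2$, the variable $Y$ is a measurable function of $X$, yet Spearman's $\rho$ is close to $0$ rather than $1$.

```python
# Classical Spearman's rho can vanish under perfect functional dependence.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x**2  # Y is a deterministic (measurable) function of X

rho, _ = spearmanr(x, y)
print(f"Spearman's rho for Y = X^2: {rho:.4f}")  # close to 0, not 1
```

By symmetry of $X$ around $0$, the ranks of $X$ and $X^2$ are uncorrelated, so the printed value hovers near zero despite the exact functional relationship.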
We study the classical expander codes, introduced by Sipser and Spielman \cite{SS96}. Given any constants $0< \alpha, \varepsilon < 1/2$, and an arbitrary bipartite graph with $N$ vertices on the left, $M < N$ vertices on the right, and left degree $D$ such that any left subset $S$ of size at most $\alpha N$ has at least $(1-\varepsilon)|S|D$ neighbors, we show that the corresponding linear code given by parity checks on the right has distance at least roughly $\frac{\alpha N}{2 \varepsilon }$. This is strictly better than the best known previous result of $2(1-\varepsilon ) \alpha N$ \cite{Sudan2000note, Viderman13b} whenever $\varepsilon < 1/2$, and improves the previous result significantly when $\varepsilon $ is small. Furthermore, we show that this distance is tight in general, thus providing a complete characterization of the distance of general expander codes. Next, we provide several efficient decoding algorithms, which vastly improve previous results in terms of the fraction of errors corrected, whenever $\varepsilon < \frac{1}{4}$. Finally, we also give a bound on the list-decoding radius of general expander codes, which beats the classical Johnson bound in certain situations (e.g., when the graph is almost regular and the code has a high rate). Our techniques exploit novel combinatorial properties of bipartite expander graphs. In particular, we establish a new size-expansion tradeoff, which may be of independent interest.
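To see why the new bound strictly improves on $2(1-\varepsilon)\alpha N$ for every $\varepsilon < 1/2$, a one-line computation suffices:
\[
\frac{\alpha N}{2\varepsilon} - 2(1-\varepsilon)\alpha N
  \;=\; \frac{\alpha N \bigl(1 - 4\varepsilon(1-\varepsilon)\bigr)}{2\varepsilon}
  \;=\; \frac{\alpha N (1-2\varepsilon)^2}{2\varepsilon} \;>\; 0
  \qquad \text{for } 0 < \varepsilon < \tfrac{1}{2},
\]
and since the gap grows like $1/\varepsilon$ as $\varepsilon \to 0$, the improvement is most pronounced for small $\varepsilon$, as claimed.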
We examine a family of discrete probability distributions that describes the "spillage number" in the extended balls-in-bins model. The spillage number is defined as the number of balls that occupy their bins minus the total number of occupied bins. This probability distribution can be characterised as a normalised version of the expansion of the noncentral Stirling numbers of the second kind in terms of the central Stirling numbers of the second kind. Alternatively, it can be derived in a natural way from the extended balls-in-bins model. We derive the generating functions for this distribution and its important moments. We also derive an algorithm for recursive computation of the mass values for the distribution. Finally, we examine the asymptotic behaviour of the spillage distribution and the performance of an approximation to the distribution.
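A minimal Monte Carlo sketch of the spillage number (hedged: this assumes the classical model of $m$ balls thrown uniformly into $n$ bins, and reads "spillage" as the number of balls landing in already-occupied bins, i.e. $m$ minus the number of occupied bins; the extended model of the paper may differ in details):

```python
# Monte Carlo estimate of the spillage distribution under the assumptions above.
import numpy as np

def spillage_samples(m, n, trials, rng):
    """Return `trials` samples of the spillage number for m balls in n bins."""
    out = np.empty(trials, dtype=int)
    for t in range(trials):
        bins = rng.integers(0, n, size=m)    # bin index of each ball
        occupied = np.unique(bins).size      # number of occupied bins
        out[t] = m - occupied                # balls beyond the first in each bin
    return out

rng = np.random.default_rng(0)
s = spillage_samples(m=10, n=20, trials=100_000, rng=rng)
vals, counts = np.unique(s, return_counts=True)
for v, c in zip(vals, counts):
    print(f"P(spillage = {v}) ≈ {c / s.size:.4f}")
```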
Domain decomposition methods are among the most efficient methods for solving sparse linear systems of equations. Their effectiveness relies on a judiciously chosen coarse space. Originally introduced and theoretically proved to be efficient for self-adjoint operators, spectral coarse spaces have been proposed in the past few years for indefinite and non-self-adjoint operators. This paper presents a new spectral coarse space that can be constructed in a fully algebraic way, unlike most existing spectral coarse spaces. We present a theoretical convergence result for Hermitian positive definite diagonally dominant matrices. Numerical experiments and comparisons against state-of-the-art preconditioners in the multigrid community show that the resulting two-level Schwarz preconditioner is efficient, especially for non-self-adjoint operators. Furthermore, in the non-self-adjoint case, our proposed preconditioner outperforms state-of-the-art preconditioners.
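For readers unfamiliar with the structure of such preconditioners, here is a minimal two-level additive Schwarz sketch (illustrative only: it uses a textbook piecewise-constant coarse space on a 1D Laplacian, not the paper's fully algebraic spectral coarse space):

```python
# Two-level additive Schwarz: local subdomain solves plus a coarse correction.
import numpy as np

n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # 1D Laplacian, SPD

# Two overlapping subdomains, given as index sets.
idx = [np.arange(0, n // 2 + 4), np.arange(n // 2 - 4, n)]

# Coarse space: one indicator vector per subdomain (rank-2 correction).
R0 = np.zeros((len(idx), n))
for j, I in enumerate(idx):
    R0[j, I] = 1.0

# M^{-1} = R0^T (R0 A R0^T)^{-1} R0 + sum_i R_i^T (R_i A R_i^T)^{-1} R_i
Minv = R0.T @ np.linalg.inv(R0 @ A @ R0.T) @ R0           # coarse solve
for I in idx:
    Minv[np.ix_(I, I)] += np.linalg.inv(A[np.ix_(I, I)])  # local solves

ev = np.sort(np.linalg.eigvals(Minv @ A).real)
print(f"cond(A)      ≈ {np.linalg.cond(A):.1f}")
print(f"cond(M^-1 A) ≈ {ev[-1] / ev[0]:.1f}")             # markedly smaller
```

The choice of coarse space, here the simplest possible one, is exactly the design lever the paper addresses: spectral coarse spaces replace the indicator vectors with eigenvectors of suitable local problems.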
In this paper, we investigate local permutation tests for testing conditional independence between two random vectors $X$ and $Y$ given $Z$. The local permutation test determines the significance of a test statistic by locally shuffling samples which share similar values of the conditioning variables $Z$, and it forms a natural extension of the usual permutation approach for unconditional independence testing. Despite its simplicity and empirical support, the theoretical underpinnings of the local permutation test remain unclear. Motivated by this gap, this paper aims to establish theoretical foundations of local permutation tests with a particular focus on binning-based statistics. We start by revisiting the hardness of conditional independence testing and provide an upper bound for the power of any valid conditional independence test, which holds when the probability of observing collisions in $Z$ is small. This negative result naturally motivates us to impose additional restrictions on the possible distributions under the null and the alternative. To this end, we focus our attention on certain classes of smooth distributions and identify provably tight conditions under which the local permutation method is universally valid, i.e., it is valid when applied to any (binning-based) test statistic. To complement this result on type I error control, we also show that in some cases, a binning-based statistic calibrated via the local permutation method can achieve minimax optimal power. We also introduce a double-binning permutation strategy, which yields a valid test over less smooth null distributions than the typical single-binning method without compromising much power. Finally, we present simulation results to support our theoretical findings.
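The shuffling scheme described above is easy to state in code. A minimal sketch (hedged: the binning rule, test statistic, and permutation count here are illustrative choices of ours, not the paper's):

```python
# Local permutation test: permute y only within bins of z, so dependence
# explained by Z survives the shuffle while conditional dependence is destroyed.
import numpy as np

def local_permutation_pvalue(x, y, z, stat, n_bins=10, n_perms=200, seed=0):
    rng = np.random.default_rng(seed)
    edges = np.quantile(z, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(z, edges)             # bin label per sample
    t_obs = stat(x, y)
    count = 0
    for _ in range(n_perms):
        y_perm = y.copy()
        for b in np.unique(bins):
            I = np.flatnonzero(bins == b)
            y_perm[I] = y_perm[rng.permutation(I)]   # shuffle within the bin
        count += stat(x, y_perm) >= t_obs
    return (1 + count) / (1 + n_perms)

# Example where X and Y are dependent only through Z, so the null holds.
rng = np.random.default_rng(1)
z = rng.normal(size=2000)
x = z + rng.normal(size=2000)
y = z + rng.normal(size=2000)
stat = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
print(local_permutation_pvalue(x, y, z, stat))       # typically not small
```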
We present new scalar and matrix Chernoff-style concentration bounds for a broad class of probability distributions over the binary hypercube $\{0,1\}^n$. Motivated by recent tools developed for the study of mixing times of Markov chains on discrete distributions, we say that a distribution is $\ell_\infty$-independent when the infinity norm of its influence matrix $\mathcal{I}$ is bounded by a constant. We show that any distribution which is $\ell_\infty$-independent satisfies a matrix Chernoff bound that matches the matrix Chernoff bound for independent random variables due to Tropp. Our matrix Chernoff bound is a broad generalization and strengthening of the matrix Chernoff bound of Kyng and Song (FOCS'18). Using our bound, we can conclude as a corollary that a union of $O(\log|V|)$ random spanning trees gives a spectral graph sparsifier of a graph with $|V|$ vertices with high probability, matching results for independent edge sampling, and matching lower bounds from Kyng and Song.
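The $\ell_\infty$-independence condition can be checked by brute force for small state spaces. A sketch under one common convention for the influence matrix (an assumption on our part; the paper's exact normalization may differ):

```python
# Brute-force influence matrix for a distribution on {0,1}^n, with
# I[i, j] = P(X_j = 1 | X_i = 1) - P(X_j = 1 | X_i = 0).
import itertools
import numpy as np

def influence_matrix(prob, n):
    pts = np.array(list(itertools.product([0, 1], repeat=n)))
    p = np.array([prob(x) for x in pts], dtype=float)
    p /= p.sum()
    I = np.zeros((n, n))
    for i in range(n):
        on, off = pts[:, i] == 1, pts[:, i] == 0
        for j in range(n):
            if i == j:
                continue
            I[i, j] = (p[on & (pts[:, j] == 1)].sum() / p[on].sum()
                       - p[off & (pts[:, j] == 1)].sum() / p[off].sum())
    return I

# Example: a small ferromagnetic Ising-like distribution on 3 bits.
beta = 0.2
prob = lambda x: np.exp(beta * (x[0] * x[1] + x[1] * x[2]))
I = influence_matrix(prob, 3)
print("||I||_inf =", np.abs(I).sum(axis=1).max())   # bounded -> l_inf-independent
```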
In this paper, we show that the diagonal of a high-dimensional sample covariance matrix stemming from $n$ independent observations of a $p$-dimensional time series with finite fourth moments can be approximated in spectral norm by the diagonal of the population covariance matrix. We assume that $n,p\to \infty$ with $p/n$ tending to a constant that may be positive or zero. As applications, we provide an approximation of the sample correlation matrix ${\mathbf R}$ and derive a variety of results for its eigenvalues. We identify the limiting spectral distribution of ${\mathbf R}$ and construct an estimator for the population correlation matrix and its eigenvalues. Finally, the almost sure limits of the extreme eigenvalues of ${\mathbf R}$ in a generalized spiked correlation model are analyzed.
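A small simulation conveys the flavor of the diagonal approximation (illustrative only, with i.i.d. Gaussian rows and a diagonal population covariance of our choosing; note that for diagonal matrices the spectral norm of the difference is just the maximum entrywise error):

```python
# Diagonal of the sample covariance vs. the population diagonal, and the
# induced sample correlation matrix R = D^{-1/2} S D^{-1/2}.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 400
sigma = np.linspace(0.5, 2.0, p)              # population standard deviations
X = rng.standard_normal((n, p)) * sigma       # independent rows, diag covariance

S = (X.T @ X) / n                             # sample covariance (mean zero)
err = np.abs(np.diag(S) - sigma**2).max()     # spectral-norm error of diagonals
print(f"max diagonal error: {err:.4f}")       # small for n >> 1

d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)                        # sample correlation matrix
```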
This paper introduces a new approach to inferring the second-order properties of a multivariate log Gaussian Cox process (LGCP) with a complex intensity function. We assume a semi-parametric model for the multivariate intensity function containing an unspecified complex factor common to all types of points. Given this model, we exploit the availability of several types of points to construct a second-order conditional composite likelihood to infer the pair correlation and cross pair correlation functions of the LGCP. Crucially, this likelihood does not depend on the unspecified part of the intensity function. We also introduce a cross validation method for model selection and an algorithm for regularized inference that can be used to obtain sparse models for cross pair correlation functions. The methodology is applied to simulated data as well as data examples from microscopy and criminology. These examples show how the new approach outperforms existing alternatives in which the intensity functions are estimated non-parametrically.
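For context, a standard fact about LGCPs (background rather than a result of this paper): if the log-intensity is a Gaussian field with covariance function $c$, the pair correlation function has the closed form
\[
  g(u, v) \;=\; \exp\{ c(u, v) \},
\]
so inferring $g$ (and its cross-type analogues) amounts to inferring the covariance structure of the latent Gaussian field.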
We deal with Bayesian generative and discriminative classifiers. Given a model distribution $p(x, y)$, with the observation $y$ and the target $x$, one computes generative classifiers by first considering $p(x, y)$ and then using Bayes' rule to calculate $p(x | y)$. A discriminative model is directly given by $p(x | y)$, which is used to compute discriminative classifiers. However, recent work has shown that the Bayesian Maximum Posterior classifier defined from the Naive Bayes (NB) or Hidden Markov Chain (HMC), both generative models, can also match the discriminative classifier definition. Thus, there are situations in which dividing classifiers into "generative" and "discriminative" is somewhat misleading. Indeed, such a distinction relates to the way classifiers are computed, not to the classifiers themselves. We present a general theoretical result specifying how a generative classifier induced from a generative model can also be computed in a discriminative way from the same model. Examples of NB and HMC are found again as particular cases, and we apply the general result to two original extensions of NB, and two extensions of HMC, one of which is original. Finally, we briefly illustrate the interest of the new discriminative way of computing classifiers in the Natural Language Processing (NLP) framework.
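A tiny numerical illustration of the generative route to the Maximum Posterior classifier (our own toy example, not the paper's general result), using the abstract's convention that $x$ is the target and $y$ the observation, with a Naive Bayes factorization $p(x, y) = p(x)\prod_t p(y_t \mid x)$:

```python
# Computing p(x | y) via Bayes' rule from a generative Naive Bayes model.
import numpy as np

p_x = np.array([0.6, 0.4])                 # prior over 2 classes
p_y_given_x = np.array([                   # p(y_t = 1 | x) for 3 binary features
    [0.9, 0.2, 0.7],                       # class 0
    [0.3, 0.8, 0.4],                       # class 1
])

def posterior(y):
    """p(x | y), obtained generatively: joint p(x, y) then normalisation."""
    lik = np.prod(np.where(y, p_y_given_x, 1 - p_y_given_x), axis=1)
    joint = p_x * lik                      # p(x, y), up to the constant p(y)
    return joint / joint.sum()

y = np.array([1, 0, 1])
print(posterior(y), "-> MAP class:", posterior(y).argmax())
```

The paper's point is that the same classifier can also be computed discriminatively, i.e. directly from $p(x \mid y)$, without passing through the joint.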
Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
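A rough simulation sketch of the additive construction just described (heavily hedged: truncated finite-dimensional gamma random measures stand in for general completely random measures, and the masses and truncation level are illustrative choices of ours):

```python
# Latent-nested-style construction: common + group-specific random measures,
# then normalisation to dependent random probability measures.
import numpy as np

rng = np.random.default_rng(0)
K = 1000                                  # truncation level: shared atom count
atoms = rng.normal(size=K)                # common atom locations

def gamma_crm_jumps(total_mass):
    """Finite-dimensional stand-in for the jumps of a gamma CRM."""
    return rng.gamma(total_mass / K, 1.0, size=K)

mu_common = gamma_crm_jumps(1.0)          # component shared across groups
group_probs = []
for _ in range(2):                        # two groups/samples
    mu_g = mu_common + gamma_crm_jumps(1.0)    # common + group-specific CRM
    group_probs.append(mu_g / mu_g.sum())      # normalise to a probability

# The two random probability measures share atoms and the common weights,
# inducing dependence strictly between full exchangeability and independence.
```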