Nonparametric latent structure models provide flexible inference on distinct, yet related, groups of observations. Each component of a vector of $d \ge 2$ random measures models the distribution of a group of exchangeable observations, while their dependence structure regulates the borrowing of information across different groups. Recent work has quantified the dependence between random measures in terms of Wasserstein distance from the maximally dependent scenario when $d=2$. By solving an intriguing max-min problem we are now able to define a Wasserstein index of dependence $I_\mathcal{W}$ with the following properties: (i) it simultaneously quantifies the dependence of $d \ge 2$ random measures; (ii) it takes values in $[0,1]$; (iii) it attains the extreme values $\{0,1\}$ under independence and complete dependence, respectively; (iv) since it is defined in terms of the underlying L\'evy measures, it is possible to evaluate it numerically in many Bayesian nonparametric models for partially exchangeable data.
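For context, the metric behind $I_\mathcal{W}$ is the standard $p$-Wasserstein distance, recalled here for the reader (the definition is classical and not specific to this work),
\[
\mathcal{W}_p(\mu,\nu) \;=\; \Bigl( \inf_{\pi \in \Pi(\mu,\nu)} \int \|x-y\|^p \, \pi(\mathrm{d}x,\mathrm{d}y) \Bigr)^{1/p},
\]
where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$; the index above builds on distances of this type from the maximally dependent scenario.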
It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows one to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models, including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we combine the general strategy for lower bounds with a reduction technique. This allows us to link the original problem to the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.
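As a purely illustrative companion to the Gaussian sequence model discussion, the following Python sketch estimates, by Monte Carlo, the squared bias and the variance of a simple linear shrinkage estimator $c\,X$ with $X \sim N(\theta, \sigma^2 I)$; the function name and parameter values are our own, and this is not the construction used for the lower bounds.

import numpy as np

def shrinkage_bias_variance(theta, sigma=1.0, c=0.5, n_rep=20000, seed=0):
    # Monte Carlo estimate of squared bias and variance of the linear
    # shrinkage estimator c * X, where X ~ N(theta, sigma^2 I).
    rng = np.random.default_rng(seed)
    X = theta + sigma * rng.standard_normal((n_rep, theta.size))
    est = c * X                                    # shrink every coordinate
    bias_sq = np.sum((est.mean(axis=0) - theta) ** 2)
    variance = np.sum(est.var(axis=0))
    return bias_sq, variance

theta = np.linspace(0.0, 2.0, 10)
for c in (0.25, 0.5, 0.75, 1.0):
    b2, v = shrinkage_bias_variance(theta, c=c)
    print(f"c={c:.2f}  squared bias={b2:.3f}  variance={v:.3f}")

Larger $c$ reduces the bias at the cost of a larger variance; lower bounds of the kind developed above quantify how much variance any estimator must retain once its bias is constrained.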
We expand our effective framework for weak convergence of measures on the real line by showing that effective convergence in the Prokhorov metric is equivalent to effective weak convergence. In addition, we establish a framework for the study of the effective theory of vague convergence of measures. We introduce a uniform notion and a non-uniform notion of vague convergence, and we show that these two notions are equivalent. However, limits under effective vague convergence may not be computable even when they are finite. We give an example of a finite but incomputable effective vague limit measure, and we provide a necessary and sufficient condition under which effective vague convergence produces a computable limit. Finally, we determine a sufficient condition under which effective weak and vague convergence of measures coincide. As a corollary, we obtain an effective version of the equivalence between classical weak and vague convergence of sequences of probability measures.
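For reference, the classical Prokhorov metric on Borel probability measures over $\mathbb{R}$, whose effective analogue appears above, is
\[
d_P(\mu,\nu) \;=\; \inf\bigl\{ \varepsilon > 0 \;:\; \mu(A) \le \nu(A^{\varepsilon}) + \varepsilon \ \text{and} \ \nu(A) \le \mu(A^{\varepsilon}) + \varepsilon \ \text{for every Borel set } A \bigr\},
\]
where $A^{\varepsilon}$ denotes the open $\varepsilon$-neighborhood of $A$; classically, convergence in $d_P$ is equivalent to weak convergence, and the first result above is an effective counterpart of this equivalence.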
Let $\pi\in \Pi(\mu,\nu)$ be a coupling between two probability measures $\mu$ and $\nu$ on a Polish space. In this article we propose and study a class of nonparametric measures of association between $\mu$ and $\nu$, which we call Wasserstein correlation coefficients. These coefficients are based on the Wasserstein distance between $\nu$ and the disintegration $\pi_{x_1}$ of $\pi$ with respect to the first coordinate. We also establish basic statistical properties of this new class of measures: we develop a statistical theory for strongly consistent estimators and determine their convergence rate in the case of compactly supported measures $\mu$ and $\nu$. Throughout our analysis we make use of the so-called adapted/bicausal Wasserstein distance; in particular, we rely on results established in [Backhoff, Bartl, Beiglb\"ock, Wiesel. Estimating processes in adapted Wasserstein distance. 2020]. Our approach applies to probability laws on general Polish spaces.
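As a rough numerical illustration, and emphatically not the estimator analyzed in the paper (which relies on the adapted/bicausal Wasserstein distance), one can approximate a quantity of the form $\int \mathcal{W}_1(\pi_{x_1}, \nu)\,\mu(\mathrm{d}x_1)$ for one-dimensional samples by binning the first coordinate; all names below are our own.

import numpy as np
from scipy.stats import wasserstein_distance

def avg_conditional_wasserstein(x, y, n_bins=10):
    # Plug-in estimate of E_X[ W_1( law(Y | X), law(Y) ) ]: bin x and use the
    # y-values falling in each bin as an empirical disintegration.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    dists, weights = [], []
    for b in range(n_bins):
        yb = y[idx == b]
        if yb.size:
            dists.append(wasserstein_distance(yb, y))
            weights.append(yb.size)
    return np.average(dists, weights=weights)

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
print("independent:", avg_conditional_wasserstein(x, rng.normal(size=5000)))
print("dependent:  ", avg_conditional_wasserstein(x, x + 0.1 * rng.normal(size=5000)))

The first value is close to zero while the second is not, reflecting the intended behavior of an association measure that compares the disintegration $\pi_{x_1}$ with the marginal $\nu$.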
Recovery of the support of a sparse vector from simple measurements is a widely studied problem, considered under the frameworks of compressed sensing, 1-bit compressed sensing, and more general single index models. We consider generalizations of this problem: mixtures of linear regressions and mixtures of linear classifiers, where the goal is to recover the supports of multiple sparse vectors using only a small number of possibly noisy linear and 1-bit measurements, respectively. The key challenge is that the measurements from different vectors are randomly mixed. Both of these problems have also received attention recently. In mixtures of linear classifiers, each observation indicates which side of a queried hyperplane a random unknown vector lies on, whereas in mixtures of linear regressions we observe the projection of a random unknown vector onto the queried hyperplane. The primary step in recovering the unknown vectors from the mixture is to first identify the supports of all the individual component vectors. In this work, we study the number of measurements sufficient for recovering the supports of all the component vectors in a mixture in both of these models. We provide algorithms that use a number of measurements polynomial in $k$ and $\log n$ and quasi-polynomial in $\ell$ to recover the supports of all $\ell$ unknown vectors in the mixture with high probability when each individual component is a $k$-sparse $n$-dimensional vector.
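To make the two observation models concrete, here is a small self-contained simulation of the measurement process (the toy dimensions and noise level are ours; the support-recovery algorithms themselves are not reproduced here):

import numpy as np

rng = np.random.default_rng(0)
n, k, ell, m = 100, 5, 3, 50           # ambient dimension, sparsity, components, measurements

betas = np.zeros((ell, n))             # ell unknown k-sparse vectors
for b in betas:
    b[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)

X = rng.normal(size=(m, n))            # queried measurement vectors (hyperplane normals)
z = rng.integers(ell, size=m)          # latent component mixed into each measurement
proj = np.einsum("ij,ij->i", X, betas[z])

y_regression = proj + 0.1 * rng.normal(size=m)   # mixture of linear regressions: noisy projections
y_classifier = np.sign(proj)                      # mixture of linear classifiers: 1-bit side information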
We propose an approach to the estimation of infinite sets of random vectors. The problem addressed is as follows: given two infinite sets of random vectors, find a single estimator that estimates vectors from these sets with a controlled associated error. We develop a new theory for the existence and implementation of such an estimator. In particular, we show that the proposed estimator is asymptotically optimal. Moreover, the estimator is determined in terms of pseudo-inverse matrices and, therefore, it always exists.
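The finite-dimensional analogue of the pseudo-inverse construction, the optimal linear estimator of a zero-mean random vector $x$ from an observed vector $y$, can be sketched as follows (an illustration of why the pseudo-inverse guarantees existence, not the paper's estimator for infinite sets of vectors):

import numpy as np

def best_linear_estimator(X, Y):
    # Sample version of F = E[x y^T] (E[y y^T])^+ for zero-mean data;
    # the Moore-Penrose pseudo-inverse makes F well defined even when
    # the covariance of y is singular.
    n = X.shape[0]
    Cxy = X.T @ Y / n
    Cyy = Y.T @ Y / n
    return Cxy @ np.linalg.pinv(Cyy)

rng = np.random.default_rng(0)
x = rng.normal(size=(10000, 3))                                        # vectors to be estimated
y = x @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(10000, 5))    # observed vectors
F = best_linear_estimator(x, y)
print("mean squared error:", np.mean(np.sum((x - y @ F.T) ** 2, axis=1)))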
In this paper, we consider the problem of computing the entire sequence of the maximum degree of minors of a block-structured symbolic matrix (a generic partitioned polynomial matrix) $A = (A_{\alpha\beta} x_{\alpha \beta} t^{d_{\alpha \beta}})$, where $A_{\alpha\beta}$ is a $2 \times 2$ matrix over a field $\mathbf{F}$, $x_{\alpha \beta}$ is an indeterminate, $d_{\alpha \beta}$ is an integer for $\alpha = 1,2,\dots, \mu$ and $\beta = 1,2,\dots,\nu$, and $t$ is an additional indeterminate. This problem can be viewed as an algebraic generalization of the maximum weight bipartite matching problem. The main result of this paper is a combinatorial $O(\mu \nu \min\{\mu, \nu\}^2)$-time algorithm for computing the entire sequence of the maximum degree of minors of a $(2 \times 2)$-type generic partitioned polynomial matrix of size $2\mu \times 2\nu$. We also present a minimax theorem, which can be used as a good characterization (NP $\cap$ co-NP characterization) for the computation of the maximum degree of minors of order $k$. Our results generalize the classical primal-dual algorithm (the Hungarian method) and minimax formula (Egerv\'ary's theorem) for the maximum weight bipartite matching problem.
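For readers who want to see the classical special case being generalized, the maximum weight bipartite matching problem solved by the Hungarian method can be handled in a few lines (using SciPy's assignment solver on a toy weight matrix; this is not the algorithm of the present paper):

import numpy as np
from scipy.optimize import linear_sum_assignment

# weights[i, j] = weight of the edge between left vertex i and right vertex j
weights = np.array([[4, 1, 3],
                    [2, 0, 5],
                    [3, 2, 2]])

rows, cols = linear_sum_assignment(weights, maximize=True)
print("matching:", list(zip(rows.tolist(), cols.tolist())), "total weight:", weights[rows, cols].sum())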
Penalized likelihood models are widely used to simultaneously select variables and estimate model parameters. However, the existence of weak signals can lead to inaccurate variable selection, biased parameter estimation, and invalid inference. Thus, identifying weak signals accurately and making valid inferences are crucial in penalized likelihood models. We develop a unified approach to identify weak signals and make inferences in penalized likelihood models, including the special case when the responses are categorical. To identify weak signals, we use the estimated selection probability of each covariate as a measure of the signal strength and formulate a signal identification criterion. To construct confidence intervals, we propose a two-step inference procedure. Extensive simulation studies show that the proposed procedure outperforms several existing methods. We illustrate the proposed method by applying it to the Practice Fusion diabetes data set.
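A generic illustration of the idea of using selection frequencies as a proxy for signal strength is sketched below with a subsampled lasso; this is only a caricature of the criterion, the paper's signal identification rule and two-step inference procedure are more involved, and all parameter choices here are ours:

import numpy as np
from sklearn.linear_model import Lasso

def selection_frequency(X, y, alpha=0.1, n_subsamples=100, seed=0):
    # Fraction of half-subsamples in which each covariate receives a nonzero
    # lasso coefficient -- a crude proxy for its selection probability.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_ != 0
    return counts / n_subsamples

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
beta = np.array([1.0, 0.15] + [0.0] * 8)     # one strong signal, one weak signal, eight nulls
y = X @ beta + rng.normal(size=200)
print(np.round(selection_frequency(X, y), 2))

In such a toy example, strong signals tend to be selected in nearly every subsample, nulls only rarely, and weak signals in between, which is what makes a selection probability a natural measure of signal strength.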
We introduce a flexible and scalable class of Bayesian geostatistical models for discrete data, based on the class of nearest neighbor mixture transition distribution processes (NNMP), referred to as discrete NNMPs. The proposed class characterizes spatial variability by a weighted combination of first-order conditional probability mass functions (pmfs), one for each of a given number of neighbors. The approach supports flexible modeling for multivariate dependence through specification of general bivariate discrete distributions that define the conditional pmfs. Moreover, the discrete NNMP allows for construction of models given a pre-specified family of marginal distributions that can vary in space, facilitating covariate inclusion. In particular, we develop a modeling and inferential framework for copula-based NNMPs that can attain flexible dependence structures, motivating the use of bivariate copula families for spatial processes. Compared to the traditional class of spatial generalized linear mixed models, where spatial dependence is introduced through a transformation of response means, our process-based modeling approach provides both computational and inferential advantages. We illustrate the benefits with synthetic data examples and an analysis of North American Breeding Bird Survey data.
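A toy, non-spatial sketch of the mixture-of-conditionals mechanism is given below (pick a neighbor according to the weights, then draw from a conditional pmf given that neighbor's value); the Poisson conditional and the fixed weights are our own stand-ins for the copula-based bivariate distributions and spatially varying weights developed in the paper:

import numpy as np

def sample_mixture_transition(neighbor_vals, weights, cond_sampler, rng):
    # Sampling from a weighted combination of conditional pmfs: choose one
    # neighbor with the given weights, then draw from its conditional pmf.
    j = rng.choice(len(neighbor_vals), p=weights)
    return cond_sampler(neighbor_vals[j], rng)

# toy conditional pmf: Poisson with mean pulled toward the chosen neighbor's value
cond = lambda y_nb, rng: rng.poisson(0.7 * y_nb + 0.3 * 2.0)

rng = np.random.default_rng(0)
draws = [sample_mixture_transition([3, 1, 4], [0.5, 0.3, 0.2], cond, rng) for _ in range(8)]
print(draws)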
In this paper we study the frequentist convergence rate for Latent Dirichlet Allocation (Blei et al., 2003) topic models. We show that the maximum likelihood estimator converges to one of the finitely many equivalent parameters in the Wasserstein distance at a rate of $n^{-1/4}$, without assuming separability or non-degeneracy of the underlying topics and/or the existence of more than three words per document, thus generalizing the previous works of Anandkumar et al. (2012, 2014) from an information-theoretical perspective. We also show that the $n^{-1/4}$ convergence rate is optimal in the worst case.
We propose the Wasserstein Auto-Encoder (WAE)---a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE). Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.
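A minimal numerical sketch of one of the two penalties proposed in the paper, the MMD-based one (WAE-MMD), is given below: squared-error reconstruction plus an MMD term pushing the distribution of encoded codes toward the prior. The toy linear encoder/decoder, kernel bandwidth, and penalty weight are our own choices, not those of the paper.

import numpy as np

def rbf_mmd2(z_q, z_p, sigma=1.0):
    # Biased estimate of the squared MMD between encoded codes z_q and prior
    # samples z_p under a Gaussian (RBF) kernel.
    def k(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(z_q, z_q).mean() + k(z_p, z_p).mean() - 2.0 * k(z_q, z_p).mean()

def wae_mmd_objective(x, encode, decode, lam=10.0, seed=0):
    # Reconstruction cost c(x, G(z)) plus lambda times the divergence between
    # the encoded training distribution and the latent prior P_Z.
    rng = np.random.default_rng(seed)
    z = encode(x)
    recon = np.mean(np.sum((x - decode(z)) ** 2, axis=1))
    z_prior = rng.standard_normal(z.shape)         # samples from a standard normal prior
    return recon + lam * rbf_mmd2(z, z_prior)

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 2))                        # toy linear encoder/decoder weights
x = rng.normal(size=(64, 8))
print(wae_mmd_objective(x, encode=lambda a: a @ W, decode=lambda z: z @ W.T))

In the paper's formulation the divergence term is implemented either adversarially (WAE-GAN) or with MMD (WAE-MMD); the sketch above corresponds to the latter.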