We consider the sequential composite binary hypothesis testing problem in which one hypothesis is governed by a single distribution while the other is governed by a family of distributions whose parameters belong to a known set $\Gamma$. We would like to design a test that decides which hypothesis is in effect. Under the constraint that, under either hypothesis, the probability that the length of the test, a stopping time, exceeds $n$ is bounded by a certain threshold $\epsilon$, we obtain fundamental limits on the asymptotic behavior of the sequential test as $n$ tends to infinity. Assuming that $\Gamma$ is a convex and compact set, we characterize the set of all first-order error exponents for the problem. We also prove a strong converse. Additionally, we obtain the set of second-order error exponents under the assumption that the observation alphabet $\mathcal{X}$ is finite. In the proof of the second-order asymptotics, a main technical contribution is the derivation of a central limit-type result for the maximum of an uncountable set of log-likelihood ratios under suitable conditions. This result may be of independent interest. We also show that some important statistical models satisfy these conditions.
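To fix ideas (our notation; the abstract itself introduces only $\Gamma$, $n$, $\epsilon$ and $\mathcal{X}$), the setting can be summarized as
\[
H_0:\ X_1,X_2,\ldots \overset{\text{i.i.d.}}{\sim} P_0
\qquad\text{versus}\qquad
H_1:\ X_1,X_2,\ldots \overset{\text{i.i.d.}}{\sim} P_\theta \ \text{for some } \theta\in\Gamma,
\]
where a sequential test consists of a stopping time $\tau$ and a terminal decision rule, subject to the constraint that $\mathbb{P}(\tau>n)\le\epsilon$ under either hypothesis; the first-order error exponents are then the achievable exponential decay rates $-\frac{1}{n}\log(\text{error probability})$ of the two error probabilities as $n\to\infty$.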
Rational verification is the problem of determining which temporal logic properties will hold in a multi-agent system, under the assumption that agents in the system act rationally by choosing strategies that collectively form a game-theoretic equilibrium. Previous work in this area has largely focused on deterministic systems. In this paper, we develop the theory and algorithms for rational verification in probabilistic systems. We focus on concurrent stochastic games (CSGs), which can be used to model uncertainty and randomness in complex multi-agent environments. We study the rational verification problem for both non-cooperative and cooperative games in the qualitative probabilistic setting. In the former case, we consider LTL properties satisfied by the Nash equilibria of the game, and in the latter case, LTL properties satisfied by the core. In both cases, we show that the problem is 2EXPTIME-complete, and thus no harder than the much simpler verification problem of model checking LTL properties of systems modelled as Markov decision processes (MDPs).
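As a toy, hedged illustration of the rational verification question (a one-shot normal-form game standing in for a concurrent stochastic game, and a Boolean predicate on outcomes standing in for an LTL property; the strategy names and payoffs below are invented for the example), one can ask whether a given property holds on every pure Nash equilibrium:

```python
from itertools import product

# Toy two-player game; strategies and payoffs are invented for illustration only.
strategies = {0: ["a", "b"], 1: ["a", "b"]}
payoffs = {  # (s0, s1) -> (u0, u1)
    ("a", "a"): (2, 2), ("a", "b"): (0, 3),
    ("b", "a"): (3, 0), ("b", "b"): (1, 1),
}

def is_nash(profile):
    """A profile is a pure Nash equilibrium if no player gains by deviating unilaterally."""
    for player in (0, 1):
        for deviation in strategies[player]:
            alternative = list(profile)
            alternative[player] = deviation
            if payoffs[tuple(alternative)][player] > payoffs[profile][player]:
                return False
    return True

# "Property" standing in for an LTL formula: every player ends up with payoff >= 1.
def prop(profile):
    return all(u >= 1 for u in payoffs[profile])

equilibria = [s for s in product(strategies[0], strategies[1]) if is_nash(s)]
# Rational verification (universal flavour): does the property hold on all equilibria?
print(equilibria, all(prop(s) for s in equilibria))
```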
In the 1990s, Clark, Colbourn, and Johnson wrote a seminal paper in which they proved that maximum clique can be solved in polynomial time in unit disk graphs. Since then, the complexity of maximum clique in intersection graphs of $d$-dimensional (unit) balls has been investigated. For ball graphs, the problem is NP-hard, as shown by Bonamy et al. (FOCS '18), who also gave an efficient polynomial time approximation scheme (EPTAS) for disk graphs. However, the complexity of maximum clique for disk graphs remains unknown. In this paper, we show the existence of a polynomial time algorithm for a geometric superclass of unit disk graphs. Moreover, we give partial results toward obtaining an EPTAS for intersection graphs of convex pseudo-disks.
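As a small, hedged illustration of the objects involved (not of the algorithms discussed above), the following sketch builds a unit disk graph from random points and finds a maximum clique by brute force with networkx; the number of points and the distance threshold are arbitrary choices:

```python
import itertools
import random

import networkx as nx

# Random points in the unit square; two vertices are adjacent in the unit disk
# graph when their centres are within the chosen distance threshold.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(30)]
threshold = 0.3

G = nx.Graph()
G.add_nodes_from(range(len(points)))
for i, j in itertools.combinations(range(len(points)), 2):
    (x1, y1), (x2, y2) = points[i], points[j]
    if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= threshold ** 2:
        G.add_edge(i, j)

# Brute force: enumerate all maximal cliques and keep the largest one.
largest = max(nx.find_cliques(G), key=len)
print(len(largest), sorted(largest))
```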
In the Bayes paradigm and for a given loss function, we propose the construction of a new type of posterior distribution for estimating the law of an $n$-sample. The loss functions we have in mind are based on the total variation distance, the Hellinger distance, as well as some $\mathbb{L}_{j}$-distances. We prove that, with a probability close to one, this new posterior distribution concentrates its mass in a neighbourhood of the law of the data, for the chosen loss function, provided that this law belongs to the support of the prior or, at least, lies close enough to it. We therefore establish that the new posterior distribution enjoys some robustness properties with respect to a possible misspecification of the prior, or more precisely, its support. For the total variation and squared Hellinger losses, we also show that the posterior distribution keeps its concentration properties when the data are only independent, hence not necessarily i.i.d., provided that most of their marginals are close enough to some probability distribution around which the prior puts enough mass. The posterior distribution is therefore also stable with respect to the equidistribution assumption. We illustrate these results by several applications. We consider the problems of estimating a location parameter or both the location and the scale of a density in a nonparametric framework. Finally, we also tackle the problem of estimating a density, with the squared Hellinger loss, in a high-dimensional parametric model under some sparsity conditions. The results established in this paper are non-asymptotic and provide, as much as possible, explicit constants.
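For reference, the two principal losses mentioned above are the standard ones, stated here for densities $p$ and $q$ with respect to a dominating measure $\mu$ (textbook definitions rather than formulas taken from the paper):
\[
\|p-q\|_{TV}=\frac{1}{2}\int |p-q|\,d\mu,
\qquad
h^2(p,q)=\frac{1}{2}\int\big(\sqrt{p}-\sqrt{q}\big)^2\,d\mu .
\]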
For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their successful application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative explanations for empirically observed phenomena and novel insights to guide the design of MX methodology. We show that any valid MX CI test must also be valid conditionally on Y and Z; this conditioning allows us to reformulate the problem as testing a point null hypothesis involving the conditional distribution of X. The Neyman-Pearson lemma then implies that the conditional randomization test (CRT) based on a likelihood statistic is the most powerful MX CI test against a point alternative. We also obtain a related optimality result for MX knockoffs. Switching to an asymptotic framework with arbitrarily growing covariate dimension, we derive an expression for the limiting power of the CRT against local semiparametric alternatives in terms of the prediction error of the machine learning algorithm on which its test statistic is based. Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error control under the assumption that only the first two moments of X given Z are known, a significant relaxation of the MX assumption.
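A minimal sketch of the conditional randomization test in the MX setting is given below; the Gaussian model for $X\mid Z$ and the plain correlation statistic are assumptions made purely for illustration, and any machine-learning-based statistic could be substituted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 5

# Simulated data; the MX assumption is that the law of X given Z is known.
Z = rng.normal(size=(n, p))
beta = np.ones(p) / np.sqrt(p)
X = Z @ beta + rng.normal(size=n)             # known model: X | Z ~ N(Z beta, 1)
Y = 0.5 * X + Z @ beta + rng.normal(size=n)   # an alternative where Y depends on X given Z

def statistic(x, y):
    # Illustrative test statistic: absolute sample correlation between x and y.
    return abs(np.corrcoef(x, y)[0, 1])

T_obs = statistic(X, Y)
B = 500
# CRT: resample X from its known conditional law given Z, keeping (Y, Z) fixed,
# and compare the observed statistic with its resampling distribution.
T_null = np.array([statistic(Z @ beta + rng.normal(size=n), Y) for _ in range(B)])
p_value = (1 + np.sum(T_null >= T_obs)) / (B + 1)
print(p_value)
```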
Let $X_N$ be an $N$-dimensional subspace of $L_2$ functions on a probability space $(\Omega, \mu)$ spanned by a uniformly bounded Riesz basis $\Phi_N$. Given an integer $1\leq v\leq N$ and an exponent $1\leq q\leq 2$, we obtain universal discretization of the integral norms $L_q(\Omega,\mu)$ for functions from the collection of all subspaces of $X_N$ spanned by $v$ elements of $\Phi_N$, with the number $m$ of required sample points satisfying $m\ll v(\log N)^2(\log v)^2$. This bound on $m$ is much better than previously known bounds, which are quadratic in $v$. Our proof uses a conditional theorem on universal sampling discretization and an inequality bounding entropy numbers in terms of greedy approximation with respect to dictionaries.
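For orientation, sampling discretization of the $L_q(\Omega,\mu)$ norm on a class $\mathcal{F}$ by points $\xi^1,\ldots,\xi^m\in\Omega$ is usually understood as a two-sided inequality of the form (stated here generically, not verbatim from the paper)
\[
C_1\|f\|_q^q\ \le\ \frac{1}{m}\sum_{j=1}^{m}|f(\xi^j)|^q\ \le\ C_2\|f\|_q^q
\qquad\text{for all } f\in\mathcal{F},
\]
with constants $C_1,C_2>0$ independent of $f$; "universal" refers to a single point set $\{\xi^j\}$ that works simultaneously for all the subspaces spanned by $v$ elements of $\Phi_N$.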
Suppose we observe an infinite sequence of coin flips $X_1,X_2,\ldots$, and wish to sequentially test the null that these binary random variables are exchangeable. Nonnegative supermartingales (NSMs) are a workhorse of sequential inference, but we prove that they are powerless for this problem. First, utilizing a geometric concept called fork-convexity (a sequential analog of convexity), we show that any process that is an NSM under a set of distributions is also necessarily an NSM under their "fork-convex hull". Second, we demonstrate that the fork-convex hull of the exchangeable null consists of all possible laws over binary sequences; this implies that any NSM under exchangeability is necessarily nonincreasing and hence always yields a powerless test for any alternative. Since testing arbitrary deviations from exchangeability is information-theoretically impossible, we focus on Markovian alternatives. We combine ideas from universal inference and the method of mixtures to derive a "safe e-process", which is a nonnegative process with expectation at most one under the null at any stopping time, and which is upper bounded by a martingale but is not itself an NSM. This in turn yields a level-$\alpha$ sequential test that is consistent; regret bounds from universal coding also demonstrate rate-optimal power. We present ways to extend these results to any finite alphabet and to Markovian alternatives of any order using a "double mixture" approach. We provide an array of simulations, and give general approaches based on betting for unstructured or ill-specified alternatives. Finally, inspired by Shafer, Vovk, and Ville, we provide game-theoretic interpretations of our e-processes and pathwise results.
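Concretely, the property quoted above for the safe e-process $(E_t)_{t\ge 0}$ reads
\[
\mathbb{E}_P\!\left[E_\tau\right]\le 1 \qquad\text{for every null distribution } P \text{ and every stopping time } \tau,
\]
and rejecting the null the first time $E_t\ge 1/\alpha$ yields a level-$\alpha$ sequential test, by Markov's inequality applied at that stopping time.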
We extend quantum Stein's lemma in asymmetric quantum hypothesis testing to composite null and alternative hypotheses. As our main result, we show that the asymptotic error exponent for testing convex combinations of quantum states $\rho^{\otimes n}$ against convex combinations of quantum states $\sigma^{\otimes n}$ can be written as a regularized quantum relative entropy formula. We prove that in general such a regularization is needed but also discuss various settings where our formula as well as extensions thereof become single-letter. This includes an operational interpretation of the relative entropy of coherence in terms of hypothesis testing. For our proof, we start from the composite Stein's lemma for classical probability distributions and lift the result to the non-commutative setting by using elementary properties of quantum entropy. Finally, our findings also imply an improved recoverability lower bound on the conditional quantum mutual information in terms of the regularized quantum relative entropy -- featuring an explicit and universal recovery map.
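For orientation (a standard definition rather than the paper's regularized formula itself), the quantum relative entropy underlying the result above is
\[
D(\rho\,\|\,\sigma)=\operatorname{Tr}\big[\rho\,(\log\rho-\log\sigma)\big],
\]
defined whenever the support of $\rho$ is contained in that of $\sigma$; the regularization referred to above takes a limit of such quantities evaluated on $n$-fold states, normalized by $n$.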
We consider parameter estimation in distributed networks, where each sensor in the network observes an independent sample from an underlying distribution and has $k$ bits to communicate its sample to a centralized processor, which computes an estimate of a desired parameter. We develop lower bounds on the minimax risk of estimating the underlying parameter for a large class of losses and distributions. Our results show that under mild regularity conditions, the communication constraint reduces the effective sample size by a factor of $d$ when $k$ is small, where $d$ is the dimension of the estimated parameter. Furthermore, this penalty decreases at most exponentially with increasing $k$, and this exponential decrease occurs for some models, e.g., when estimating high-dimensional distributions. For other models, however, we show that the sample size reduction is remediated only linearly with increasing $k$, e.g., when some sub-Gaussian structure is available. We apply our results to the distributed settings of the product Bernoulli model, the multinomial model, Gaussian location models, and logistic regression, recovering or strengthening existing results. Our approach significantly deviates from existing approaches for developing information-theoretic lower bounds for communication-efficient estimation. We circumvent the need for strong data processing inequalities used in prior work and develop a geometric approach that builds on a new representation of the communication constraint. This approach allows us to strengthen and generalize existing results with simpler and more transparent proofs.
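As a toy, hedged numerical illustration of the communication-constrained setting (a one-dimensional Gaussian location model with $k=1$ bit per sensor; the sign-quantization scheme below is a classical example and not the lower-bound machinery developed above):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu_true, n = 0.7, 10_000

# Each sensor observes X_i ~ N(mu, 1) and communicates one bit b_i = 1{X_i > 0}.
X = rng.normal(loc=mu_true, scale=1.0, size=n)
bits = (X > 0).astype(float)

# The centre only sees the bits: since P(X_i > 0) = Phi(mu), invert the Gaussian CDF.
mu_hat_one_bit = norm.ppf(bits.mean())

# Unconstrained benchmark that sees the raw samples.
mu_hat_full = X.mean()
print(mu_hat_one_bit, mu_hat_full)
```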
We propose a new method of estimation in topic models that is not a variation on the existing simplex-finding algorithms and that estimates the number of topics $K$ from the observed data. We derive new finite sample minimax lower bounds for the estimation of the word-topic matrix $A$, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents ($n$), individual document length ($N_i$), dictionary size ($p$) and number of topics ($K$), and both $p$ and $K$ are allowed to increase with $n$, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though it starts out with the computational and theoretical disadvantage of not knowing the correct number of topics $K$, while we provide the competing methods with the correct value in our simulations.
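In the standard topic-model formulation implicitly referred to here (common notation, which may differ from the paper's), each of the $n$ documents has an expected word-frequency vector mixing the $K$ topics,
\[
\Pi_i = A\,w_i \in \mathbb{R}^{p}, \qquad i=1,\ldots,n,
\]
where the columns of $A\in\mathbb{R}^{p\times K}$ are probability vectors over the $p$ dictionary words, $w_i$ is a probability vector of topic weights for document $i$, and the observed counts for document $i$ arise from $N_i$ multinomial draws from $\Pi_i$.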
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
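Schematically (our notation; the laws of the completely random measures are left unspecified), the latent nested construction adds a shared and a group-specific completely random measure and then normalizes,
\[
\tilde{p}_\ell=\frac{\mu_0+\mu_\ell}{\mu_0(\mathbb{X})+\mu_\ell(\mathbb{X})},
\qquad \ell=1,\ldots,d,
\]
where $\mu_0$ is common to all groups, $\mu_\ell$ is specific to group $\ell$, and $\mathbb{X}$ is the sample space; the resulting $\tilde{p}_1,\ldots,\tilde{p}_d$ are dependent random probability measures.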