As is well known, universal codes, which estimate the entropy rate consistently, exist for stationary ergodic sources over any finite alphabet but not over a countably infinite one. We cast the problem of universal coding into the problem of universal densities with respect to a given reference measure on a countably generated measurable space, examples being the counting measure or the Lebesgue measure. We show that universal densities, which estimate the differential entropy rate consistently, exist if the reference measure is finite, showing that the assumption of a finite alphabet is, in a certain sense, not necessary. To exhibit a universal density, we combine the prediction by partial matching (PPM) code with the recently proposed non-parametric differential (NPD) entropy rate estimator, extended by placing a prior over both all Markov orders and all quantization levels. The proof of universality applies Barron's asymptotic equipartition for densities and the continuity of $f$-divergences for filtrations. As an application, we demonstrate that any universal density induces a strongly consistent Ces\`aro mean estimator of the conditional density given an infinite past, which, in passing, solves the problem of universal prediction with the $0$-$1$ loss for a countable alphabet. We also show that there exists a strongly consistent entropy rate estimator with respect to the Lebesgue measure in the class of stationary ergodic Gaussian processes.
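For orientation, the quantity these densities estimate can be written down in standard notation (ours, not necessarily the paper's): for a stationary process $(X_i)_{i \ge 1}$ whose $n$-dimensional marginals admit densities $f_n$ with respect to the $n$-fold product of a reference measure $\rho$, the differential entropy rate is $$h_\rho = \lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}\left[-\log f_n(X_1,\ldots,X_n)\right],$$ which reduces to the usual Shannon entropy rate when $\rho$ is the counting measure on a countable alphabet.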
The original formulation of de Finetti's theorem says that an exchangeable sequence of Bernoulli random variables is a mixture of iid sequences of random variables. Following the work of Hewitt and Savage, this theorem was previously known for several classes of exchangeable random variables (for instance, for Baire measurable random variables taking values in a compact Hausdorff space, and for Borel measurable random variables taking values in a Polish space). Under the assumption that the underlying common distribution is Radon, we show that de Finetti's theorem holds for a sequence of Borel measurable exchangeable random variables taking values in any Hausdorff space. This includes and generalizes the previously known versions of de Finetti's theorem. Furthermore, it shows that the theorem is explained by the nature of the distribution of the exchangeable random variables, as opposed to the topological properties of their state space, which play no essential role. Using known relative consistency results from set theory, this also shows that it is consistent with the axioms of ZFC that de Finetti's theorem holds for all sequences of exchangeable random variables taking values in any complete metric space. We first use nonstandard analysis to study the empirical measures induced by hyperfinitely many identically distributed random variables, which leads to a proof of de Finetti's theorem in great generality while retaining the combinatorial intuition of proofs of simpler versions of de Finetti's theorem. An overview of the requisite tools from nonstandard measure theory and topological measure theory is provided, with some new perspectives built at the interface between these fields as part of that overview. One highlight of this development is a new generalization of Prokhorov's theorem.
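For reference, the classical Bernoulli form of the theorem asserts that for an exchangeable $\{0,1\}$-valued sequence $(X_n)_{n\ge 1}$ there exists a probability measure $\mu$ on $[0,1]$ such that $$\Pr(X_1 = x_1, \ldots, X_n = x_n) = \int_0^1 \theta^{\sum_i x_i}(1-\theta)^{\,n-\sum_i x_i}\,\mu(\mathrm{d}\theta)$$ for all $n$ and all $x_1,\ldots,x_n \in \{0,1\}$; the results above replace $\{0,1\}$ with progressively more general state spaces.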
We study p-Laplacians and spectral clustering for a recently proposed hypergraph model that incorporates edge-dependent vertex weights (EDVW). These weights can reflect the differing importance of vertices within a hyperedge, thus lending the hypergraph model higher expressivity and flexibility. By constructing submodular EDVW-based splitting functions, we convert hypergraphs with EDVW into submodular hypergraphs, for which the spectral theory is better developed. In this way, existing concepts and theorems such as p-Laplacians and Cheeger inequalities, proposed in the submodular hypergraph setting, can be directly extended to hypergraphs with EDVW. For submodular hypergraphs with EDVW-based splitting functions, we propose an efficient algorithm to compute the eigenvector associated with the second smallest eigenvalue of the hypergraph 1-Laplacian. We then utilize this eigenvector to cluster the vertices, achieving higher clustering accuracy than traditional spectral clustering based on the 2-Laplacian. More broadly, the proposed algorithm works for all submodular hypergraphs that are graph reducible. Numerical experiments using real-world data demonstrate the effectiveness of combining spectral clustering based on the 1-Laplacian and EDVW.
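As a toy illustration of the 2-Laplacian baseline being improved upon, the following numpy sketch clusters a small hypergraph with EDVW via a weighted clique expansion and the second eigenvector (Fiedler vector). The hypergraph data and the particular expansion are our own illustrative choices; the paper's submodular splitting functions and 1-Laplacian algorithm are more involved.

```python
import numpy as np

# Toy hypergraph on 5 vertices; gamma[e][v] are illustrative
# edge-dependent vertex weights (EDVW), not data from the paper.
hyperedges = [{0: 1.0, 1: 2.0, 2: 1.0}, {2: 1.0, 3: 1.0, 4: 3.0}]
n = 5

# One common reduction: weighted clique expansion splitting each
# hyperedge across vertex pairs in proportion to the EDVW.
W = np.zeros((n, n))
for gamma in hyperedges:
    total = sum(gamma.values())
    for u in gamma:
        for v in gamma:
            if u != v:
                W[u, v] += gamma[u] * gamma[v] / total

L = np.diag(W.sum(axis=1)) - W          # unnormalized 2-Laplacian
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]                    # eigenvector of 2nd smallest eigenvalue
print((fiedler > np.median(fiedler)).astype(int))  # two-way clustering
```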
Many novel notions of "risk" (e.g., CVaR, tilted risk, DRO risk) have been proposed and studied, but these risks are all at least as sensitive as the mean to loss tails on the upside, and tend to ignore deviations on the downside. We study a complementary new risk class that penalizes loss deviations in a bi-directional manner, while offering more flexibility in tail sensitivity than mean-variance. This class lets us derive high-probability learning guarantees without explicit gradient clipping, and empirical tests using both simulated and real data illustrate a high degree of control over key properties of the test loss distribution incurred by gradient-based learners.
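As a minimal sketch of the idea (our illustrative construction, not the paper's exact risk class), the risk below adds to a location parameter a bi-directional deviation penalty with sub-quadratic tails, here a Barron-type robust function $\rho(u) = 2(\sqrt{1+u^2/2}-1)$:

```python
import numpy as np

def bidirectional_risk(losses, theta, eta=1.0):
    # Penalize deviations of the losses from theta on both sides;
    # rho is even and grows linearly, so tails are damped relative
    # to the quadratic penalty underlying mean-variance.
    u = (losses - theta) / eta
    rho = 2.0 * (np.sqrt(1.0 + u**2 / 2.0) - 1.0)
    return theta + eta * rho.mean()

losses = np.random.default_rng(0).lognormal(size=1000)
grid = np.linspace(losses.min(), losses.max(), 200)
theta_star = grid[np.argmin([bidirectional_risk(losses, t) for t in grid])]
print(theta_star)   # risk-minimizing location for heavy-tailed losses
```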
Dependence is undoubtedly a central concept in statistics. Yet, it proves difficult to locate in the literature a formal definition which goes beyond the self-evident 'dependence = non-independence'. This absence has allowed the term 'dependence' and its variants to be used vaguely and indiscriminately for qualifying a variety of disparate notions, leading to numerous incongruities. For example, the classical Pearson, Spearman and Kendall correlations are widely regarded as 'dependence measures' of major interest, in spite of returning 0 in some cases of deterministic relationships between the variables at play, and thus evidently not measuring dependence at all. Arguing that research on such a fundamental topic would benefit from a slightly more rigorous framework, this paper suggests a general definition of the dependence between two random variables defined on the same probability space. Natural enough to align with intuition, that definition is still sufficiently precise to allow unequivocal identification of a 'universal' representation of the dependence structure of any bivariate distribution. Links between this representation and familiar concepts are highlighted, and ultimately, the idea of a dependence measure based on that universal representation is explored and shown to satisfy R\'enyi's postulates.
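A standard instance of the phenomenon alluded to above: if $X \sim \mathcal{N}(0,1)$ and $Y = X^2$, then $$\mathrm{Cov}(X, Y) = \mathbb{E}[X^3] - \mathbb{E}[X]\,\mathbb{E}[X^2] = 0,$$ so Pearson's correlation vanishes even though $Y$ is a deterministic function of $X$.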
In this work, we study the computational complexity of quantum determinants, a $q$-deformation of matrix permanents: given a complex number $q$ on the unit circle in the complex plane and an $n\times n$ matrix $X$, the $q$-permanent of $X$ is defined as $$\mathrm{Per}_q(X) = \sum_{\sigma\in S_n} q^{\ell(\sigma)}X_{1,\sigma(1)}\cdots X_{n,\sigma(n)},$$ where $\ell(\sigma)$ is the inversion number of the permutation $\sigma$ in the symmetric group $S_n$ on $n$ elements. This function family generalizes the determinant and the permanent, which correspond to the cases $q=-1$ and $q=1$, respectively. For worst-case hardness, using Liouville's approximation theorem and facts from algebraic number theory, we show that exactly computing the $q$-permanent is $\mathsf{Mod}_p\mathsf{P}$-hard for any primitive $m$-th root of unity $q$, where $m=p^k$ is an odd prime power. This implies that an efficient algorithm for computing the $q$-permanent would result in a collapse of the polynomial hierarchy. Next, we show that the $q$-permanent can be computed exactly using an oracle that approximates it to within a polynomial multiplicative error together with a membership oracle for a finite set of algebraic integers. Hence an efficient approximation algorithm would also imply a collapse of the polynomial hierarchy. By random self-reducibility, computing the $q$-permanent remains hard for a wide range of distributions satisfying a property called the strong autocorrelation property. Specifically, this is proved via a reduction from the $1$-permanent to the $q$-permanent for $O(1/n^2)$-close points $z$ on the unit circle. Since the family of permanent functions shares a common algebraic structure, various techniques developed for the hardness of the permanent can be generalized to $q$-permanents.
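The definition above can be checked directly on small matrices; the following brute-force sketch (ours, exponential-time, for verification only) recovers the permanent at $q=1$ and the determinant at $q=-1$:

```python
import itertools

def q_permanent(X, q):
    """Brute-force Per_q(X): sum over all permutations, O(n! * n^2)."""
    n = len(X)
    total = 0
    for sigma in itertools.permutations(range(n)):
        # inversion number ell(sigma)
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if sigma[i] > sigma[j])
        prod = 1
        for i in range(n):
            prod *= X[i][sigma[i]]
        total += q**inv * prod
    return total

X = [[1, 2], [3, 4]]
print(q_permanent(X, 1))    # 10, the permanent
print(q_permanent(X, -1))   # -2, the determinant
```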
Statistical risk assessments inform consequential decisions, such as pretrial release in criminal justice and loan approvals in consumer finance, by counterfactually predicting an outcome under a proposed decision (e.g., would the applicant default if we approved this loan?). There may, however, have been unmeasured confounders that jointly affected decisions and outcomes in the historical data. We propose a mean outcome sensitivity model that bounds the extent to which unmeasured confounders could affect outcomes on average. The mean outcome sensitivity model partially identifies the conditional likelihood of the outcome under the proposed decision, as well as popular predictive performance metrics and predictive disparities. We derive their identified sets and develop procedures for the confounding-robust learning and evaluation of statistical risk assessments. We propose a nonparametric regression procedure for estimating the bounds on the conditional likelihood of the outcome under the proposed decision, and estimators for the bounds on predictive performance and disparities. Applying our methods to a real-world credit-scoring task at a large Australian financial institution, we show how varying assumptions about unmeasured confounding lead to substantive changes in the credit score's predictions and in the evaluation of its predictive disparities.
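One way to make the sensitivity model concrete (illustrative notation of ours; the paper's formal definition may differ in its details): writing $Y(d)$ for the potential outcome under decision $d$ and $D$ for the historical decision, a mean outcome sensitivity model with budget $\delta \ge 0$ posits $$\big|\,\mathbb{E}[Y(d) \mid X = x,\, D = d'] - \mathbb{E}[Y(d) \mid X = x,\, D = d]\,\big| \le \delta$$ for all $x$ and decisions $d, d'$, so the unobservable counterfactual mean is identified only up to an interval whose width is governed by $\delta$; varying $\delta$ traces out the identified sets referred to above.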
We investigate the contraction properties of locally differentially private mechanisms. More specifically, we derive tight upper bounds on the divergence between the output distributions $P\mathsf{K}$ and $Q\mathsf{K}$ of an $\varepsilon$-LDP mechanism $\mathsf{K}$ in terms of a divergence between the corresponding input distributions $P$ and $Q$. Our first main technical result presents a sharp upper bound on the $\chi^2$-divergence $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ in terms of $\chi^2(P\|Q)$ and $\varepsilon$. We also show that the same result holds for a large family of divergences, including the KL-divergence and the squared Hellinger distance. The second main technical result gives an upper bound on $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ in terms of the total variation distance $\mathsf{TV}(P, Q)$ and $\varepsilon$. We then utilize these bounds to establish locally private versions of the van Trees inequality and of Le Cam's, Assouad's, and the mutual information methods, which are powerful tools for bounding minimax estimation risks. These results are shown to lead to better privacy analyses than the state of the art in several statistical problems, such as entropy and discrete distribution estimation, non-parametric density estimation, and hypothesis testing.
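The contraction phenomenon is easy to observe numerically. The sketch below (a toy check of ours, not the paper's bound or proof) pushes two input distributions through binary randomized response, a canonical $\varepsilon$-LDP mechanism, and compares the observed $\chi^2$ ratio to $\left(\frac{e^\varepsilon - 1}{e^\varepsilon + 1}\right)^2$, the contraction factor of that particular mechanism:

```python
import numpy as np

def randomized_response(eps, k=2):
    # Row-stochastic kernel of k-ary randomized response, an eps-LDP mechanism.
    K = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))
    np.fill_diagonal(K, np.exp(eps) / (np.exp(eps) + k - 1))
    return K

def chi2(p, q):
    return np.sum((p - q) ** 2 / q)

eps = 1.0
K = randomized_response(eps)
P, Q = np.array([0.9, 0.1]), np.array([0.4, 0.6])
print(chi2(P @ K, Q @ K) / chi2(P, Q))               # observed contraction
print(((np.exp(eps) - 1) / (np.exp(eps) + 1)) ** 2)  # binary RR factor
```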
We study the enumeration of answers to Unions of Conjunctive Queries (UCQs) with optimal time guarantees. More precisely, we wish to identify the queries that can be solved with linear preprocessing time and constant delay. Despite the basic nature of this problem, it was shown only recently that UCQs can be solved within these time bounds if they admit free-connex union extensions, even if all individual CQs in the union are intractable with respect to the same complexity measure. Our goal is to understand whether there exist additional tractable UCQs, not covered by the currently known algorithms. As a first step, we show that some previously unclassified UCQs are hard, using the classic 3SUM hypothesis via a known reduction from 3SUM to triangle listing in graphs. As a second step, we identify a question about a variant of this graph task that is unavoidable if we want to classify all self-join-free UCQs: is it possible to decide the existence of a triangle in a vertex-unbalanced tripartite graph in linear time? We prove that this task is equivalent in hardness to a family of UCQs. Finally, we show a dichotomy for unions of two self-join-free CQs under the assumption that the answer to this question is negative. Our conclusion is that, to reason about the class of enumeration problems defined by UCQs, it is enough to study the single decision problem of detecting triangles in unbalanced graphs. Without a breakthrough for triangle detection, we have no hope of finding an efficient algorithm for additional unions of two self-join-free CQs. On the other hand, should such a triangle detection algorithm one day be found, we would immediately obtain an efficient algorithm for a family of UCQs that are currently not known to be tractable.
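For concreteness, the decision problem in question asks whether a tripartite graph with parts $A$, $B$, $C$, possibly of very different sizes (hence "vertex-unbalanced"), contains any triangle. A naive sketch such as the one below runs in time proportional to the number of $A$-$B$ edges times a degree bound, not the linear time the open question demands:

```python
def has_triangle(ab_edges, bc_edges, ca_edges):
    # Index the B-C and A-C edges, then scan A-B edges for a common C-neighbor.
    bc, ca = {}, {}
    for b, c in bc_edges:
        bc.setdefault(b, set()).add(c)
    for c, a in ca_edges:
        ca.setdefault(a, set()).add(c)
    return any(bc.get(b, set()) & ca.get(a, set()) for a, b in ab_edges)

print(has_triangle([(0, 0)], [(0, 0)], [(0, 0)]))  # True: a0-b0-c0
```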
In group sequential analysis, data is collected and analyzed in batches until pre-defined stopping criteria are met. Inference in the parametric setup typically relies on the limiting asymptotic multivariate normality of the repeatedly computed maximum likelihood estimators (MLEs), a result first rigorously proved by Jennison and Turnbull (1997) under general regularity conditions. In this work, using Stein's method we provide optimal-order, non-asymptotic bounds on the smooth test function distance between the joint distribution of the group sequential MLEs and the appropriate multivariate normal distribution under the same conditions. Our results assume independent observations but allow heterogeneous (i.e., non-identically distributed) data. We examine how the resulting bounds simplify when the data comes from an exponential family. Finally, we present a general result relating the multivariate Kolmogorov distance to the smooth test function distance which, in addition to extending our results to the former metric, may be of independent interest.
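A toy simulation (normal data, no early stopping; a sanity check of ours on the limit being quantified, not the paper's general setting) illustrates the joint normality of the interim MLEs:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, batches, batch_size = 5000, 3, 50

# MLE of a normal mean recomputed after each batch, across many replications.
mles = np.empty((reps, batches))
for r in range(reps):
    x = rng.normal(loc=0.5, size=batches * batch_size)
    for k in range(batches):
        mles[r, k] = x[: (k + 1) * batch_size].mean()

# For unit-variance data, Cov(MLE_j, MLE_k) = 1 / (batch_size * max(j, k)),
# so the scaled covariance is approx [[1, 1/2, 1/3], [1/2, 1/2, 1/3], [1/3, 1/3, 1/3]].
print(np.cov(mles.T) * batch_size)
```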
Differential machine learning (DML) is a recently proposed technique that uses samplewise state derivatives to regularize least-squares fits so as to learn conditional expectations of functionals of stochastic processes as functions of state variables. Exploiting the derivative information requires fewer samples than a vanilla ML approach for the same level of precision. This paper extends the methodology to parametric problems, where the processes and functionals also depend on model and contract parameters, respectively. In addition, we propose adaptive parameter sampling to improve relative accuracy when the functionals have different magnitudes for different parameter sets. For calibration, we construct pricing surrogates for calibration instruments and optimize over them globally, and we discuss strategies for robust calibration. We demonstrate the usefulness of our methodology on one-factor Cheyette models with a benchmark-rate volatility specification with an extra stochastic volatility factor, applied to (two-curve) caplet prices at different strikes and maturities, first for parametric pricing, and then by calibrating to a given caplet volatility surface. To allow convenient and efficient simulation of processes and functionals, and in particular the corresponding computation of samplewise derivatives, we propose to specify the processes and functionals in a low-code way, close to mathematical notation, which is then used to generate efficient computations of the functionals and their derivatives in TensorFlow.
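A minimal TensorFlow sketch of the differential regularization at the core of DML, on a toy one-dimensional problem of our own (quadratic payoff with known pathwise derivatives); the paper's Cheyette-model setting is far richer:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="softplus"),  # smooth activation
    tf.keras.layers.Dense(1),
])
opt = tf.keras.optimizers.Adam(1e-2)

x = tf.random.normal((256, 1))
y = x**2 + 0.1 * tf.random.normal((256, 1))  # noisy samplewise payoffs
dydx = 2.0 * x                               # samplewise pathwise derivatives

lam = 1.0  # weight of the derivative (differential) term
for _ in range(200):
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            inner.watch(x)
            pred = model(x)
        pred_dx = inner.gradient(pred, x)    # model's state derivatives
        loss = (tf.reduce_mean((pred - y) ** 2)
                + lam * tf.reduce_mean((pred_dx - dydx) ** 2))
    grads = outer.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
```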