We provide the sandwiched R\'enyi divergence of order $\alpha\in(\frac{1}{2},1)$, as well as its induced quantum information quantities, with an operational interpretation in the characterization of the exact strong converse exponents of quantum tasks. Specifically, we consider (a) smoothing of the max-relative entropy, (b) quantum privacy amplification, and (c) quantum information decoupling. We solve the problem of determining the exact strong converse exponents for these three tasks, with performance measured by the fidelity or purified distance. The results are given in terms of the sandwiched R\'enyi divergence of order $\alpha\in(\frac{1}{2},1)$ and its induced quantum R\'enyi conditional entropy and quantum R\'enyi mutual information. This is the first time that a precise operational meaning has been found for the sandwiched R\'enyi divergence with R\'enyi parameter in the interval $\alpha\in(\frac{1}{2},1)$.
In this note, we provide analytic expressions for the R\'enyi common information of orders in $(1,\infty)$ for the doubly symmetric binary source (DSBS). Combined with previously known results, analytic expressions for the R\'enyi common information of all orders in $[0,\infty]$ are now completely known for this source. We also consider the R\'enyi common information of all orders in $[-\infty,0)$ and evaluate it for the DSBS. We provide a sufficient condition under which the R\'enyi common information of such orders coincides with Wyner's common information for the DSBS. Based on numerical analysis, we conjecture that there is a certain phase transition, as the crossover probability increases, in the R\'enyi common information of negative orders for the DSBS. Our proofs are based on a lemma on the splitting of entropy and on the analytic expression for relaxed Wyner's common information.
We show that deep neural networks (DNNs) can efficiently learn any composition of functions with bounded $F_{1}$-norm, which allows DNNs to break the curse of dimensionality in ways that shallow networks cannot. More specifically, we derive a generalization bound that combines a covering number argument for compositionality, and the $F_{1}$-norm (or the related Barron norm) for large width adaptivity. We show that the global minimizer of the regularized loss of DNNs can fit, for example, the composition of two functions $f^{*}=h\circ g$ from a small number of observations, assuming that $g$ is smooth/regular and reduces the dimensionality (e.g. $g$ could be the modulo map of the symmetries of $f^{*}$), so that $h$ can be learned in spite of its low regularity. The measures of regularity we consider are Sobolev norms with different levels of differentiability, which are well adapted to the $F_{1}$-norm. We compute scaling laws empirically and observe phase transitions depending on whether $g$ or $h$ is harder to learn, as predicted by our theory.
We prove that for any integers $\alpha, \beta > 1$, the existential fragment of the first-order theory of the structure $\langle \mathbb{Z}; 0,1,<, +, \alpha^{\mathbb{N}}, \beta^{\mathbb{N}}\rangle$ is decidable (where $\alpha^{\mathbb{N}}$ is the set of positive integer powers of $\alpha$, and likewise for $\beta^{\mathbb{N}}$). On the other hand, we show by way of hardness that decidability of the existential fragment of the theory of $\langle \mathbb{N}; 0,1, <, +, x\mapsto \alpha^x, x \mapsto \beta^x\rangle$ for any multiplicatively independent $\alpha,\beta > 1$ would lead to mathematical breakthroughs regarding base-$\alpha$ and base-$\beta$ expansions of certain transcendental numbers.
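To make the flavour of such existential constraints concrete, here is a small Python sketch (an illustration only, not the paper's decision procedure, and the function name is ours) that searches for a solution of $x + y = n$ with $x \in 2^{\mathbb{N}}$ and $y \in 3^{\mathbb{N}}$; the search is finite because only powers below $n$ can participate.

```python
def sum_of_powers(n, a=2, b=3):
    """Search for x = a^i, y = b^j (i, j >= 1) with x + y = n.

    Brute force over powers of a below n; for each, test whether
    the complement n - x is a power of b.
    """
    x = a
    while x < n:
        y = n - x
        p = b
        while p < y:
            p *= b
        if p == y:
            return (x, y)
        x *= a
    return None

print(sum_of_powers(5))   # (2, 3)
print(sum_of_powers(6))   # None
print(sum_of_powers(11))  # (2, 9)
```

Deciding the full existential fragment is far subtler, since variables may be multiplied by integer constants and combined across several power predicates; the brute-force search above only illustrates the shape of the atomic constraints.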
This paper deals with sufficient conditions on the distribution of the random variable $H$, in the model $X =\Pi_C(H)$, for the convex hull $\widehat C_N$ of $N$ independent copies of $X$ to be a consistent estimator of the convex body $C$ with a rate of convergence. The convergence of $\widehat C_N$ is established for the Hausdorff distance under a uniform condition on the distribution of $H$, but also in a pointwise sense under a less demanding condition. Some of these convergence results on $\widehat C_N$ are applied to the estimation of the time-dependent constraint set involved in a discrete-time Skorokhod problem.
We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb R^d$ with "certifiably bounded" $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(\epsilon^{1-1/t})$ with sample complexity $m = (k\log(d))^{O(t)}/\epsilon^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde O(\epsilon)$ with sample complexity $m = O(k^4 \mathrm{polylog}(d))/\epsilon^2$. Our algorithms follow the Sum-of-Squares-based proofs-to-algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively the best possible.
We present a generalisation of the theory of quantitative algebras of Mardare, Panangaden and Plotkin where (i) the carriers of quantitative algebras are not restricted to be metric spaces and can be arbitrary fuzzy relations or generalised metric spaces, and (ii) the interpretations of the algebraic operations are not required to be nonexpansive. Our main results include: a novel sound and complete proof system, the proof that free quantitative algebras always exist, the proof of strict monadicity of the induced Free-Forgetful adjunction, and the result that all monads (on fuzzy relations) that lift finitary monads (on sets) admit a quantitative equational presentation.
We consider the problem of determining the manifold $n$-widths of Sobolev and Besov spaces with error measured in the $L_p$-norm. The manifold widths control how efficiently these spaces can be approximated by general non-linear parametric methods with the restriction that the parameter selection and parameterization maps must be continuous. Existing upper and lower bounds only match when the Sobolev or Besov smoothness index $q$ satisfies $q\leq p$ or $1 \leq p \leq 2$. We close this gap and obtain sharp lower bounds for all $1 \leq p,q \leq \infty$ for which a compact embedding holds. A key part of our analysis is to determine the exact value of the manifold widths of finite dimensional $\ell^M_q$-balls in the $\ell_p$-norm when $p\leq q$. Although this result is not new, we provide a new proof and apply it to lower bounding the manifold widths of Sobolev and Besov spaces. Our results show that the Bernstein widths, which are typically used to lower bound the manifold widths, decay asymptotically faster than the manifold widths in many cases.
Samples of phylogenetic trees arise in a variety of evolutionary and biomedical applications, and the Fr\'echet mean in Billera-Holmes-Vogtmann tree space is a summary tree shown to have advantages over other mean or consensus trees. However, use of the Fr\'echet mean raises computational and statistical issues which we explore in this paper. The Fr\'echet sample mean is known often to contain fewer internal edges than the trees in the sample, and in this circumstance calculating the mean by iterative schemes can be problematic due to slow convergence. We present new methods for identifying edges which must lie in the Fr\'echet sample mean and apply these to a data set of gene trees relating organisms from the Apicomplexa, which cause a variety of parasitic infections. When a sample of trees contains a significant level of heterogeneity in the branching patterns, or topologies, displayed by the trees, the Fr\'echet mean is often a star tree, lacking any internal edges. In this situation, the population Fr\'echet mean is affected by a non-Euclidean phenomenon called stickiness which impacts upon asymptotics, and we examine two data sets for which the mean tree is a star tree. The first consists of trees representing the physical shape of artery structures in a sample of medical images of human brains in which the branching patterns are very diverse. The second consists of gene trees from a population of baboons in which there is evidence of substantial hybridization. We develop hypothesis tests which work in the presence of stickiness. The first is a test for the presence of a given edge in the Fr\'echet population mean; the second is a two-sample test for differences in two distributions which share the same sticky population mean.
This paper addresses the problem of finding a minimum-cost $m$-state Markov chain $(S_0,\ldots,S_{m-1})$ in a large set of chains. The chains studied have a reward associated with each state. The cost of a chain is its "gain", i.e., its average reward under its stationary distribution. Specifically, for each $k=0,\ldots,m-1$ there is a known set ${\mathbb S}_k$ of type-$k$ states. A permissible Markov chain contains exactly one state of each type; the problem is to find a minimum-cost permissible chain. The original motivation was to find a cheapest binary AIFV-$m$ lossless code on a source alphabet of size $n$. Such a code is an $m$-tuple of trees, in which each tree can be viewed as a Markov chain state. This formulation was then used to address other problems in lossless compression. The known solution techniques for finding minimum-cost Markov chains were iterative and ran in exponential time. This paper shows how to map every possible type-$k$ state into a type-$k$ hyperplane and then define a "Markov Chain Polytope" as the lower envelope of all such hyperplanes. Finding a minimum-cost Markov chain can then be shown to be equivalent to finding a "highest" point on this polytope. The local optimization procedures used in the previous iterative algorithms are shown to be separation oracles for this polytope. Since these were often polynomial time, an application of the Ellipsoid method immediately leads to polynomial time algorithms for these problems.
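As a toy illustration of the cost objective (the "gain", i.e. the average reward under the stationary distribution), the following Python sketch computes the gain of a small two-state chain by power iteration. The chain, rewards, and function names are hypothetical examples of ours; this is not the paper's optimization algorithm, which searches over sets of permissible chains.

```python
def stationary(P, iters=10_000):
    """Power-iterate a row-stochastic matrix P to its stationary distribution.

    Assumes the chain is ergodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def gain(P, r):
    """Average reward under the stationary distribution: sum_i pi_i * r_i."""
    pi = stationary(P)
    return sum(p_i * r_i for p_i, r_i in zip(pi, r))

# Hypothetical 2-state chain: stationary distribution is (5/6, 1/6),
# so the gain is (5/6)*1.0 + (1/6)*4.0 = 1.5.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [1.0, 4.0]
print(round(gain(P, r), 6))  # 1.5
```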
Let $X$ be an $n$-element point set in the $k$-dimensional unit cube $[0,1]^k$, where $k \geq 2$. According to an old result of Bollob\'as and Meir (1992), there exists a cycle (tour) $x_1, x_2, \ldots, x_n$ through the $n$ points such that $\left(\sum_{i=1}^n |x_i - x_{i+1}|^k \right)^{1/k} \leq c_k$, where $|x-y|$ is the Euclidean distance between $x$ and $y$, $x_{n+1} \equiv x_1$, and $c_k$ is a constant depending only on $k$. In the other direction, for every $k \geq 2$ and $n \geq 2$, there exist $n$ points in $[0,1]^k$ whose shortest tour satisfies $\left(\sum_{i=1}^n |x_i - x_{i+1}|^k \right)^{1/k} = 2^{1/k} \cdot \sqrt{k}$. For the plane, the best constant is $c_2=2$, and this is the only exact value known. Bollob\'as and Meir showed that one can take $c_k = 9 \left(\frac23 \right)^{1/k} \cdot \sqrt{k}$ for every $k \geq 3$ and conjectured that the best constant is $c_k = 2^{1/k} \cdot \sqrt{k}$ for every $k \geq 2$. Here we significantly improve the upper bound and show that one can take $c_k = 3 \sqrt5 \left(\frac23 \right)^{1/k} \cdot \sqrt{k}$ or $c_k = 2.91 \sqrt{k} \ (1+o_k(1))$. Our bounds are constructive. We also show that $c_3 \geq 2^{7/6}$, which disproves the conjecture for $k=3$. Connections to matching problems, power assignment problems, and related algorithmic questions are also discussed. A slightly revised version of the Bollob\'as--Meir conjecture is proposed.
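The constants in play can be checked numerically. The following Python sketch (function names are ours) verifies that the improved constant $3\sqrt{5}\,(2/3)^{1/k}\sqrt{k}$ is smaller than the original Bollob\'as--Meir constant $9\,(2/3)^{1/k}\sqrt{k}$, and that at $k=3$ the lower bound $2^{7/6}$ exceeds the conjectured value $2^{1/3}\sqrt{3}$, which is exactly how the conjecture fails there.

```python
import math

def bm_1992(k):      # Bollobas-Meir (1992) upper bound: 9 (2/3)^{1/k} sqrt(k)
    return 9 * (2 / 3) ** (1 / k) * math.sqrt(k)

def bm_new(k):       # improved upper bound: 3 sqrt(5) (2/3)^{1/k} sqrt(k)
    return 3 * math.sqrt(5) * (2 / 3) ** (1 / k) * math.sqrt(k)

def conjectured(k):  # conjectured optimal constant: 2^{1/k} sqrt(k)
    return 2 ** (1 / k) * math.sqrt(k)

# The lower bound 2^{7/6} ~ 2.245 exceeds the conjectured constant
# 2^{1/3} sqrt(3) ~ 2.182, refuting the conjecture at k = 3.
assert 2 ** (7 / 6) > conjectured(3)

# The new upper bound improves on the 1992 constant by a factor sqrt(5)/3.
for k in range(3, 20):
    assert conjectured(k) < bm_new(k) < bm_1992(k)
```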