We study sequential probability assignment in the Gaussian setting, where the goal is to predict, or equivalently compress, a sequence of real-valued observations almost as well as the best Gaussian distribution with mean constrained to a given subset of $\mathbf{R}^n$. First, in the case of a convex constraint set $K$, we express the hardness of the prediction problem (the minimax regret) in terms of the intrinsic volumes of $K$; specifically, it equals the logarithm of the Wills functional from convex geometry. We then establish a comparison inequality for the Wills functional in the general nonconvex case, which underlines the metric nature of this quantity and generalizes the Slepian-Sudakov-Fernique comparison principle for the Gaussian width. Motivated by this inequality, we characterize the exact order of magnitude of the considered functional for a general nonconvex set, in terms of global covering numbers and local Gaussian widths. This implies metric isomorphic estimates for the log-Laplace transform of the intrinsic volume sequence of a convex body. As part of our analysis, we also characterize the minimax redundancy for a general constraint set. We finally relate and contrast our findings with classical asymptotic results in information theory.
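For context, the following is a standard definition from convex geometry (recalled here for the reader, not a statement from the abstract): the Wills functional of a convex body $K\subset\mathbf{R}^n$ can be written in terms of the intrinsic volumes $V_0(K),\dots,V_n(K)$ via Hadwiger's integral representation,
\[
\mathcal{W}(K) \;=\; \sum_{j=0}^{n} V_j(K) \;=\; \int_{\mathbf{R}^n} e^{-\pi\,\mathrm{dist}(x,K)^2}\,\mathrm{d}x ,
\]
so that the minimax regret described above is, up to the precise scaling of $K$ used in the paper, the logarithm of this quantity.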
In this paper we discuss potentially practical ways to produce expander graphs with good spectral properties and a compact description. We focus on several classes of uniform and bipartite expander graphs defined as random Schreier graphs of the general linear group over the finite field of size two. We perform numerical experiments and show that such constructions produce spectral expanders that can be useful for practical applications. To provide a theoretical explanation of the observed experimental results, we use the method of moments to prove upper bounds on the expected second largest eigenvalue of the random Schreier graphs used in our constructions. We focus on bounds whose asymptotic behaviour is difficult to analyze but which yield non-trivial conclusions for relatively small graphs with parameters from our numerical experiments (e.g., with fewer than $2^{200}$ vertices and degree at least logarithmic in the number of vertices).
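As a rough illustration of the kind of object involved (a sketch under my own assumptions, not the authors' construction: the abstract does not specify the group action, the number of generators, or how multi-edges are handled), one can build a random Schreier graph of $\mathrm{GL}(n,\mathbb{F}_2)$ acting on the nonzero vectors of $\mathbb{F}_2^n$ as follows:

```python
import itertools
import numpy as np

def gf2_rank(m):
    """Rank of a binary matrix over GF(2) via Gaussian elimination."""
    m = m.copy()
    rank = 0
    rows, cols = m.shape
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if m[r, c]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]      # swap pivot row into place
        for r in range(rows):
            if r != rank and m[r, c]:
                m[r] ^= m[rank]                  # eliminate the column entry
        rank += 1
    return rank

def random_invertible_gf2(n, rng):
    """Sample a random invertible n x n matrix over GF(2) by rejection."""
    while True:
        m = rng.integers(0, 2, size=(n, n), dtype=np.uint8)
        if gf2_rank(m) == n:
            return m

def random_schreier_graph(n, num_generators, seed=0):
    """Edge list of a random Schreier graph: vertices are the nonzero vectors
    of F_2^n, and each random generator g in GL(n, 2) contributes edges {v, g v}."""
    rng = np.random.default_rng(seed)
    gens = [random_invertible_gf2(n, rng) for _ in range(num_generators)]
    vertices = [np.array(v, dtype=np.uint8)
                for v in itertools.product([0, 1], repeat=n)]
    vertices = [v for v in vertices if v.any()]          # drop the zero vector
    edges = []
    for g in gens:
        for v in vertices:
            w = (g @ v) % 2                              # action of g on v over GF(2)
            edges.append((tuple(v), tuple(w)))
    return edges

# Example: a small random Schreier graph on the 2^6 - 1 = 63 nonzero vectors of F_2^6.
print(len(random_schreier_graph(6, num_generators=4)))
```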
We study the time complexity of the discrete $k$-center problem and related (exact) geometric set cover problems when $k$ or the size of the cover is small. We obtain a plethora of new results:
- We give the first subquadratic algorithm for rectilinear discrete 3-center in 2D, running in $\widetilde{O}(n^{3/2})$ time.
- We prove a lower bound of $\Omega(n^{4/3-\delta})$ for rectilinear discrete 3-center in 4D, for any constant $\delta>0$, under a standard hypothesis about triangle detection in sparse graphs.
- Given $n$ points and $n$ weighted axis-aligned unit squares in 2D, we give the first subquadratic algorithm for finding a minimum-weight cover of the points by 3 unit squares, running in $\widetilde{O}(n^{8/5})$ time. We also prove a lower bound of $\Omega(n^{3/2-\delta})$ for the same problem in 2D, under the well-known APSP Hypothesis. For arbitrary axis-aligned rectangles in 2D, our upper bound is $\widetilde{O}(n^{7/4})$.
- We prove a lower bound of $\Omega(n^{2-\delta})$ for Euclidean discrete 2-center in 13D, under the Hyperclique Hypothesis. This lower bound nearly matches the straightforward upper bound of $\widetilde{O}(n^\omega)$, if the matrix multiplication exponent $\omega$ is equal to 2.
- We similarly prove an $\Omega(n^{k-\delta})$ lower bound for Euclidean discrete $k$-center in $O(k)$ dimensions for any constant $k\ge 3$, under the Hyperclique Hypothesis. This lower bound again nearly matches known upper bounds if $\omega=2$.
- We also prove an $\Omega(n^{2-\delta})$ lower bound for the problem of finding 2 boxes to cover the largest number of points, given $n$ points and $n$ boxes in 12D. This matches the straightforward near-quadratic upper bound.
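To fix the problem statement (my own illustrative baseline, not an algorithm from the paper): rectilinear discrete $k$-center asks for $k$ centers chosen among the input points minimizing the maximum $L_\infty$ distance from any point to its nearest chosen center. The naive algorithm below enumerates all $k$-subsets in $O(n^{k+1})$ time; the results above concern how far below such trivial bounds one can go for small $k$.

```python
from itertools import combinations

def rectilinear_discrete_k_center(points, k):
    """Brute-force discrete k-center under the L_infinity (rectilinear) metric:
    choose k centers among the input points minimizing the maximum distance
    from any point to its nearest chosen center.  Runs in O(n^(k+1)) time."""
    def linf(p, q):
        return max(abs(a - b) for a, b in zip(p, q))

    best_radius, best_centers = float("inf"), None
    for centers in combinations(points, k):
        radius = max(min(linf(p, c) for c in centers) for p in points)
        if radius < best_radius:
            best_radius, best_centers = radius, centers
    return best_radius, best_centers

# Example: 3 centers for a small 2D point set.
pts = [(0, 0), (1, 2), (5, 5), (6, 4), (10, 0), (9, 1)]
print(rectilinear_discrete_k_center(pts, 3))
# prints the optimal L_inf radius (2 for this instance) and one optimal set of centers
```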
In this paper, we examine the computational complexity of enumeration in certain genome rearrangement models. We first show that the Pairwise Rearrangement problem in the Single Cut-and-Join model (Bergeron, Medvedev, & Stoye, J. Comput. Biol. 2010) is $\#\textsf{P}$-complete under polynomial-time Turing reductions. Next, we show that in the Single Cut or Join model (Feij\~ao & Meidanis, IEEE ACM Trans. Comp. Biol. Bioinf. 2011), the problem of enumerating all medians ($\#$Median) is logspace-computable ($\textsf{FL}$), improving upon the previous polynomial-time ($\textsf{FP}$) bound of Mikl\'os & Smith (RECOMB 2015).
Proportional rate models are among the most popular methods for analyzing the rate function of counting processes. Although the proportional rate assumption yields a straightforward rate-ratio interpretation of covariate effects, it also implies that covariates do not modify the shape of the rate function. When this assumption does not hold, we propose describing the relationship between the rate function and covariates through two indices: the shape index and the size index. The shape index allows the covariates to flexibly affect the shape of the rate function, and the size index retains the interpretability of covariate effects on the magnitude of the rate function. To overcome the challenges in simultaneously estimating the two sets of parameters, we propose a conditional pseudolikelihood approach to eliminate the size parameters in shape estimation and an event count projection approach for size estimation. The proposed estimators are asymptotically normal with a root-$n$ convergence rate. Simulation studies and an analysis of recurrent hospitalizations using SEER-Medicare data are conducted to illustrate the proposed methods.
The quadratic decaying property of the information rate function states that given a fixed conditional distribution $p_{\mathsf{Y}|\mathsf{X}}$, the mutual information between the (finite) discrete random variables $\mathsf{X}$ and $\mathsf{Y}$ decreases at least quadratically in the Euclidean distance as $p_\mathsf{X}$ moves away from the capacity-achieving input distributions. This property is particularly useful in the study of higher-order asymptotics and finite-blocklength information theory, where it was already used implicitly by Strassen [1] and later, more explicitly, by Polyanskiy, Poor, and Verd\'u [2]. However, the proofs outlined in both works contain gaps that are nontrivial to close. This comment provides an alternative, complete proof of this property.
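In symbols (my paraphrase of the property as described above, not the exact statement of the comment): writing $C$ for the capacity of $p_{\mathsf{Y}|\mathsf{X}}$, $\Pi$ for its set of capacity-achieving input distributions, and $I(p_{\mathsf{X}})$ for the mutual information induced by the input $p_{\mathsf{X}}$, the property asserts that there is a constant $c>0$, depending on $p_{\mathsf{Y}|\mathsf{X}}$, such that
\[
I(p_{\mathsf{X}}) \;\le\; C \;-\; c \,\min_{q\in\Pi}\, \|p_{\mathsf{X}} - q\|_2^{2}
\]
for all input distributions $p_{\mathsf{X}}$ (or at least for all $p_{\mathsf{X}}$ in a neighborhood of $\Pi$, depending on how the statement is formulated).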
Constrained clustering problems generalize classical clustering formulations, e.g., $k$-median, $k$-means, by imposing additional constraints on the feasibility of a clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, both in the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out $m$ points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained $k$-median or $k$-means problem to the corresponding outlier-free version with only a $(1+\varepsilon)$-loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to $f(k,m,\varepsilon)$ instances of the outlier-free version, where $f(k, m, \varepsilon) = \left( \frac{k+m}{\varepsilon}\right)^{O(m)}$. As specific applications, we obtain the following results:
- The first FPT (in the parameters $k$ and $m$) $(1+\varepsilon)$-approximation algorithm for the outlier version of capacitated $k$-median and $k$-means in Euclidean spaces with hard capacities.
- The first FPT (in the parameters $k$ and $m$) $(3+\varepsilon)$- and $(9+\varepsilon)$-approximation algorithms for the outlier version of capacitated $k$-median and $k$-means, respectively, in general metric spaces with hard capacities.
- The first FPT (in the parameters $k$ and $m$) $(2-\delta)$-approximation algorithm for the outlier version of the $k$-median problem under the Ulam metric.
Our work generalizes the known results to a larger class of constrained clustering problems. Further, our reduction works for arbitrary metric spaces, and so it can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.
Optimal estimation and inference for both the minimizer and minimum of a convex regression function under the white noise and nonparametric regression models are studied in a non-asymptotic local minimax framework, where the performance of a procedure is evaluated at individual functions. Fully adaptive and computationally efficient algorithms are proposed and sharp minimax lower bounds are given for both the estimation accuracy and expected length of confidence intervals for the minimizer and minimum. The non-asymptotic local minimax framework brings out new phenomena in simultaneous estimation and inference for the minimizer and minimum. We establish a novel Uncertainty Principle that provides a fundamental limit on how well the minimizer and minimum can be estimated simultaneously for any convex regression function. A similar result holds for the expected length of the confidence intervals for the minimizer and minimum.
In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where the samples are assumed to be corrupted by multivariate additive Gaussian noise of arbitrary magnitude. We prove the existence of an algorithm that, with high probability, outputs a simplex having an $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows that, as long as $\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order as that of the noiseless case. Our proofs are a combination of the so-called sample compression technique of \citep{ashtiani2018nearly}, mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we propose a general Fourier-based technique for the recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.
The Kolmogorov $N$-width describes the best possible error one can achieve by elements of an $N$-dimensional linear space. Its decay has been studied extensively in Approximation Theory and for the solution of Partial Differential Equations (PDEs). Particular interest has arisen within Model Order Reduction (MOR) of parameterized PDEs, e.g.\ by the Reduced Basis Method (RBM). While it is known that the $N$-width decays exponentially fast (and thus admits efficient MOR) for certain problems, there are examples of the linear transport and the wave equation where the decay rate deteriorates to $N^{-1/2}$. On the other hand, it is widely accepted that a smooth parameter dependence admits a fast decay of the $N$-width. However, a detailed analysis of the influence of properties of the data (such as regularity or slope) on the rate of the $N$-width seems to be lacking. In this paper, we use techniques from Fourier Analysis to derive exact representations of the $N$-width in terms of the initial and boundary conditions of the linear transport equation, modeled by some function $g$, for half-wave symmetric data. For arbitrary functions $g$, we derive bounds and prove that these bounds are sharp. In particular, we prove that the $N$-width decays as $c_r N^{-(r+1/2)}$ for functions $g$ in the Sobolev space $H^r$. Our theoretical investigations are complemented by numerical experiments which confirm the sharpness of our bounds and give additional quantitative insight.
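For reference (a standard definition, not taken from the abstract), the Kolmogorov $N$-width of a set $\mathcal{M}$ in a normed space $X$ is
\[
d_N(\mathcal{M})_X \;=\; \inf_{\substack{V_N \subseteq X \\ \dim V_N = N}} \; \sup_{u \in \mathcal{M}} \; \inf_{v \in V_N} \|u - v\|_X ,
\]
i.e., the worst-case error achievable by the best $N$-dimensional linear approximation space; the decay rates $N^{-1/2}$ and $c_r N^{-(r+1/2)}$ above refer to this quantity as $N \to \infty$.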
This paper presents a computational framework for the concise encoding of an ensemble of persistence diagrams, in the form of weighted Wasserstein barycenters [99], [101] of a dictionary of atom diagrams. We introduce a multi-scale gradient descent approach for the efficient resolution of the corresponding minimization problem, which interleaves the optimization of the barycenter weights with the optimization of the atom diagrams. Our approach leverages the analytic expressions for the gradients of both sub-problems to ensure fast iterations, and it additionally exploits shared-memory parallelism. Extensive experiments on public ensembles demonstrate the efficiency of our approach, with Wasserstein dictionary computations on the order of minutes for the largest examples. We show the utility of our contributions in two applications. First, we apply Wasserstein dictionaries to data reduction and reliably compress persistence diagrams by concisely representing them with their weights in the dictionary. Second, we present a dimensionality reduction framework based on a Wasserstein dictionary defined with a small number of atoms (typically three) and encode the dictionary as a low-dimensional simplex embedded in a visual space (typically in 2D). In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used to reproduce our results.
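As a minimal illustration of the final visualization step (my own sketch; the atom count, reference triangle, and function names are assumptions, not the authors' code), a diagram represented by three barycentric weights in the dictionary can be placed in the plane by mapping its weight vector onto a reference triangle:

```python
import numpy as np

def simplex_embedding(weights, triangle=None):
    """Map barycentric weights (w_1, ..., w_d summing to 1, one per dictionary atom)
    to a point inside a reference simplex embedded in 2D (a triangle when d = 3)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # renormalize, in case of rounding
    if triangle is None:
        # default: equilateral triangle for a 3-atom dictionary
        triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
    return weights @ triangle                    # convex combination of the vertices

# Each persistence diagram in the ensemble is summarized by its weight vector;
# the ensemble then becomes a 2D scatter plot inside the triangle.
ensemble_weights = [(0.7, 0.2, 0.1), (0.1, 0.8, 0.1), (0.3, 0.3, 0.4)]
points_2d = np.array([simplex_embedding(w) for w in ensemble_weights])
print(points_2d)
```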