We consider the problem of estimating a $d$-dimensional discrete distribution from its samples observed under a $b$-bit communication constraint. In contrast to most previous results that largely focus on the global minimax error, we study the local behavior of the estimation error and provide \emph{pointwise} bounds that depend on the target distribution $p$. In particular, we show that the $\ell_2$ error decays as $O\left(\frac{\lVert p\rVert_{1/2}}{n2^b}\vee \frac{1}{n}\right)$ when $n$ is sufficiently large (throughout, $a\vee b$ and $a \wedge b$ denote $\max(a, b)$ and $\min(a,b)$, respectively), and hence is governed by the \emph{half-norm} of $p$ instead of the ambient dimension $d$. For the achievability result, we propose a two-round sequentially interactive estimation scheme that achieves this error rate uniformly over all $p$. Our scheme is based on a novel local refinement idea: we first use a standard global minimax scheme to localize $p$, and then use the remaining samples to locally refine our estimate. We also develop a new local minimax lower bound with (almost) matching $\ell_2$ error, showing that any interactive scheme must incur an $\ell_2$ error of $\Omega\left( \frac{\lVert p \rVert_{{(1+\delta)}/{2}}}{n2^b}\right)$ for any $\delta > 0$. The lower bound is derived by first finding the best parametric sub-model containing $p$, and then upper bounding the quantized Fisher information under this model. Our upper and lower bounds together indicate that $\mathcal{H}_{1/2}(p) = \log(\lVert p \rVert_{{1}/{2}})$ bits of communication are both sufficient and necessary to achieve the optimal (centralized) performance, where $\mathcal{H}_{{1}/{2}}(p)$ is the R\'enyi entropy of $p$ of order ${1}/{2}$. Therefore, under the $\ell_2$ loss, the correct measure of the local communication complexity at $p$ is its R\'enyi entropy.
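To see why the order is $1/2$, recall the definition of the R\'enyi entropy and note the following standard identity, recorded here for convenience:
\[
\mathcal{H}_{1/2}(p) \;=\; \frac{1}{1-\frac{1}{2}}\log\Big(\sum_{i=1}^{d} p_i^{1/2}\Big) \;=\; \log\Big(\sum_{i=1}^{d}\sqrt{p_i}\Big)^{2} \;=\; \log \lVert p\rVert_{1/2},
\]
since $\lVert p\rVert_{1/2} = \big(\sum_{i=1}^{d}\sqrt{p_i}\big)^2$. In particular, $\mathcal{H}_{1/2}(p) \leq \log d$ with equality for the uniform distribution, so the pointwise bound never exceeds the global minimax rate.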
We introduce a natural online allocation problem that connects several of the most fundamental problems in online optimization. Let $M$ be an $n$-point metric space. Consider a resource that can be allocated in arbitrary fractions to the points of $M$. At each time $t$, a convex monotone cost function $c_t: [0,1]\to\mathbb{R}_+$ appears at some point $r_t\in M$. In response, an algorithm may change the allocation of the resource, paying movement cost as determined by the metric and service cost $c_t(x_{r_t})$, where $x_{r_t}$ is the fraction of the resource at $r_t$ at the end of time $t$. For example, when the cost functions are $c_t(x)=\alpha x$, this is equivalent to randomized metrical task systems (MTS), and when the cost functions are $c_t(x)=\infty\cdot 1_{x<1/k}$, this is equivalent to fractional $k$-server. We give an $O(\log n)$-competitive algorithm for weighted star metrics. Due to the generality of the allowed cost functions, classical multiplicative-update algorithms do not work for the metric allocation problem. A key idea of our algorithm is to decouple the rate at which a variable is updated from its value, resulting in interesting new dynamics. This can be viewed as running mirror descent with a time-varying regularizer, and we use this perspective to further refine the guarantees of our algorithm. The standard analysis techniques run into multiple complications when the regularizer is time-varying, and we show how to overcome these issues by making various modifications to the default potential function. We also consider the problem when the cost functions are allowed to be non-convex. In this case, we give tight bounds of $\Theta(n)$ on tree metrics, which imply deterministic and randomized competitive ratios of $O(n^2)$ and $O(n\log n)$, respectively, on arbitrary metrics. Our algorithm for this case is based on an $\ell_2^2$-regularizer.
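To make the cost model concrete, the following minimal Python sketch (our own illustration; the strawman policy and all names are ours, and this is not the algorithm of the paper) evaluates an allocation trajectory on a weighted star, where moving mass into or out of leaf $r$ pays the edge length $w_r$:
\begin{verbatim}
import numpy as np

def star_movement_cost(w, x_old, x_new):
    # Optimal transport on a weighted star: each unit of mass entering or
    # leaving leaf r crosses an edge of length w[r].
    return float(np.dot(w, np.abs(x_new - x_old)))

def run(w, events, policy, x0):
    """Total cost of a policy on a sequence of (r_t, c_t) events.

    events: list of (point index r_t, convex cost function c_t);
    policy: maps (allocation, r_t, c_t) to a new allocation on the simplex.
    """
    x, total = x0.copy(), 0.0
    for r, c in events:
        x_new = policy(x, r, c)
        total += star_movement_cost(w, x, x_new) + c(x_new[r])
        x = x_new
    return total

# Linear costs c_t(x) = alpha * x recover randomized MTS on the star.
w = np.array([1.0, 2.0, 4.0])
events = [(0, lambda v: 0.5 * v), (2, lambda v: 2.0 * v)]
lazy = lambda x, r, c: x  # strawman policy: never move the resource
print(run(w, events, lazy, np.ones(3) / 3))
\end{verbatim}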
Comparator circuits are a natural circuit model for studying bounded fan-out computation whose power sits between nondeterministic branching programs and general circuits. Despite having been studied for nearly three decades, the first superlinear lower bound against comparator circuits was proved only recently by G\'al and Robere (ITCS 2020), who established an $\Omega((n/\log n)^{1.5})$ lower bound on the size of comparator circuits computing an explicit function of $n$ bits. In this paper, we initiate the study of average-case complexity and circuit analysis algorithms for comparator circuits. Departing from previous approaches, we exploit the technique of shrinkage under random restrictions to obtain a variety of new results for this model. Among them, we show:
- Average-case lower bounds. For every $k = k(n)$ with $k \geq \log n$, there exists a polynomial-time computable function $f_k$ on $n$ bits such that, for every comparator circuit $C$ with at most $n^{1.5}/O(k\cdot \sqrt{\log n})$ gates, we have \[ \Pr_{x\in\{0,1\}^n}\left[C(x)=f_k(x)\right]\leq \frac{1}{2} + \frac{1}{2^{\Omega(k)}}. \] This average-case lower bound matches the worst-case lower bound of G\'al and Robere by letting $k=O(\log n)$.
- \#SAT algorithms. There is an algorithm that counts the number of satisfying assignments of a given comparator circuit with at most $n^{1.5}/O\!\left(k\cdot \sqrt{\log n}\right)$ gates, in time $2^{n-k}\cdot\mathrm{poly}(n)$, for any $k\leq n/4$. The running time is non-trivial when $k=\omega(\log n)$.
- Pseudorandom generators and MCSP lower bounds. There is a pseudorandom generator of seed length $s^{2/3+o(1)}$ that fools comparator circuits with $s$ gates. Also, using this PRG, we obtain an $n^{1.5-o(1)}$ lower bound for the Minimum Circuit Size Problem (MCSP) against comparator circuits.
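For readers unfamiliar with the model: a comparator gate takes the values carried by two wires and replaces them with their minimum (their AND) and maximum (their OR). Below is a minimal Python evaluator, together with the trivial $2^n$ brute-force count that the \#SAT algorithm above improves upon (all names are ours, for illustration only):
\begin{verbatim}
from itertools import product

def eval_comparator_circuit(gates, x, out_wire):
    """gates: list of wire pairs (i, j); a gate writes AND to i, OR to j."""
    w = list(x)
    for i, j in gates:
        w[i], w[j] = w[i] & w[j], w[i] | w[j]
    return w[out_wire]

def brute_force_count(gates, n, out_wire):
    # Trivial 2^n * poly(n) baseline; the algorithm above runs in 2^(n-k).
    return sum(eval_comparator_circuit(gates, x, out_wire)
               for x in product([0, 1], repeat=n))

# Sorting networks are comparator circuits: on 3 wires, the network below
# sorts, so the middle wire computes MAJORITY of 3 bits.
maj3 = [(0, 1), (0, 2), (1, 2)]
assert all(eval_comparator_circuit(maj3, x, 1) == (sum(x) >= 2)
           for x in product([0, 1], repeat=3))
print(brute_force_count(maj3, 3, 1))  # 4 inputs have majority 1
\end{verbatim}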
We present a new data structure to approximate accurately and efficiently a polynomial $f$ of degree $d$ given as a list of coefficients. Its properties allow us to improve the state-of-the-art bounds on the bit complexity for the problems of root isolation and approximate multipoint evaluation. This data structure also leads to a new geometric criterion to detect ill-conditioned polynomials, implying notably that the standard condition number of the zeros of a polynomial is at least exponential in the number of roots of modulus less than $1/2$ or greater than $2$. Given a polynomial $f$ of degree $d$ with $\|f\|_1 \leq 2^\tau$ for $\tau \geq 1$, isolating all its complex roots or evaluating it at $d$ points can be done with a quasi-linear number of arithmetic operations. However, considering the bit complexity, the state-of-the-art algorithms require at least $d^{3/2}$ bit operations even for well-conditioned polynomials and when the required accuracy is low. Given a positive integer $m$, we can compute our new data structure and evaluate $f$ at $d$ points in the unit disk with an absolute error less than $2^{-m}$ in $\widetilde O(d(\tau+m))$ bit operations, where $\widetilde O(\cdot)$ means that we omit logarithmic factors. We also show that if $\kappa$ is the absolute condition number of the zeros of $f$, then we can isolate all the roots of $f$ in $\widetilde O(d(\tau + \log \kappa))$ bit operations. Moreover, our algorithms are simple to implement. For approximating the complex roots of a polynomial, we implemented a small prototype in \verb|Python/NumPy| that is an order of magnitude faster than the state-of-the-art solver \verb|MPSolve| for high-degree polynomials with random coefficients.
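The data structure itself is beyond the scope of this summary, but the role of the precision parameter $m$ can be illustrated naively (a hedged sketch of fixed-precision evaluation, not the algorithm of the paper): rounding each coefficient to $m$ fractional bits perturbs $f(z)$ by at most $(d+1)2^{-(m+1)}$ on the unit disk, so only a logarithmic number of guard bits is needed on top of the target accuracy.
\begin{verbatim}
import numpy as np

def eval_truncated(coeffs, points, m):
    """Evaluate f at points in the unit disk after rounding each
    coefficient to m fractional bits (naive baseline, not the paper's
    data structure)."""
    c = np.round(np.asarray(coeffs, dtype=float) * 2.0**m) / 2.0**m
    vals = np.zeros_like(points, dtype=complex)
    for a in c[::-1]:          # Horner's rule
        vals = vals * points + a
    return vals

# For |z| <= 1, each rounded coefficient contributes at most 2^-(m+1)
# to the error, hence |f(z) - f_m(z)| <= (d+1) * 2^-(m+1).
z = np.exp(2j * np.pi * np.arange(8) / 8)   # 8 points on the unit circle
print(eval_truncated([1.0, 0.5, -0.25], z, m=20))
\end{verbatim}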
Given a simple graph $G$ and an integer $k$, the goal of the $k$-Clique problem is to decide whether $G$ contains a complete subgraph of size $k$. We say an algorithm approximates $k$-Clique within a factor $g(k)$ if it can find a clique of size at least $k / g(k)$ whenever $G$ is guaranteed to have a $k$-clique. Recently, it was shown that approximating $k$-Clique within a constant factor is W[1]-hard [Lin21]. We study the approximation of $k$-Clique under the Exponential Time Hypothesis (ETH). The reduction of [Lin21] already implies an $n^{\Omega(\sqrt[6]{\log k})}$-time lower bound under ETH. We improve this lower bound to $n^{\Omega(\log k)}$. Using the gap-amplification technique via expander graphs, we also prove that there is no $k^{o(1)}$-factor FPT-approximation algorithm for $k$-Clique under ETH. In addition, we suggest a new way to prove the Parameterized Inapproximability Hypothesis (PIH) under ETH: we show that if there is no $n^{O(k/\log k)}$-time algorithm approximating $k$-Clique within a constant factor, then PIH is true.
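For intuition on gap amplification (the expander-based construction in the paper is more efficient; below is only the classical product-based amplification, with brute-force clique search for toy sizes): the clique number is multiplicative under the strong graph product, $\omega(G\boxtimes H) = \omega(G)\,\omega(H)$, so one product step squares a multiplicative gap in the clique size.
\begin{verbatim}
from itertools import combinations

def strong_product(adj1, adj2):
    """Strong product of graphs given as dicts: vertex -> set of neighbours."""
    verts = [(u, v) for u in adj1 for v in adj2]
    adj = {p: set() for p in verts}
    for (u1, v1), (u2, v2) in combinations(verts, 2):
        if (u1 == u2 or u2 in adj1[u1]) and (v1 == v2 or v2 in adj2[v1]):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj

def clique_number(adj):
    # Brute force; fine for toy instances only.
    vs = list(adj)
    for r in range(len(vs), 0, -1):
        for s in combinations(vs, r):
            if all(b in adj[a] for a, b in combinations(s, 2)):
                return r
    return 0

c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}   # 4-cycle: omega = 2
print(clique_number(c4), clique_number(strong_product(c4, c4)))  # 2 4
\end{verbatim}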
The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, we show that estimating differential entropy to any accuracy is infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, which clearly shows the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram-based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples showing that similar results hold for mutual information and relative entropy as well.
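As a concrete instance of the histogram-based estimator analyzed above (a minimal one-dimensional sketch under the stated assumption of known, bounded support; the bin count and reported units are our choices):
\begin{verbatim}
import numpy as np

def histogram_entropy(samples, a, b, bins):
    """Plug-in differential entropy estimate (in nats) from an equal-width
    histogram on the known support [a, b]."""
    counts, _ = np.histogram(samples, bins=bins, range=(a, b))
    width = (b - a) / bins
    p = counts / counts.sum()
    p = p[p > 0]
    # With density estimate p_i / width on bin i, the plug-in entropy is
    # the discrete entropy of the bin probabilities plus log(bin width).
    return -np.sum(p * np.log(p)) + np.log(width)

# Sanity check: the uniform density on [0, 1] has differential entropy 0.
rng = np.random.default_rng(0)
print(histogram_entropy(rng.uniform(0, 1, 100_000), 0.0, 1.0, bins=64))
\end{verbatim}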
This paper studies distributed binary testing of statistical independence under communication (information bits) constraints. While testing independence is relevant in various applications, distributed independence testing is particularly useful for event detection in sensor networks, where data correlation often occurs among observations of devices in the presence of a signal of interest. Focusing on the case of two devices because of its tractability, we begin by investigating conditions on the Type I error probability restrictions under which the minimum Type II error probability decays exponentially with the sample size. Then, we study the finite sample-size regime of this problem. We derive new upper and lower bounds for the gap between the minimum Type II error and its exponential approximation under different setups, including restrictions imposed on the vanishing Type I error probability. Our theoretical results shed light on the sample-size regimes at which approximations of the Type II error probability via error exponents become informative enough, in the sense of predicting well the actual error probability. We finally discuss an application of our results in which the gap is evaluated numerically, and we show that exponential approximations are not only tractable but also a valuable proxy for the Type II error probability in the finite sample-size regime.
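The phenomenon quantified by our bounds is easy to reproduce in a centralized toy example (a Monte Carlo sketch of our own, with no communication constraint; all parameters are illustrative): by Stein's lemma the optimal Type II error satisfies $-\frac{1}{n}\log\beta_n \to D(P_0\|P_1)$, yet for moderate $n$ the approximation $\beta_n \approx e^{-nD}$ can be off by orders of magnitude, which is precisely the kind of gap that finite sample-size bounds must control.
\begin{verbatim}
import numpy as np

# H0: X ~ Bern(0.5) vs H1: X ~ Bern(0.8); Neyman-Pearson test with
# Type I error held near eps; compare Type II error to exp(-n * KL).
p0, p1, eps, trials = 0.5, 0.8, 0.05, 200_000
D = p0 * np.log(p0 / p1) + (1 - p0) * np.log((1 - p0) / (1 - p1))
rng = np.random.default_rng(1)

for n in (10, 30, 60):
    llr = lambda k: (k * np.log(p0 / p1)
                     + (n - k) * np.log((1 - p0) / (1 - p1)))
    k0 = rng.binomial(n, p0, trials)        # samples drawn under H0
    k1 = rng.binomial(n, p1, trials)        # samples drawn under H1
    t = np.quantile(llr(k0), eps)           # threshold: Type I error ~ eps
    beta = np.mean(llr(k1) > t)             # Type II: accept H0 under H1
    print(n, beta, np.exp(-n * D))          # actual vs exponential proxy
\end{verbatim}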
A central problem in Binary Hypothesis Testing (BHT) is to determine the optimal tradeoff between the Type I error (referred to as false alarm) and the Type II error (referred to as miss). In this context, the exponential rate of convergence of the optimal Type II error probability -- as the sample size tends to infinity -- given some (positive) restrictions on the false alarm probability is a fundamental theoretical question. Considering the more realistic context of a BHT with a finite number of observations, this paper presents a new non-asymptotic result for the scenario with a monotonic (sub-exponentially decreasing) restriction on the Type I error probability, which extends the result presented by Strassen in 2009. Building on concentration inequalities, we offer new upper and lower bounds on the optimal Type II error probability for the case of finitely many observations. Finally, the derived bounds are evaluated and interpreted numerically (as a function of the number of samples) for some vanishing Type I error restrictions.
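For context, the classical fixed-$\epsilon$ baseline being extended is Strassen's normal approximation, stated here in its standard form: if $\beta_n(\epsilon)$ denotes the minimum Type II error probability subject to a Type I error of at most a fixed $\epsilon \in (0,1)$, then
\[
-\log \beta_n(\epsilon) \;=\; n D(P_0\|P_1) \;+\; \sqrt{n V(P_0\|P_1)}\,\Phi^{-1}(\epsilon) \;+\; O(\log n),
\]
where $D$ is the Kullback-Leibler divergence, $V$ is the corresponding divergence variance, and $\Phi^{-1}$ is the inverse Gaussian CDF. The bounds developed here address the regime where $\epsilon$ is no longer fixed but decreases sub-exponentially with the sample size.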
We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, in which query access comes in the form of $\langle X_i, A\rangle:=\mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
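For contrast, the submatrix baseline that the new tester improves upon is straightforward (a hedged sketch; the sampling constants are illustrative, and reading an $s \times s$ submatrix costs $s^2 = O(d^2/\epsilon^2)$ queries, exactly the regime where the (KDD'14) lower bound applies):
\begin{verbatim}
import numpy as np

def submatrix_rank_test(A, d, eps, rng):
    """Baseline tester: read a random s x s submatrix and accept iff its
    rank is at most d (illustrative constants; not the paper's tester)."""
    n = A.shape[0]
    s = min(n, int(4 * d / eps))
    rows = rng.choice(n, size=s, replace=False)
    cols = rng.choice(n, size=s, replace=False)
    return np.linalg.matrix_rank(A[np.ix_(rows, cols)]) <= d

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 200))  # rank 5
print(submatrix_rank_test(A, d=5, eps=0.5, rng=rng))  # True
\end{verbatim}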
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of the local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm, called multi-step primal-dual (MSPD), and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limited communication resources decreases at a fast rate even for non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm, called distributed randomized smoothing (DRS), based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
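The smoothing primitive behind DRS is easy to state in isolation (a generic randomized-smoothing sketch, not the full distributed algorithm; step sizes and sample counts are illustrative): replace $f$ by $f_\gamma(x) = \mathbb{E}[f(x+\gamma Z)]$ with $Z \sim \mathcal{N}(0, I_d)$, which is smooth even when $f$ is not, and estimate its gradient from function values only.
\begin{verbatim}
import numpy as np

def smoothed_grad(f, x, gamma, rng, samples=32):
    """Monte Carlo estimate of grad f_gamma(x), f_gamma(x) = E[f(x+gamma Z)],
    via the Gaussian identity
    grad f_gamma(x) = E[(f(x + gamma Z) - f(x)) Z] / gamma."""
    Z = rng.standard_normal((samples, x.size))
    fx = np.array([f(x + gamma * z) - f(x) for z in Z])
    return (fx[:, None] * Z).mean(axis=0) / gamma

# Gradient-free minimization of the non-smooth f(x) = ||x||_1; smoothing
# introduces an O(gamma * sqrt(d)) bias, which is how dimension-dependent
# factors such as d^(1/4) enter this type of analysis.
f = lambda v: np.abs(v).sum()
x, rng = np.ones(10), np.random.default_rng(3)
for t in range(1, 501):
    x = x - smoothed_grad(f, x, gamma=0.1, rng=rng) / np.sqrt(t)
print(f(x))  # close to the minimum value 0
\end{verbatim}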
In this paper, we study the optimal convergence rate for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improved condition numbers.
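To illustrate the dual approach in its simplest instance (a single-machine sketch of our own for the smooth, strongly convex case with quadratic local functions, so the inner minimization is in closed form; the network, step size, and momentum schedule are illustrative, not the tuned constants of the paper):
\begin{verbatim}
import numpy as np

# Ring network of m nodes; W is its Laplacian, and each multiplication
# by W costs one round of communication with neighbours.
m = 8
W = 2 * np.eye(m) - np.roll(np.eye(m), 1, 0) - np.roll(np.eye(m), -1, 0)
a = np.arange(m, dtype=float)       # local functions f_i(x) = (x - a_i)^2 / 2

def primal(lam):
    # argmin_x sum_i f_i(x_i) + lam^T W x, available in closed form here.
    return a - W @ lam

# Nesterov-accelerated ascent on the dual of: min sum_i f_i(x_i) s.t. Wx = 0.
lam = prev = np.zeros(m)
eta = 1.0 / np.linalg.eigvalsh(W)[-1] ** 2   # 1/L for the dual gradient
for t in range(2000):
    mom = lam + (t - 1) / (t + 2) * (lam - prev)
    prev = lam
    lam = mom + eta * W @ primal(mom)        # dual gradient equals W x*(lam)
print(primal(lam))                           # consensus at mean(a) = 3.5
\end{verbatim}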