Let $\Omega = [0,1]^d$ be the unit cube in $\mathbb{R}^d$. We study the problem of how efficiently, in terms of the number of parameters, deep neural networks with the ReLU activation function can approximate functions in the Sobolev spaces $W^s(L_q(\Omega))$ and Besov spaces $B^s_r(L_q(\Omega))$, with error measured in the $L_p(\Omega)$ norm. This problem is important when studying the application of neural networks in a variety of fields, including scientific computing and signal processing, and has previously been solved only when $p=q=\infty$. Our contribution is to provide a complete solution for all $1\leq p,q\leq \infty$ and $s > 0$ for which the corresponding Sobolev or Besov space compactly embeds into $L_p$. The key technical tool is a novel bit-extraction technique which gives an optimal encoding of sparse vectors. This enables us to obtain sharp upper bounds in the non-linear regime where $p > q$. We also provide a novel method for deriving $L_p$-approximation lower bounds based upon VC-dimension when $p < \infty$. Our results show that very deep ReLU networks significantly outperform classical methods of approximation in terms of the number of parameters, but that this comes at the cost of parameters which are not encodable.
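For reference, the compact-embedding condition invoked above is the standard Sobolev embedding criterion (with the convention $1/\infty = 0$):
\[
\frac{s}{d} \;>\; \frac{1}{q} - \frac{1}{p},
\]
so the non-linear regime $p > q$ is precisely where this inequality imposes a genuine constraint on $s$; for $p \leq q$ every $s > 0$ qualifies.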
In an instance of the weighted Nash Social Welfare problem, we are given a set of $m$ indivisible items, $\mathscr{G}$, and $n$ agents, $\mathscr{A}$, where each agent $i \in \mathscr{A}$ has a valuation $v_{ij}\geq 0$ for each item $j\in \mathscr{G}$. In addition, every agent $i$ has a non-negative weight $w_i$ such that the weights sum to $1$. The goal is to find an assignment $\sigma:\mathscr{G}\rightarrow \mathscr{A}$ that maximizes $\prod_{i\in \mathscr{A}} \left(\sum_{j\in \sigma^{-1}(i)} v_{ij}\right)^{w_i}$, the weighted geometric mean of the agents' valuations. When all the weights equal $\frac1n$, the problem reduces to the classical Nash Social Welfare problem, which has recently received much attention. In this work, we present a $5\cdot\exp\left(2\cdot D_{\text{KL}}(\mathbf{w}\, ||\, \frac{\vec{\mathbf{1}}}{n})\right) = 5\cdot\exp\left(2\log{n} + 2\sum_{i=1}^n w_i \log{w_i}\right)$-approximation algorithm for the weighted Nash Social Welfare problem, where $D_{\text{KL}}(\mathbf{w}\, ||\, \frac{\vec{\mathbf{1}}}{n})$ denotes the KL-divergence between the distribution induced by $\mathbf{w}$ and the uniform distribution on $[n]$. We show a novel connection between the convex programming relaxations for the unweighted variant of Nash Social Welfare presented in \cite{cole2017convex, anari2017nash}, and generalize them to two different mathematical programs for the weighted case. The first program is convex and is necessary for computational efficiency, while the second is a non-convex relaxation that can be rounded efficiently. The approximation factor derives from the difference in the objective values of the convex and non-convex relaxations.
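To make the objective and the stated approximation factor concrete, here is a minimal Python sketch (all identifiers are illustrative, not from the paper) that evaluates the weighted Nash Social Welfare of an assignment and the factor $5\cdot\exp(2\cdot D_{\text{KL}}(\mathbf{w}\,||\,\frac{\vec{\mathbf{1}}}{n}))$:

```python
import numpy as np

def weighted_nsw(values, assignment, weights):
    """Weighted NSW of an assignment: prod_i (agent i's bundle value)^{w_i}.

    values: (n_agents, m_items) array of valuations v_ij >= 0
    assignment: length-m sequence, assignment[j] = agent receiving item j
    weights: length-n array of positive weights summing to 1
    """
    n, m = values.shape
    bundle_value = np.zeros(n)
    for j, i in enumerate(assignment):
        bundle_value[i] += values[i, j]
    # Work in log-space to avoid underflow; an empty bundle (value 0)
    # correctly drives the whole objective to 0.
    return np.exp(np.sum(weights * np.log(bundle_value)))

def approximation_factor(weights):
    """The factor 5 * exp(2 * KL(w || uniform)) from the abstract."""
    n = len(weights)
    w = np.asarray(weights, dtype=float)
    # KL(w || 1/n) = log n + sum_i w_i log w_i  (assumes all w_i > 0).
    kl = np.sum(w * np.log(w * n))
    return 5.0 * np.exp(2.0 * kl)
```

When all weights equal $\frac1n$ the KL term vanishes and the factor reduces to $5$, matching the unweighted regime.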
Given two sets $R$ and $B$ of at most $n$ points in the plane, we present efficient algorithms to find a two-line linear classifier that best separates the ``red'' points in $R$ from the ``blue'' points in $B$ and is robust to outliers. More precisely, we find a region $W_B$ bounded by two lines, i.e., a halfplane, strip, wedge, or double wedge, that contains (most of) the blue points of $B$ and few red points. Our running times vary between the optimal $O(n\log n)$ and $O(n^4)$, depending on the type of region $W_B$ and on whether we wish to minimize only red outliers, only blue outliers, or both.
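The algorithms themselves are beyond the scope of an abstract, but the objective being optimized is easy to state in code. This sketch (all names hypothetical) counts the outliers of a candidate region given as an intersection of halfplanes $a\cdot x \le b$, which covers halfplanes, strips, and wedges; a double wedge is a union of two wedges and would take the minimum over its two parts:

```python
import numpy as np

def outliers(halfplanes, R, B):
    """Outlier counts for a candidate region W_B = intersection of halfplanes.

    halfplanes: list of (a, b) with a a length-2 array and b a float,
                each encoding the constraint a . x <= b
    R, B: (k, 2) arrays of red and blue points
    Returns (# red points inside W_B, # blue points outside W_B).
    """
    def inside(pts):
        ok = np.ones(len(pts), dtype=bool)
        for a, b in halfplanes:
            ok &= pts @ np.asarray(a) <= b
        return ok

    return int(inside(R).sum()), int((~inside(B)).sum())
```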
Given a set $P$ of $n$ points and a set $S$ of $n$ segments in the plane, we consider the problem of computing for each segment of $S$ its closest point in $P$. The best previous algorithm solves the problem in $n^{4/3}2^{O(\log^*n)}$ time [Bespamyatnikh, 2003], and a lower bound of $\Omega(n^{4/3})$ (under a somewhat restricted model) has also been proved. In this paper, we present an $O(n^{4/3})$-time algorithm and thus solve the problem optimally (under the restricted model). In addition, we present data structures for solving the online version of the problem, i.e., given a query segment (or a line as a special case), find its closest point in $P$. Our new results improve upon the previous work.
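As a point of reference for the problem statement (this is a naive $O(n^2)$ baseline, not the paper's $O(n^{4/3})$ algorithm), one can simply take, for each segment, the minimum point-to-segment distance:

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a, b."""
    ab, ap = b - a, p - a
    denom = np.dot(ab, ab)
    # Clamp the projection parameter to [0, 1] (degenerate segment -> t = 0).
    t = 0.0 if denom == 0.0 else np.clip(np.dot(ap, ab) / denom, 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def closest_points_bruteforce(P, S):
    """For each segment (a, b) in S, the index of its closest point in P."""
    return [min(range(len(P)), key=lambda i: point_segment_dist(P[i], a, b))
            for a, b in S]
```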
In the kernel density estimation (KDE) problem one is given a kernel $K(x, y)$ and a dataset $P$ of points in a Euclidean space, and must prepare a data structure that can quickly answer density queries: given a point $q$, output a $(1+\epsilon)$-approximation to $\mu:=\frac1{|P|}\sum_{p\in P} K(p, q)$. The classical approach to KDE is the celebrated fast multipole method of [Greengard and Rokhlin], which combines a basic space partitioning approach with a multidimensional Taylor expansion and yields a query time of $\approx \log^d (n/\epsilon)$ (exponential in the dimension $d$). A recent line of work initiated by [Charikar and Siminelakis] achieved polynomial dependence on $d$ via a combination of random sampling and randomized space partitioning, with [Backurs et al.] giving an efficient data structure with query time $\approx \mathrm{poly}(\log(1/\mu))/\epsilon^2$ for smooth kernels. The quadratic dependence on $1/\epsilon$, inherent to the sampling methods, is prohibitively expensive for small $\epsilon$. This issue is addressed by quasi-Monte Carlo methods in numerical analysis, whose high-level idea is to replace random sampling with a discrepancy-based approach -- an idea recently applied to coresets for KDE by [Phillips and Tai]. The work of Phillips and Tai gives a space-efficient data structure with query complexity $\approx 1/(\epsilon \mu)$. This is polynomially better in $1/\epsilon$, but exponentially worse in $1/\mu$. We achieve the best of both: a data structure with $\approx \mathrm{poly}(\log(1/\mu))/\epsilon$ query time for smooth kernel KDE. Our main insight is a new way to combine discrepancy theory with randomized space partitioning, inspired by, but significantly more efficient than, that of the fast multipole methods. We hope that our techniques will find further applications to linear algebra for kernel matrices.
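For concreteness, here is the density query itself, evaluated exactly and naively in $O(|P|)$ time per query; avoiding this linear scan is exactly what the data structures above are for. The Gaussian kernel is one illustrative choice of smooth kernel:

```python
import numpy as np

def kde_query(P, q, bandwidth=1.0):
    """Exact density mu = (1/|P|) * sum_{p in P} K(p, q), Gaussian kernel.

    P: (n, d) array of data points; q: length-d query point.
    """
    sq_dists = np.sum((P - q) ** 2, axis=1)
    return np.mean(np.exp(-sq_dists / (2.0 * bandwidth ** 2)))
```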
We present a new algorithm for finding isolated zeros of a system of real-valued functions in a bounded interval in $\mathbb{R}^n$. It uses the Chebyshev proxy method combined with a mixture of subdivision, reduction methods, and elimination checks that leverage special properties of Chebyshev polynomials. We prove that the method has R-quadratic convergence locally near simple zeros of the system. We also analyze the time complexity and the numerical stability of the algorithm, and provide numerical evidence in dimensions up to three that the method is both fast and accurate on a wide range of problems. The algorithm should also work well in higher dimensions. Our tests show that the algorithm outperforms other standard methods on this problem of finding all real zeros in a bounded domain. Our Python implementation of the algorithm is publicly available.
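The paper's algorithm works in $\mathbb{R}^n$ with subdivision and elimination checks; the core Chebyshev-proxy idea is easiest to see in one dimension. Here is a minimal 1D sketch (not the authors' implementation) using NumPy's Chebyshev tools: interpolate the function by a Chebyshev polynomial and take the real roots of that proxy.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_proxy_roots_1d(f, a, b, deg=64, tol=1e-8):
    """Approximate the real zeros of f on [a, b] via a Chebyshev proxy."""
    # Interpolate g(t) = f(map(t)) at Chebyshev points on [-1, 1].
    g = lambda t: f(0.5 * (b - a) * t + 0.5 * (b + a))
    coeffs = C.chebinterpolate(g, deg)
    roots = C.chebroots(coeffs)
    # Keep numerically real roots inside [-1, 1], then map back to [a, b].
    real = roots[np.abs(roots.imag) < tol].real
    real = real[(real >= -1.0) & (real <= 1.0)]
    return np.sort(0.5 * (b - a) * real + 0.5 * (b + a))

# Example: recovers 0, pi, 2*pi, 3*pi as the zeros of sin on [0, 10].
print(cheb_proxy_roots_1d(np.sin, 0.0, 10.0))
```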
Given a complex high-dimensional distribution over $\{\pm 1\}^n$, what is the best way to increase the expected number of $+1$'s by controlling the values of only a small number of variables? Such a problem is known as influence maximization and has been widely studied in social networks, biology, and computer science. In this paper, we consider influence maximization on the Ising model, which is a prototypical example of undirected graphical models and has wide applications in many real-world problems. We establish a sharp computational phase transition for influence maximization on sparse Ising models under a bounded budget: in the high-temperature regime, we give a linear-time algorithm for finding a small subset of variables and their values which achieves nearly optimal influence; in the low-temperature regime, we show that the influence maximization problem cannot be solved in polynomial time under commonly believed complexity assumptions. The critical temperature coincides with the tree uniqueness/non-uniqueness threshold for Ising models, which is also a critical point for other computational problems including approximate sampling and counting.
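On instances small enough for exhaustive enumeration, the problem this dichotomy concerns can be spelled out directly. The following brute-force sketch (illustrative only; the paper's high-temperature algorithm is linear-time, and all names here are hypothetical) pins $k$ spins to values in $\{\pm 1\}$ so as to maximize the conditional expectation of the number of $+1$'s under the Ising Gibbs measure:

```python
import itertools
import numpy as np

def ising_weight(x, edges, beta):
    """Unnormalized Gibbs weight exp(beta * sum_{(i,j) in E} x_i x_j)."""
    return np.exp(beta * sum(x[i] * x[j] for i, j in edges))

def expected_plus_ones(n, edges, beta, pinned):
    """E[#{i : x_i = +1}] conditioned on the pinned spins (exhaustive)."""
    free = [i for i in range(n) if i not in pinned]
    total_w, total = 0.0, 0.0
    for vals in itertools.product([-1, 1], repeat=len(free)):
        x = dict(pinned)
        x.update(zip(free, vals))
        w = ising_weight(x, edges, beta)
        total_w += w
        total += w * sum(1 for i in range(n) if x[i] == 1)
    return total / total_w

def best_intervention(n, edges, beta, k):
    """Brute-force search over all ways to pin k spins to +1 or -1."""
    best = max(
        (dict(zip(S, vals))
         for S in itertools.combinations(range(n), k)
         for vals in itertools.product([-1, 1], repeat=k)),
        key=lambda pinned: expected_plus_ones(n, edges, beta, pinned),
    )
    return best, expected_plus_ones(n, edges, beta, best)

# Example: a ferromagnetic 5-cycle at inverse temperature 0.4, budget k = 1.
edges = [(i, (i + 1) % 5) for i in range(5)]
print(best_intervention(5, edges, beta=0.4, k=1))
```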
We propose a method for estimating a log-concave density on $\mathbb R^d$ from samples, under the assumption that there exists an orthogonal transformation that makes the components of the random vector independent. While log-concave density estimation is hard both computationally and statistically, the independent components assumption alleviates both issues, while still maintaining a large non-parametric class. We prove that under mild conditions, at most $\tilde{\mathcal{O}}(\epsilon^{-4})$ samples (suppressing constants and log factors) suffice for our proposed estimator to be within $\epsilon$ of the original density in squared Hellinger distance. On the computational front, while the usual log-concave maximum likelihood estimate can be obtained via a finite-dimensional convex program, it is slow to compute -- especially in higher dimensions. We demonstrate through numerical experiments that our estimator can be computed efficiently, making it more practical to use.
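Concretely, as we read it, the independent-components assumption says the target density factorizes, for some orthogonal matrix $U$ with columns $u_1,\dots,u_d$ and univariate log-concave densities $g_1,\dots,g_d$, as
\[
f(x) \;=\; \prod_{i=1}^{d} g_i\!\left(u_i^{\top} x\right),
\]
which reduces the $d$-dimensional estimation problem to learning $U$ together with $d$ one-dimensional log-concave densities.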
We study the lift-and-project rank of the stable set polytopes of graphs with respect to the Lov\'{a}sz--Schrijver SDP operator $\text{LS}_+$, with a particular focus on finding and characterizing the smallest graphs with a given $\text{LS}_+$-rank (the least number of iterations of the $\text{LS}_+$ operator on the fractional stable set polytope needed to compute the stable set polytope). We introduce a generalized vertex-stretching operation that appears to be promising in generating $\text{LS}_+$-minimal graphs and study its properties. We also provide several new $\text{LS}_+$-minimal graphs, most notably the first known instances of $12$-vertex graphs with $\text{LS}_+$-rank $4$; this is the first advance in this direction since Escalante, Montelar, and Nasini's discovery of a $9$-vertex graph with $\text{LS}_+$-rank $3$ in 2006.
We study the lift-and-project rank of the stable set polytopes of graphs with respect to the Lov\'{a}sz--Schrijver SDP operator $\text{LS}_+$, with a particular focus on a search for relatively small graphs with high $\text{LS}_+$-rank (the least number of iterations of the $\text{LS}_+$ operator on the fractional stable set polytope needed to compute the stable set polytope). We provide families of graphs whose $\text{LS}_+$-rank is asymptotically a linear function of the number of vertices, which is the best possible up to improvements in the constant factor (the previous best result in this direction, from 1999, yielded graphs whose $\text{LS}_+$-rank grew only with the square root of the number of vertices).
We introduce the entangled quantum polynomial hierarchy $\mathsf{QEPH}$ as the class of problems that are efficiently verifiable given alternating quantum proofs that may be entangled with each other. We prove $\mathsf{QEPH}$ collapses to its second level. In fact, we show that a polynomial number of alternations collapses to just two. As a consequence, $\mathsf{QEPH} = \mathsf{QRG(1)}$, the class of problems having one-turn quantum refereed games, which is known to be contained in $\mathsf{PSPACE}$. This is in contrast to the unentangled quantum polynomial hierarchy $\mathsf{QPH}$, which contains $\mathsf{QMA(2)}$. We also introduce a generalization of the quantum-classical polynomial hierarchy $\mathsf{QCPH}$ where the provers send probability distributions over strings (instead of strings) and denote it by $\mathsf{DistributionQCPH}$. Conceptually, this class is intermediate between $\mathsf{QCPH}$ and $\mathsf{QPH}$. We prove $\mathsf{DistributionQCPH} = \mathsf{QCPH}$, suggesting that only quantum superposition (not classical probability) increases the computational power of these hierarchies. To prove this equality, we generalize a game-theoretic result of Lipton and Young (1994) which says that the provers can send distributions that are uniform over a polynomial-size support. We also prove the analogous result for the polynomial hierarchy, i.e., $\mathsf{DistributionPH} = \mathsf{PH}$. These results also rule out certain approaches for showing $\mathsf{QPH}$ collapses. Finally, we show that $\mathsf{PH}$ and $\mathsf{QCPH}$ are contained in $\mathsf{QPH}$, resolving an open question of Gharibian et al. (2022).