The greedy spanner in a low-dimensional Euclidean space is a fundamental geometric construction that has been extensively studied over three decades, as it possesses the two most basic properties of a good spanner: constant maximum degree and constant lightness. Recently, Eppstein and Khodabandeh showed that the greedy spanner in $\mathbb{R}^2$ admits a sublinear separator in a strong sense: any subgraph of $k$ vertices of the greedy spanner in $\mathbb{R}^2$ has a separator of size $O(\sqrt{k})$. Their technique is inherently planar and does not extend to higher dimensions. They left the existence of a small separator for the greedy spanner in $\mathbb{R}^d$ for any constant $d\geq 3$ as an open problem. In this paper, we resolve the problem of Eppstein and Khodabandeh by showing that any subgraph of $k$ vertices of the greedy spanner in $\mathbb{R}^d$ has a separator of size $O(k^{1-1/d})$. We introduce a new technique that gives a simple characterization, which we dub $\tau$-lankiness, for a geometric graph to have a sublinear separator: a geometric graph is $\tau$-lanky if any ball of radius $r$ cuts at most $\tau$ edges of length at least $r$ in the graph. We show that any $\tau$-lanky geometric graph of $n$ vertices in $\mathbb{R}^d$ has a separator of size $O(\tau n^{1-1/d})$. We then derive our main result by showing that the greedy spanner is $O(1)$-lanky. Indeed, we obtain a more general result that applies to unit ball graphs and point sets of low fractal dimension in $\mathbb{R}^d$. Our technique naturally extends to doubling metrics. We use the $\tau$-lanky characterization to show that there exists a $(1+\epsilon)$-spanner for doubling metrics of dimension $d$ with constant maximum degree and a separator of size $O(n^{1-1/d})$; this result resolves an open problem posed by Abam and Har-Peled a decade ago.
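To make the two notions above concrete, the following minimal Python sketch (ours, for illustration; it is neither the paper's algorithm nor its proof) builds the classical greedy $(1+\epsilon)$-spanner of a point set and counts, for a given ball $B(c,r)$, how many long edges it cuts; reading ``cuts'' as ``exactly one endpoint lies inside the ball'' is our assumption. A graph is $\tau$-lanky precisely when this count never exceeds $\tau$ over all centers and radii.
\begin{verbatim}
import heapq
import itertools
import math

import numpy as np

def greedy_spanner(points, t):
    """Classical greedy t-spanner (t = 1 + eps): scan pairs by increasing
    Euclidean distance and add an edge only when the current spanner
    distance between its endpoints exceeds t times that distance."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = lambda i, j: float(np.linalg.norm(pts[i] - pts[j]))
    adj = {i: {} for i in range(n)}

    def spanner_dist(s, goal, cap):
        # Dijkstra in the partial spanner, truncated at `cap`.
        best, pq = {s: 0.0}, [(0.0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == goal:
                return d
            if d > best[u] or d > cap:
                continue
            for v, w in adj[u].items():
                nd = d + w
                if nd < best.get(v, math.inf) and nd <= cap:
                    best[v] = nd
                    heapq.heappush(pq, (nd, v))
        return math.inf

    edges = []
    for i, j in sorted(itertools.combinations(range(n), 2), key=lambda e: dist(*e)):
        d = dist(i, j)
        if spanner_dist(i, j, t * d) > t * d:
            adj[i][j] = adj[j][i] = d
            edges.append((i, j, d))
    return edges

def cut_count(edges, points, center, r):
    """Number of edges of length at least r cut by the ball B(center, r); we read
    'cuts' as 'exactly one endpoint lies inside the ball' (our interpretation).
    A graph is tau-lanky when this count never exceeds tau for any center and r."""
    pts = np.asarray(points, dtype=float)
    c = np.asarray(center, dtype=float)
    count = 0
    for i, j, w in edges:
        if w < r:
            continue
        if (np.linalg.norm(pts[i] - c) <= r) != (np.linalg.norm(pts[j] - c) <= r):
            count += 1
    return count

# pts = np.random.default_rng(1).random((50, 3))
# E = greedy_spanner(pts, t=1.5)
# print(cut_count(E, pts, center=pts[0], r=0.2))
\end{verbatim}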
We study the problem of semi-supervised learning of an adversarially robust predictor in the PAC model, where the learner has access to both labeled and unlabeled examples. The sample complexity in semi-supervised learning has two parameters: the number of labeled examples and the number of unlabeled examples. We consider the complexity measures $VC_U \leq dim_U \leq VC$ and $VC^*$, where $VC$ is the standard $VC$-dimension, $VC^*$ is its dual, and the other two measures appeared in Montasser et al. (2019). The best known sample bound for robust supervised PAC learning is $O(VC \cdot VC^*)$, and we compare our sample bounds to $\Lambda$, the minimal number of labeled examples required by any robust supervised PAC learning algorithm. Our main results are the following: (1) In the realizable setting, it is sufficient to have $O(VC_U)$ labeled examples and $O(\Lambda)$ unlabeled examples. (2) In the agnostic setting, let $\eta$ be the minimal agnostic error. The sample complexity depends on the resulting error rate: if we allow an error of $2\eta+\epsilon$, it is still sufficient to have $O(VC_U)$ labeled examples and $O(\Lambda)$ unlabeled examples; if we insist on an error of $\eta+\epsilon$, then $\Omega(dim_U)$ labeled examples are necessary, as in the supervised case. These results show that there is a significant benefit in semi-supervised robust learning, as there are hypothesis classes with $VC_U=0$ and $dim_U$ arbitrarily large. In supervised learning, having access only to labeled examples requires at least $\Lambda \geq dim_U$ labeled examples, whereas semi-supervised learning requires only $O(1)$ labeled examples and $O(\Lambda)$ unlabeled examples. A byproduct of our results is that if the distribution is robustly realizable by a hypothesis class, then with respect to the 0-1 loss we can learn with only $O(VC_U)$ labeled examples, even if the $VC$-dimension is infinite.
We prove that the stack-number of the strong product of three $n$-vertex paths is $\Theta(n^{1/3})$. The best previously known upper bound was $O(n)$, and no non-trivial lower bound was known. This is the first explicit example of a graph family with bounded maximum degree and unbounded stack-number. The main tool used in our proof of the lower bound is the topological overlap theorem of Gromov. In fact, we prove a stronger result in terms of so-called triangulations of Cartesian products, and conclude that triangulations of three-dimensional Cartesian products of any sufficiently large connected graphs have large stack-number. The upper bound is a special case of a more general construction based on families of permutations derived from Hadamard matrices. The strong product of three paths is also the first example of a bounded-degree graph with bounded queue-number and unbounded stack-number. A natural question arising from our result is to determine the smallest $\Delta_0$ such that there exists a graph family with unbounded stack-number, bounded queue-number, and maximum degree $\Delta_0$. We show that $\Delta_0\in \{6,7\}$.
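For concreteness, the graph in question is the three-dimensional analogue of the king graph: the short Python sketch below (illustrative, not part of the paper) builds $P_n \boxtimes P_n \boxtimes P_n$ as the Chebyshev-distance-one graph on the $n\times n\times n$ grid and checks that its maximum degree is the constant $26$.
\begin{verbatim}
import itertools

def strong_product_of_paths(n):
    """Strong product of three n-vertex paths: vertices are triples in
    {0,...,n-1}^3, and two distinct triples are adjacent iff they differ by
    at most 1 in every coordinate (Chebyshev distance exactly 1)."""
    vertices = list(itertools.product(range(n), repeat=3))
    edges = set()
    for v in vertices:
        for step in itertools.product((-1, 0, 1), repeat=3):
            if step == (0, 0, 0):
                continue
            u = tuple(v[i] + step[i] for i in range(3))
            if all(0 <= u[i] < n for i in range(3)):
                edges.add(tuple(sorted((v, u))))
    return vertices, edges

V, E = strong_product_of_paths(4)
assert max(sum(1 for e in E if v in e) for v in V) == 26   # bounded maximum degree
\end{verbatim}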
An intensive line of research on the fixed parameter tractability of integer programming focuses on exploiting the relation between the sparsity of a constraint matrix $A$ and the norm of the elements of its Graver basis. In particular, integer programming is fixed parameter tractable when parameterized by the primal tree-depth and the entry complexity of $A$, and when parameterized by the dual tree-depth and the entry complexity of $A$; both of these parameterizations imply that $A$ is sparse, in particular, that the number of its non-zero entries is linear in the number of columns or rows, respectively. We study preconditioners that transform a given matrix into an equivalent sparse matrix when one exists, and provide structural results characterizing the existence of a sparse equivalent matrix in terms of the structural properties of the associated column matroid. In particular, our results imply that the $\ell_1$-norm of the Graver basis is bounded by a function of the maximum $\ell_1$-norm of a circuit of $A$. We use our results to design a parameterized algorithm that constructs a matrix equivalent to an input matrix $A$ with small primal/dual tree-depth and entry complexity, if such an equivalent matrix exists. Our results yield parameterized algorithms for integer programming when parameterized by the $\ell_1$-norm of the Graver basis of the constraint matrix, by the $\ell_1$-norm of the circuits of the constraint matrix, by the smallest primal tree-depth and entry complexity of a matrix equivalent to the constraint matrix, and by the smallest dual tree-depth and entry complexity of a matrix equivalent to the constraint matrix.
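As a small illustration of the circuit parameter mentioned above, the following Python sketch (ours; brute force, tiny matrices only, using SymPy for exact rational kernels) enumerates the circuits of the column matroid of an integer matrix and reports their $\ell_1$-norms. The function name and the example matrix are ours.
\begin{verbatim}
from itertools import combinations
from functools import reduce
from math import gcd
from sympy import Matrix

def circuits_and_l1_norms(A):
    """Circuits of the column matroid of an integer matrix A: a column set S is
    a circuit iff ker(A[:, S]) is one-dimensional and spanned by a vector with
    no zero entry; scaling that vector to coprime integers gives the circuit of
    A itself, whose l1-norm is the parameter discussed above."""
    M = Matrix(A)
    out = []
    for size in range(1, M.cols + 1):
        for S in combinations(range(M.cols), size):
            ker = M[:, list(S)].nullspace()
            if len(ker) != 1 or any(x == 0 for x in ker[0]):
                continue
            lcm_den = reduce(lambda a, b: a * b // gcd(a, b),
                             (int(x.q) for x in ker[0]), 1)
            v = [int(x * lcm_den) for x in ker[0]]
            g = reduce(gcd, (abs(z) for z in v))
            v = [z // g for z in v]
            out.append((S, v, sum(abs(z) for z in v)))
    return out

# Example: for A = [[1, 1, 2]] the circuits are the column pairs, with vectors
# (up to sign) (1, -1), (2, -1), (2, -1) and l1-norms 2, 3, 3.
for S, v, l1 in circuits_and_l1_norms([[1, 1, 2]]):
    print(S, v, l1)
\end{verbatim}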
Tusn\'ady's problem asks to bound the discrepancy of points and axis-parallel boxes in $\mathbb{R}^d$. Algorithmic bounds for Tusn\'ady's problem use a canonical decomposition of Matou\v{s}ek for the system of points and axis-parallel boxes, together with other techniques such as partial coloring and/or random-walk-based methods. We use the notion of \emph{shallow cell complexity} and the \emph{shallow packing lemma}, together with the chaining technique, to obtain an improved decomposition of the set system. Coupled with an algorithmic technique of Bansal and Garg for discrepancy minimization, which we also slightly extend, this yields improved algorithmic bounds on Tusn\'ady's problem. For $d\geq 5$, our bound matches the lower bound of $\Omega(\log^{d-1}n)$ given by Matou\v{s}ek, Nikolov and Talwar [IMRN, 2020], settling Tusn\'ady's problem up to constant factors. For $d=2,3,4$, we obtain improved algorithmic bounds of $O(\log^{7/4}n)$, $O(\log^{5/2}n)$ and $O(\log^{13/4}n)$, respectively, which match or improve upon the non-constructive bounds of Nikolov for $d\geq 3$. Further, we give improved bounds for the discrepancy of set systems of points and polytopes in $\mathbb{R}^d$ generated via translations of a fixed set of hyperplanes. As an application, we also obtain a bound for the geometric discrepancy of anchored boxes in $\mathbb{R}^d$ with respect to an arbitrary measure, matching the known upper bound for the Lebesgue measure and improving on a result of Aistleitner, Bilyk, and Nikolov [MC and QMC methods, \emph{Springer, Proc. Math. Stat.}, 2018] for $d\geq 4$.
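For readers less familiar with combinatorial discrepancy, the brute-force Python sketch below (ours; exponential time, tiny inputs only) spells out the quantity Tusn\'ady's problem asks about: the best achievable imbalance, over $\pm 1$ colorings of the points, of the worst axis-parallel box.
\begin{verbatim}
import itertools
import numpy as np

def box_discrepancy(points, coloring):
    """Discrepancy of a +-1 coloring w.r.t. axis-parallel boxes: the maximum,
    over boxes, of |sum of colors of the points inside|.  It suffices to check
    boxes whose faces pass through point coordinates."""
    pts = np.asarray(points, float)
    chi = np.asarray(coloring)
    coords = [sorted(set(pts[:, i])) for i in range(pts.shape[1])]
    best = 0
    for lo in itertools.product(*coords):
        for hi in itertools.product(*coords):
            if any(l > h for l, h in zip(lo, hi)):
                continue
            inside = np.all((pts >= lo) & (pts <= hi), axis=1)
            best = max(best, abs(int(chi[inside].sum())))
    return best

def tusnady_value(points):
    """min over +-1 colorings of the box discrepancy (exponential; tiny inputs only)."""
    return min(box_discrepancy(points, chi)
               for chi in itertools.product((-1, 1), repeat=len(points)))
\end{verbatim}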
Let $G=(V,E)$ be an undirected, unweighted planar graph. Consider a vector storing the distances from an arbitrary vertex $v$ to the vertices $S = \{s_1, s_2, \ldots, s_k\}$ of a single face, in their cyclic order. The pattern of $v$ is obtained by taking the difference between every pair of consecutive values of this vector. In STOC'19, Li and Parter used a VC-dimension argument to show that in planar graphs, the number of distinct patterns, denoted $x$, is only $O(k^3)$. This resulted in a simple compression scheme requiring $\tilde O(\min \{ k^4+|T|, k\cdot |T|\})$ space to encode the distances between $S$ and a subset of terminal vertices $T \subseteq V$; this is known as the Okamura-Seymour metric compression problem. We give an alternative proof of the $x=O(k^3)$ bound that exploits planarity beyond the VC-dimension argument. Namely, our proof relies on cut-cycle duality, as well as on the fact that the distances among the vertices of $S$ are bounded by $k$. Our method implies the following: (1) An $\tilde{O}(x+k+|T|)$ space compression of the Okamura-Seymour metric, improving the compression of Li and Parter to $\tilde O(\min \{k^3+|T|,k \cdot |T| \})$. (2) An optimal $\tilde{O}(k+|T|)$ space compression of the Okamura-Seymour metric in the case where the vertices of $T$ induce a connected component in $G$. (3) A tight bound of $x = \Theta(k^2)$ for the family of Halin graphs, whereas the VC-dimension argument is limited to showing $x=O(k^3)$.
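The pattern definition is easy to state in code; the sketch below (ours, for illustration) computes BFS distances from $v$ to the face vertices, returns the consecutive differences, and counts the number $x$ of distinct patterns. Including the wrap-around (cyclic) difference adds no information, since it equals minus the sum of the other $k-1$ differences, so we omit it.
\begin{verbatim}
from collections import deque

def pattern(adj, v, face):
    """Pattern of v with respect to the face vertices s_1,...,s_k, listed in
    cyclic order: BFS distances from v to the s_i, then the differences of
    consecutive entries."""
    dist, q = {v: 0}, deque([v])
    while q:                                   # BFS in the unweighted graph
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    d = [dist[s] for s in face]
    return tuple(d[i + 1] - d[i] for i in range(len(d) - 1))

def num_patterns(adj, face):
    """The quantity x: the number of distinct patterns over all vertices."""
    return len({pattern(adj, v, face) for v in adj})
\end{verbatim}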
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $\epsilon$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\| A(I -ZZ^\top)\|_{S_p} \leq (1+\epsilon)\min_{U^\top U = I_k} \|A(I - U U^\top)\|_{S_p}$, where $\|M\|_{S_p}$ denotes the $\ell_p$ norm of the singular values of $M$. For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the na\"ive $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses poly$(\log(dk/\epsilon))$ factors. Our main result is an algorithm that uses only $\tilde{O}(kp^{1/6}/\epsilon^{1/3})$ matrix-vector products and works for all $p \geq 1$. For $p = 2$, our bound improves the previous $\tilde{O}(k/\epsilon^{1/2})$ bound to $\tilde{O}(k/\epsilon^{1/3})$. Since the Schatten-$p$ and Schatten-$\infty$ norms are the same up to a $1+ \epsilon$ factor when $p \geq (\log d)/\epsilon$, our bound recovers the result of Musco and Musco for $p = \infty$. Further, we prove a matrix-vector query lower bound of $\Omega(1/\epsilon^{1/3})$ for any fixed constant $p \geq 1$, showing that, surprisingly, $\tilde{\Theta}(1/\epsilon^{1/3})$ is the optimal complexity for constant~$k$. To obtain our results, we introduce several new techniques, including optimizing over multiple Krylov subspaces simultaneously and pinching inequalities for partitioned operators. Our lower bound for $p \in [1,2]$ uses the Araki-Lieb-Thirring trace inequality, whereas for $p>2$ we appeal to a norm-compression inequality for aligned partitioned operators.
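To fix notation, the sketch below (ours) computes the Schatten-$p$ error $\|A(I-ZZ^\top)\|_{S_p}$ from the singular values of the residual, and shows a generic block Krylov iteration with a Rayleigh-Ritz step in the spirit of Musco and Musco; it is not the new algorithm of this paper, and all function names are illustrative.
\begin{verbatim}
import numpy as np

def schatten_err(A, Z, p):
    """l_p norm of the singular values of A (I - Z Z^T), i.e. ||A(I - Z Z^T)||_{S_p}."""
    resid = A - (A @ Z) @ Z.T
    s = np.linalg.svd(resid, compute_uv=False)
    return float(np.max(s)) if np.isinf(p) else float((s ** p).sum() ** (1.0 / p))

def block_krylov_lowrank(matvec, rmatvec, d, k, iters, seed=0):
    """Generic block Krylov iteration with a Rayleigh-Ritz step (in the spirit of
    Musco--Musco 2015; not the algorithm of this paper).  Only products with A
    (matvec) and A^T (rmatvec) are used."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((d, k))
    blocks = [B]
    for _ in range(iters):
        B = rmatvec(matvec(B))                 # one application of A^T A
        blocks.append(B)
    Q, _ = np.linalg.qr(np.hstack(blocks))     # orthonormal basis of the Krylov space
    _, _, Vt = np.linalg.svd(matvec(Q), full_matrices=False)
    return Q @ Vt[:k].T                        # d x k, orthonormal columns

# Usage with an explicit n x d matrix A:
# Z = block_krylov_lowrank(lambda X: A @ X, lambda Y: A.T @ Y, A.shape[1], k, iters)
# err = schatten_err(A, Z, p)
\end{verbatim}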
Heged\H{u}s's lemma is the following combinatorial statement regarding polynomials over finite fields. Over a field $\mathbb{F}$ of characteristic $p > 0$ and for $q$ a power of $p$, the lemma says that any multilinear polynomial $P\in \mathbb{F}[x_1,\ldots,x_n]$ of degree less than $q$ that vanishes at all points in $\{0,1\}^n$ of Hamming weight $k\in [q,n-q]$ must also vanish at all points in $\{0,1\}^n$ of weight $k + q$. This lemma was used by Heged\H{u}s (2009) to give a solution to \emph{Galvin's problem}, an extremal problem about set systems; by Alon, Kumar and Volk (2018) to improve the best-known multilinear circuit lower bounds; and by Hrube\v{s}, Ramamoorthy, Rao and Yehudayoff (2019) to prove optimal lower bounds against depth-$2$ threshold circuits for computing some symmetric functions. In this paper, we formulate a robust version of Heged\H{u}s's lemma. Informally, this version says that if a polynomial of degree $o(q)$ vanishes at most of the points of weight $k$, then it vanishes at many of the points of weight $k+q$. We prove this lemma and give three different applications.
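Since the lemma is a purely linear-algebraic statement about evaluation matrices, it can be checked mechanically on small instances: a multilinear polynomial of degree less than $q$ vanishes on all weight-$k$ points iff its coefficient vector lies in the kernel of the weight-$k$ evaluation matrix over $\mathbb{F}_p$, so the lemma says that appending the weight-$(k+q)$ evaluation rows does not increase the rank. The Python sketch below (ours) verifies this for $p=q=3$, $n=7$, $k=3$.
\begin{verbatim}
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of an integer matrix over F_p (Gauss-Jordan elimination; p prime)."""
    M = [[x % p for x in row] for row in rows]
    rank, col, ncols = 0, 0, len(M[0]) if M else 0
    while rank < len(M) and col < ncols:
        piv = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if piv is None:
            col += 1
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][col], p - 2, p)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
        col += 1
    return rank

def eval_matrix(n, weight, monomials):
    """Rows: 0/1 points of the given Hamming weight; columns: multilinear monomials
    (index sets); entry = value of the monomial at the point (1 or 0)."""
    return [[int(set(m) <= set(pt)) for m in monomials]
            for pt in combinations(range(n), weight)]

# p = q = 3, n = 7, k = 3 (so k lies in [q, n - q]): vanishing on all weight-k
# points must imply vanishing on all weight-(k+q) points, i.e. adding the
# weight-(k+q) rows does not change the rank.
p = q = 3
n, k = 7, 3
monos = [m for d in range(q) for m in combinations(range(n), d)]  # degree < q
Ek, Ekq = eval_matrix(n, k, monos), eval_matrix(n, k + q, monos)
assert rank_mod_p(Ek, p) == rank_mod_p(Ek + Ekq, p)
\end{verbatim}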
In this work, we extend the robust version of the Sylvester-Gallai theorem, obtained by Barak, Dvir, Wigderson and Yehudayoff, and by Dvir, Saraf and Wigderson, to the case of quadratic polynomials. Specifically, we prove that if $\mathcal{Q}\subset \mathbb{C}[x_1,\ldots,x_n]$ is a finite set of $m$ irreducible quadratic polynomials satisfying the following condition: there is $\delta>0$ such that for every $Q\in\mathcal{Q}$ there are at least $\delta m$ polynomials $P\in \mathcal{Q}$ such that whenever $Q$ and $P$ vanish then so does a third polynomial in $\mathcal{Q}\setminus\{Q,P\}$, then $\dim(\text{span}({\mathcal{Q}}))=\text{poly}(1/\delta)$. The works of Barak et al. and Dvir et al. studied the case of linear polynomials and proved an upper bound of $O(1/\delta)$ on the dimension (the first work gave an upper bound of $O(1/\delta^2)$, which was improved to $O(1/\delta)$ in the second work).
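For intuition, in the linear case studied by Barak et al. and Dvir et al., the condition ``whenever $Q$ and $P$ vanish then so does a third polynomial'' simply says that the third form lies in $\text{span}\{Q,P\}$. The Python sketch below (ours; real coefficients and numerical rank, purely for illustration) checks this $\delta$-condition for a set of homogeneous linear forms and reports the dimension of their span; the quadratic case treated in the paper replaces span membership by vanishing on the common zero set and is substantially harder.
\begin{verbatim}
import numpy as np

def delta_sg_fraction_and_dim(forms, tol=1e-9):
    """For homogeneous linear forms (rows of `forms`), 'whenever Q and P vanish,
    so does some third form R' amounts to R lying in span{Q, P}.  Returns, for
    each form, the fraction of partners P admitting such an R, together with
    dim(span of all forms)."""
    V = np.asarray(forms, dtype=float)
    m = len(V)

    def third_in_span(q, p):
        base = np.linalg.matrix_rank(V[[q, p]], tol)
        return any(np.linalg.matrix_rank(V[[q, p, r]], tol) == base
                   for r in range(m) if r not in (q, p))

    fractions = [sum(third_in_span(q, p) for p in range(m) if p != q) / (m - 1)
                 for q in range(m)]
    return fractions, int(np.linalg.matrix_rank(V, tol))
\end{verbatim}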
We design accelerated algorithms with improved rates for several fundamental classes of optimization problems. Our algorithms all build upon techniques related to the analysis of primal-dual extragradient methods via relative Lipschitzness proposed recently by [CST21]. (1) Separable minimax optimization. We study separable minimax optimization problems $\min_x \max_y f(x) - g(y) + h(x, y)$, where $f$ and $g$ have smoothness and strong convexity parameters $(L^x, \mu^x)$, $(L^y, \mu^y)$, and $h$ is convex-concave with a $(\Lambda^{xx}, \Lambda^{xy}, \Lambda^{yy})$-blockwise operator norm bounded Hessian. We provide an algorithm with gradient query complexity $\tilde{O}\left(\sqrt{\frac{L^{x}}{\mu^{x}}} + \sqrt{\frac{L^{y}}{\mu^{y}}} + \frac{\Lambda^{xx}}{\mu^{x}} + \frac{\Lambda^{xy}}{\sqrt{\mu^{x}\mu^{y}}} + \frac{\Lambda^{yy}}{\mu^{y}}\right)$. Notably, for convex-concave minimax problems with bilinear coupling (e.g.\ quadratics), where $\Lambda^{xx} = \Lambda^{yy} = 0$, our rate matches a lower bound of [ZHZ19]. (2) Finite sum optimization. We study finite sum optimization problems $\min_x \frac{1}{n}\sum_{i\in[n]} f_i(x)$, where each $f_i$ is $L_i$-smooth and the overall problem is $\mu$-strongly convex. We provide an algorithm with gradient query complexity $\tilde{O}\left(n + \sum_{i\in[n]} \sqrt{\frac{L_i}{n\mu}} \right)$. Notably, when the smoothness bounds $\{L_i\}_{i\in[n]}$ are non-uniform, our rate improves upon accelerated SVRG [LMH15, FGKS15] and Katyusha [All17] by up to a $\sqrt{n}$ factor. (3) Minimax finite sums. We generalize our algorithms for minimax and finite sum optimization to solve a natural family of minimax finite sum optimization problems at an accelerated rate, encapsulating both above results up to a logarithmic factor.
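As background for the method underlying these rates, the sketch below (ours) shows the textbook extragradient step for a smooth convex-concave problem $\min_x\max_y F(x,y)$: a gradient half-step to a midpoint, followed by a full step using the gradient evaluated at that midpoint. This is the basic primal-dual method whose relative-Lipschitzness analysis [CST21] the paper builds on, not the accelerated algorithms themselves.
\begin{verbatim}
import numpy as np

def extragradient(grad_x, grad_y, x, y, eta, iters):
    """Textbook extragradient for min_x max_y F(x, y): step to a midpoint using
    the current gradient, then take the real step using the midpoint gradient."""
    for _ in range(iters):
        xh = x - eta * grad_x(x, y)            # half step: descent in x
        yh = y + eta * grad_y(x, y)            # half step: ascent in y
        x = x - eta * grad_x(xh, yh)           # full step with midpoint gradients
        y = y + eta * grad_y(xh, yh)
    return x, y

# Example: the bilinear saddle point min_x max_y x^T M y.
# M = np.array([[1.0, 2.0], [0.0, 1.0]])
# x, y = extragradient(lambda x, y: M @ y, lambda x, y: M.T @ x,
#                      np.ones(2), np.ones(2), 0.1, 500)
\end{verbatim}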
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to point sets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time-series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both the discrete Fr\'echet and Dynamic Time Warping distances, the two most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increased space usage and preprocessing time over existing methods. Our query time is comparable to, or significantly improves upon, that of existing methods, and our algorithms are especially efficient when the length of the curves is bounded.
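For reference, the two curve distances that the paper's notion generalizes are both computed by a standard $O(|P||Q|)$ dynamic program; the Python sketch below (ours, with Euclidean ground distance) shows the two recursions side by side.
\begin{verbatim}
import numpy as np

def dtw(P, Q):
    """Dynamic Time Warping between two discretized curves (point sequences),
    via the standard dynamic program with Euclidean ground distance."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    D = np.full((len(P) + 1, len(Q) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(P) + 1):
        for j in range(1, len(Q) + 1):
            c = np.linalg.norm(P[i - 1] - Q[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

def discrete_frechet(P, Q):
    """Discrete Frechet distance: the same recursion with (max, min) in place of (+, min)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    D = np.full((len(P) + 1, len(Q) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(P) + 1):
        for j in range(1, len(Q) + 1):
            c = np.linalg.norm(P[i - 1] - Q[j - 1])
            D[i, j] = max(c, min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1]))
    return D[-1, -1]
\end{verbatim}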