A $(1+\epsilon)$-approximate distance oracle of an edge-weighted graph is a data structure that returns an approximate shortest path distance between any two query vertices up to a $(1+\epsilon)$ factor. Thorup (FOCS 2001, JACM 2004) and Klein (SODA 2002) independently constructed a $(1+\epsilon)$-approximate distance oracle with $O(n\log n)$ space, measured in number of words, and $O(1)$ query time when $G$ is an undirected planar graph with $n$ vertices and $\epsilon$ is a fixed constant. Many follow-up works gave $(1+\epsilon)$-approximate distance oracles with various trade-offs between space and query time. However, improving the $O(n\log n)$ space bound without sacrificing query time has remained an open problem for almost two decades. In this work, we resolve this problem affirmatively by constructing a $(1+\epsilon)$-approximate distance oracle with optimal $O(n)$ space and $O(1)$ query time for undirected planar graphs and fixed $\epsilon$. We also make substantial progress for planar digraphs with non-negative edge weights. For fixed $\epsilon > 0$, we give a $(1+\epsilon)$-approximate distance oracle with $o(n\log(Nn))$ space and $O(\log\log(Nn))$ query time; here $N$ is the ratio between the largest and smallest positive edge weight. This improves Thorup's (FOCS 2001, JACM 2004) $O(n\log(Nn)\log n)$ space bound by more than a logarithmic factor while matching the query time of his structure. This is the first such improvement for planar digraphs in two decades, in both the weighted and unweighted settings.
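Concretely, writing $d_G(u,v)$ for the true shortest-path distance and $\tilde{d}(u,v)$ for the oracle's answer (notation ours), the guarantee is that $d_G(u,v) \leq \tilde{d}(u,v) \leq (1+\epsilon) \cdot d_G(u,v)$ for all query vertices $u,v$.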
Given two weighted automata, we consider the problem of whether one is big-O of the other, i.e., whether the weight of every finite word in the first is at most some constant multiple of its weight in the second. We show that the problem is undecidable, even for the instantiation of weighted automata as labelled Markov chains. Moreover, even when it is known that one weighted automaton is big-O of another, the problem of finding or approximating the associated constant is also undecidable. Our positive results show that the big-O problem is polynomial-time solvable for unambiguous automata, coNP-complete for unlabelled weighted automata (i.e., when the alphabet is a single character), and decidable, subject to Schanuel's conjecture, when the language is bounded (i.e., a subset of $w_1^*\dots w_m^*$ for some finite words $w_1,\dots,w_m$) or when the automaton has finite ambiguity. On labelled Markov chains, the problem can be restated in terms of a ratio total variation distance, which, instead of the maximum difference between the probabilities of any two events, measures the maximum ratio between the probabilities of any two events. The problem is related to $\epsilon$-differential privacy, for which the optimal constant of the big-O notation is exactly $\exp(\epsilon)$.
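In symbols (notation ours): writing $A(w)$ for the weight that automaton $A$ assigns to the finite word $w$, automaton $A$ is big-O of automaton $B$ if there exists a constant $c > 0$ such that $A(w) \leq c \cdot B(w)$ for every finite word $w$.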
The Maximum Induced Matching problem asks, given a graph $G=(V,E)$, for the maximum $k$ such that there is a subset of vertices $S$ of size $k$ for which every vertex $v$ in the induced subgraph $G[S]$ has degree exactly $1$. In this paper, we design an exact algorithm running in $O(1.2630^n)$ time and polynomial space to solve the Maximum Induced Matching problem on graphs where each vertex has degree at most 3. Prior work solved the problem by finding a Maximum Independent Set in the line graph $L(G^2)$ using polynomial space; this method takes $O(1.3139^n)$ time.
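As a concrete illustration of the feasibility condition only (not of the paper's branching algorithm), the following Python sketch checks whether a vertex set $S$ induces a subgraph in which every vertex has degree exactly $1$; the adjacency-set representation is our assumption.

    def is_induced_matching(adj, S):
        """Check that every vertex of G[S] has degree exactly 1.

        adj: dict mapping each vertex to the set of its neighbours in G.
        S:   iterable of vertices of G.
        """
        S = set(S)
        # The degree of v in G[S] is the number of neighbours of v inside S.
        return all(len(adj[v] & S) == 1 for v in S)

    # Example on the path a-b-c-d: {a, b} induces a single edge (an induced
    # matching of size one), while {a, b, c} induces a path of length two.
    adj = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}
    assert is_induced_matching(adj, {'a', 'b'})
    assert not is_induced_matching(adj, {'a', 'b', 'c'})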
The Fr\'{e}chet distance is a well-studied similarity measure between curves that is widely used throughout computer science. Motivated by applications where curves stem from paths and walks on an underlying graph (such as a road network), we define and study the Fr\'{e}chet distance for paths and walks on graphs. When provided with a distance oracle of $G$ with $O(1)$ query time, the classical quadratic-time dynamic program can compute the Fr\'{e}chet distance between two walks $P$ and $Q$ in a graph $G$ in $O(|P| \cdot |Q|)$ time. We show that there are situations where the graph structure helps with computing the Fr\'{e}chet distance: when the graph $G$ is planar, we apply existing (approximate) distance oracles to compute a $(1+\varepsilon)$-approximation of the Fr\'{e}chet distance between any shortest path $P$ and any walk $Q$ in $O(|G| \log |G| / \sqrt{\varepsilon} + |P| + \frac{|Q|}{\varepsilon } )$ time. We generalise this result to near-shortest paths, i.e., $\kappa$-straight paths, by showing how to compute a $(1+\varepsilon)$-approximation between a $\kappa$-straight path $P$ and any walk $Q$ in $O(|G| \log |G| / \sqrt{\varepsilon} + |P| + \frac{\kappa|Q|}{\varepsilon } )$ time. Our algorithmic results hold for both the strong and the weak discrete Fr\'{e}chet distance over the shortest path metric in $G$. Finally, we show that additional assumptions on the input, such as our assumption on path straightness, are indeed necessary to obtain truly subquadratic running time. We provide a conditional lower bound showing that the Fr\'{e}chet distance, or even its $1.01$-approximation, between arbitrary \emph{paths} in a weighted planar graph cannot be computed in $O((|P|\cdot|Q|)^{1-\delta})$ time for any $\delta > 0$ unless the Orthogonal Vectors Hypothesis fails. For walks, this lower bound holds even when $G$ is planar, has unit edge weights, and has $O(1)$ vertices.
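For reference, the classical quadratic dynamic program mentioned above, specialised to the strong discrete Fr\'{e}chet distance and parameterised by a distance oracle dist(u, v) assumed to answer in $O(1)$ time, can be sketched as follows (a sketch under those assumptions, not the approximation algorithm of this paper).

    def discrete_frechet(P, Q, dist):
        """Strong discrete Frechet distance between walks P and Q, given as
        vertex sequences, over the metric dist(u, v); O(|P| * |Q|) time."""
        n, m = len(P), len(Q)
        INF = float('inf')
        D = [[INF] * m for _ in range(n)]
        D[0][0] = dist(P[0], Q[0])
        for i in range(n):
            for j in range(m):
                if i == 0 and j == 0:
                    continue
                best = min(
                    D[i - 1][j] if i > 0 else INF,                 # advance on P
                    D[i][j - 1] if j > 0 else INF,                 # advance on Q
                    D[i - 1][j - 1] if i > 0 and j > 0 else INF,   # advance on both
                )
                D[i][j] = max(best, dist(P[i], Q[j]))
        return D[n - 1][m - 1]

    # Toy usage with Euclidean points standing in for graph vertices.
    from math import dist as euclid
    print(discrete_frechet([(0, 0), (1, 0)], [(0, 1), (1, 1)], euclid))  # 1.0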
We present $(1-\varepsilon)$-approximation algorithms for maximum cardinality matchings in disk intersection graphs -- all with near-linear running time. We also present an estimation algorithm that returns a $(1\pm\varepsilon)$-approximation to the size of such matchings -- this algorithm runs in linear time for unit disks, and in $O(n \log n)$ time for general disks (as long as the density is relatively small).
Let $\kappa(s,t)$ denote the maximum number of internally disjoint $st$-paths in an undirected graph $G=(V,E)$. We consider designing a data structure that includes a list of cuts and answers the following query: given $s,t \in V$, determine whether $\kappa(s,t) \leq k$, and if so, return a pointer to an $st$-cut of size $\leq k$ (or to a minimum $st$-cut) in the list. A trivial data structure that includes a list of $n(n-1)/2$ cuts and requires $\Theta(kn^2)$ space can answer each query in $O(1)$ time. We obtain the following results. In the case when $G$ is $k$-connected, we show that $n$ cuts suffice, and that these cuts can be partitioned into $(2k+1)$ laminar families. Thus using space $O(kn)$ we can answer each min-cut query in $O(1)$ time, slightly improving and substantially simplifying a recent result of Pettie and Yin. We then extend this data structure to subset $k$-connectivity. In the general case we show that $(2k+1)n$ cuts suffice to return an $st$-cut of size $\leq k$, and that a list of size $k(k+2)n$ contains a minimum $st$-cut for every $s,t \in V$. Combining our subset $k$-connectivity data structure with the data structure of Hsu and Lu for checking $k$-connectivity, we give an $O(k^2 n)$ space data structure that returns an $st$-cut of size $\leq k$ in $O(\log k)$ time, while $O(k^3 n)$ space enables us to return a minimum $st$-cut.
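The equivalence between the path and cut formulations of the query is Menger's theorem: for non-adjacent $s,t$, $\kappa(s,t)$ equals the minimum size of a set $C \subseteq V \setminus \{s,t\}$ whose removal disconnects $s$ from $t$, so deciding whether $\kappa(s,t) \leq k$ amounts to exhibiting an $st$-cut of size $\leq k$.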
Given a polyline on $n$ vertices, the polyline simplification problem asks for a minimum size subsequence of these vertices defining a new polyline whose distance to the original polyline is at most a given threshold under some distance measure. In this paper, we improve the long-standing running time bound for the simplification of polylines under the local Fr\'echet distance. The best algorithm known so far, by Imai and Iri, has a running time cubic in $n$. We present an algorithm with a running time of $O(n^2)$ under the $L_1$ and $L_\infty$ norms, and $O(n^2 \log n)$ under $L_{p \in (1,\infty)}$ (including the Euclidean norm $L_2$). Our approach is based on the ideas of Chan and Chin, who showed that under the local Hausdorff distance, the Imai-Iri algorithm can be improved to run in quadratic time for $L_1$, $L_2$, and $L_\infty$. However, the Hausdorff distance does not take the order of the points along the polyline into account. The Fr\'echet distance, which is sensitive to the course of the polylines, is hence often deemed the superior distance measure for polyline similarity, but it is also more intricate to compute. So far, the significantly faster simplification algorithms available for the Hausdorff distance have made it preferable in practice. But our new algorithm for simplification under the Fr\'echet distance matches the running time bounds for the Hausdorff distance up to logarithmic factors, and thus allows the use of this more suitable distance measure.
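For context, the Imai-Iri framework that both the Chan-Chin result and our algorithm build on can be sketched as follows: construct a shortcut graph containing the edge $(i,j)$ whenever the segment from $p_i$ to $p_j$ is within the threshold of the subpolyline $p_i,\dots,p_j$ under the chosen distance measure, then return a fewest-vertex path from the first to the last vertex. In the Python sketch below, the predicate valid is a placeholder for that distance test; evaluating it naively for all $O(n^2)$ shortcuts is what yields the cubic bound.

    from collections import deque

    def imai_iri(polyline, valid):
        """Minimum-vertex simplification in the Imai-Iri framework.

        polyline: list of points p_0, ..., p_{n-1}.
        valid(i, j): True iff the segment p_i p_j is within the error
                     threshold of the subpolyline p_i, ..., p_j.
        Returns the indices of a smallest valid subsequence from p_0 to p_{n-1}.
        """
        n = len(polyline)
        # BFS over the shortcut graph finds a fewest-edge path 0 -> n-1;
        # consecutive vertices always form a valid shortcut, so n-1 is reachable.
        parent = {0: None}
        queue = deque([0])
        while queue:
            i = queue.popleft()
            if i == n - 1:
                break
            for j in range(i + 1, n):
                if j not in parent and valid(i, j):
                    parent[j] = i
                    queue.append(j)
        path, j = [], n - 1
        while j is not None:
            path.append(j)
            j = parent[j]
        return path[::-1]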
Shamir and Spencer proved in the 1980s that the chromatic number of the binomial random graph $G(n,p)$ is concentrated in an interval of length at most $\omega\sqrt{n}$, and in the 1990s Alon showed that an interval of length $\omega\sqrt{n}/\log n$ suffices for constant edge-probabilities $p \in (0,1)$. We prove a similar logarithmic improvement of the Shamir-Spencer concentration results for the sparse case $p=p(n) \to 0$, and uncover a surprising concentration `jump' of the chromatic number in the very dense case $p=p(n) \to 1$.
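Here concentration is meant in the standard sense (our phrasing): $\chi(G(n,p))$ is concentrated on an interval of length $s=s(n)$ if there exists $\ell = \ell(n)$ such that $\Pr[\ell \leq \chi(G(n,p)) \leq \ell + s] \to 1$ as $n \to \infty$.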
Due to the falling costs of data acquisition and storage, researchers and industry analysts often want to find all instances of rare events in large datasets. For instance, scientists can cheaply capture thousands of hours of video, but are limited by the need to manually inspect long videos to identify relevant objects and events. To reduce this cost, recent work proposes using cheap proxy models, such as image classifiers, to identify an approximate set of data points satisfying a data selection filter. Unfortunately, this recent work does not provide the statistical accuracy guarantees necessary in scientific and production settings. In this work, we introduce novel algorithms for approximate selection queries with statistical accuracy guarantees. Namely, given a limited number of exact identifications from an oracle, often a human or an expensive machine learning model, our algorithms meet a minimum precision or recall target with high probability. In contrast, existing approaches can catastrophically fail to satisfy these precision and recall targets. We show that our algorithms can improve query result quality by up to 30x for both the precision and recall targets, on both real and synthetic datasets.
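In symbols (notation ours): if $R$ is the returned set and $P^*$ is the set of records that truly satisfy the filter, a precision-target guarantee takes the form $\Pr[|R \cap P^*|/|R| \geq \gamma] \geq 1-\delta$, and a recall-target guarantee the form $\Pr[|R \cap P^*|/|P^*| \geq \gamma] \geq 1-\delta$, for a user-specified target $\gamma$ and failure probability $\delta$.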
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm, called multi-step primal-dual (MSPD), and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm, called distributed randomized smoothing (DRS), based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
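The smoothing underlying DRS follows the standard randomized-smoothing construction (our rendering; the paper's exact scheme may differ in details): the non-smooth objective $f$ is replaced by $f_\gamma(x) = \mathbb{E}_u[f(x+\gamma u)]$ for $u$ drawn from a fixed isotropic distribution such as a standard Gaussian, which yields a differentiable surrogate uniformly close to $f$ to which smooth first-order methods can be applied.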
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
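As a minimal illustration of the sketch-and-reconstruct pattern (one standard variant, written by us; not necessarily the exact method or parameter choices of the paper), the following NumPy sketch touches the input matrix only through two random linear images and reconstructs a low-rank approximation from them.

    import numpy as np

    def low_rank_from_sketch(A, r, s, seed=0):
        """Rank-<= r approximation of A recovered from a single random sketch.

        A is accessed only through the two linear images Y = A @ Omega and
        W = Psi @ A; s >= r so the least-squares reconstruction is well posed.
        """
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Omega = rng.standard_normal((n, r))    # range sketch
        Psi = rng.standard_normal((s, m))      # co-range sketch
        Y = A @ Omega                          # captures the column space of A
        W = Psi @ A
        Q, _ = np.linalg.qr(Y)                 # orthonormal basis for range(Y)
        # Choose X so that A_hat = Q @ X is consistent with the sketch W.
        X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)
        return Q @ X

    # Usage: recover a rank-5 matrix from sketches of sizes r=10, s=20.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
    A_hat = low_rank_from_sketch(A, r=10, s=20)
    print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))  # tiny relative error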