We present a deterministic $(1+\varepsilon)$-approximate maximum matching algorithm that uses $\mathsf{poly}(1/\varepsilon)$ passes in the semi-streaming model, solving the long-standing open problem of breaking the exponential barrier in the dependence on $1/\varepsilon$. Our algorithm exponentially improves on the well-known randomized $(1/\varepsilon)^{O(1/\varepsilon)}$-pass algorithm from the seminal work of McGregor~[APPROX05] and on the recent deterministic algorithm of Tirodkar with the same pass complexity~[FSTTCS18]. Up to polynomial factors in $1/\varepsilon$, our work matches the state-of-the-art deterministic $(\log n / \log \log n) \cdot (1/\varepsilon)$-pass algorithm of Ahn and Guha~[TOPC18], which is allowed a dependence on the number of nodes $n$. Our result also makes progress on Open Problem 60 at sublinear.info. Moreover, we design a general framework that simulates our approach for the streaming setting in other models of computation. This framework requires access to an algorithm computing a maximal matching and an algorithm for processing disjoint connected components of $\mathsf{poly}(1/\varepsilon)$ size. Instantiating our framework in $\mathsf{CONGEST}$ yields a $\mathsf{poly}(\log n, 1/\varepsilon)$-round algorithm for computing a $(1+\varepsilon)$-approximate maximum matching. In terms of the dependence on $1/\varepsilon$, this result exponentially improves on the state-of-the-art result of Lotker, Patt-Shamir, and Pettie~[LPSP15]. Our framework leads to the same quality of improvement in the Massively Parallel Computation model as well.
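In the semi-streaming setting, the maximal-matching subroutine that the framework assumes access to is typically the classical one-pass greedy algorithm. Below is a minimal sketch of that standard building block (our own illustration, not code from the paper):

```python
def greedy_maximal_matching(edge_stream):
    """One pass over the edge stream; keep an edge iff both endpoints are free.

    The result is a maximal matching, hence at least half the size of a
    maximum matching -- the classical starting point that
    (1+eps)-approximation schemes then refine.
    """
    matched = set()   # vertices already covered by the matching
    matching = []     # kept edges; only O(n) of them, so semi-streaming space
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# A path on 4 vertices, streamed edge by edge:
print(greedy_maximal_matching([(1, 2), (2, 3), (3, 4)]))  # [(1, 2), (3, 4)]
```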
Probabilistic zero forcing is a coloring game played on a graph $G$, where the goal is to color every vertex blue starting from an initial set of blue vertices. As long as the graph is connected, if at least one vertex is blue, then eventually all of the vertices will be colored blue. The most studied parameter in probabilistic zero forcing is the expected propagation time starting from a given vertex of $G$. In this paper we improve on the upper bounds for the expected propagation time, in terms of a graph's order and radius, due to Geneson and Hogben and to Chan et al. In particular, for a connected graph $G$ of order $n$ and radius $r$, we prove the bound $\text{ept}(G) = O(r\log(n/r))$. We also show, using Doob's Optional Stopping Theorem and a combinatorial object known as a cornerstone, that $\text{ept}(G) \le n/2 + O(\log n)$. Finally, we derive the explicit lower bound $\text{ept}(G)\ge \log_2 \log_2 n$.
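For concreteness, the expected propagation time can be estimated by simulating the process directly. The Monte Carlo sketch below assumes the commonly used forcing rule (a blue vertex $u$ forces each white neighbor independently with probability $|N[u]\cap B|/\deg(u)$); the rule as stated here and all names are our assumptions, not the paper's code.

```python
import random

def propagation_time(adj, start, rng=random):
    """Rounds until every vertex of `adj` (dict: vertex -> neighbor list)
    is blue, starting from the single blue vertex `start`."""
    blue = {start}
    rounds = 0
    while len(blue) < len(adj):
        forced = set()
        for u in blue:
            blue_closed = sum(1 for x in adj[u] if x in blue) + 1  # |N[u] ∩ B|
            for w in adj[u]:
                if w not in blue and rng.random() < blue_closed / len(adj[u]):
                    forced.add(w)
        blue |= forced
        rounds += 1
    return rounds

# Monte Carlo estimate of ept on the path P_5, starting from an endpoint.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sum(propagation_time(path, 0) for _ in range(2000)) / 2000)
```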
In this paper, we show that, given two down-sets (simplicial complexes), there is a matching between them that matches disjoint sets and covers the smaller of the two down-sets. This result generalizes an unpublished result of Berge from circa 1980. The result has nice corollaries for cross-intersecting families and for Chv\'atal's conjecture. More concretely, we show that Chv\'atal's conjecture is true for intersecting families with covering number $2$. A family $\mathcal F\subset 2^{[n]}$ is intersection-union (IU) if for any $A,B\in\mathcal F$ we have $1\le |A\cap B|$ and $|A\cup B|\le n-1$. Using the aforementioned result, we derive several exact product- and sum-type results for IU-families.
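The matching statement is easy to sanity-check on small instances by building the "disjointness" bipartite graph between the two families and computing a maximum matching. The example families below are ours, and networkx is assumed to be available; this is an illustration of the statement, not of the proof.

```python
from itertools import combinations
import networkx as nx  # used only for its bipartite maximum matching

def down_set(generators, n):
    """All subsets of the given generator sets (a down-set in 2^[n])."""
    out = set()
    for g in generators:
        for k in range(len(g) + 1):
            out.update(frozenset(s) for s in combinations(g, k))
    return out

F = down_set([(0, 1)], 4)        # down-set generated by {0,1}
G = down_set([(2,), (3,)], 4)    # down-set generated by {2} and {3}
B = nx.Graph()
B.add_nodes_from((('F', A) for A in F), bipartite=0)
B.add_nodes_from((('G', C) for C in G), bipartite=1)
# An edge iff the two sets are disjoint.
B.add_edges_from((('F', A), ('G', C)) for A in F for C in G if not (A & C))
M = nx.bipartite.maximum_matching(B, top_nodes=[('F', A) for A in F])
print(len(M) // 2 >= min(len(F), len(G)))  # True: the smaller family is covered
```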
In the $(1+\varepsilon,r)$-approximate near-neighbor problem for curves (ANNC) under some distance measure $\delta$, the goal is to construct a data structure for a given set $\mathcal{C}$ of curves that supports approximate near-neighbor queries: Given a query curve $Q$, if there exists a curve $C\in\mathcal{C}$ such that $\delta(Q,C)\le r$, then return a curve $C'\in\mathcal{C}$ with $\delta(Q,C')\le(1+\varepsilon)r$. There exists an efficient reduction from the $(1+\varepsilon)$-approximate nearest-neighbor problem to ANNC, where in the former problem the answer to a query is a curve $C\in\mathcal{C}$ with $\delta(Q,C)\le(1+\varepsilon)\cdot\delta(Q,C^*)$, where $C^*$ is the curve of $\mathcal{C}$ closest to $Q$. Given a set $\mathcal{C}$ of $n$ curves, each consisting of $m$ points in $d$ dimensions, we construct a data structure for ANNC that uses $n\cdot O(\frac{1}{\varepsilon})^{md}$ storage space and has $O(md)$ query time (for a query curve of length $m$), where the similarity between two curves is their discrete Fr\'echet or dynamic time warping distance. Our method is simple to implement, deterministic, and results in an exponential improvement in both query time and storage space compared to all previous bounds. Further, we also consider the asymmetric version of ANNC, where the length of the query curves is $k \ll m$, and obtain essentially the same storage and query bounds as above, except that $m$ is replaced by $k$. Finally, we apply our method to a version of approximate range counting for curves and achieve similar bounds.
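One of the two similarity measures used above, the discrete Fr\'echet distance, has a short textbook dynamic program, sketched below for reference (this is the standard recurrence, not the paper's data structure):

```python
from functools import lru_cache
import math

def discrete_frechet(P, Q):
    """Textbook O(|P||Q|) dynamic program for the discrete Frechet distance."""

    @lru_cache(maxsize=None)
    def d(i, j):
        # Best achievable bottleneck cost over couplings ending at (P[i], Q[j]).
        cost = math.dist(P[i], Q[j])
        if i == 0 and j == 0:
            return cost
        if i == 0:
            return max(d(0, j - 1), cost)
        if j == 0:
            return max(d(i - 1, 0), cost)
        return max(min(d(i - 1, j), d(i - 1, j - 1), d(i, j - 1)), cost)

    return d(len(P) - 1, len(Q) - 1)

print(discrete_frechet([(0, 0), (1, 0)], [(0, 1), (1, 1)]))  # 1.0
```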
We consider the problem of maximizing a non-negative submodular function under a $b$-matching constraint in the semi-streaming model. For linear, monotone submodular, and non-monotone submodular functions, we obtain approximation ratios of $2+\varepsilon$, $3 + 2 \sqrt{2} \approx 5.828$, and $4 + 2 \sqrt{3} \approx 7.464$, respectively. We also consider a generalized problem, where a $k$-uniform hypergraph is given, along with an extra matroid or $k'$-matchoid constraint imposed on the edges, with the same goal of finding a $b$-matching that maximizes a submodular function. When the extra constraint is a matroid, we obtain approximation ratios of $k + 1 + \varepsilon$, $k + 2\sqrt{k+1} + 2$, and $k + 2\sqrt{k + 2} + 3$ for linear, monotone submodular, and non-monotone submodular functions, respectively. When the extra constraint is a $k'$-matchoid, we attain an approximation ratio of $\frac{8}{3}k+ \frac{64}{9}k' + O(1)$ for general submodular functions.
The Maximum Induced Matching problem asks, given a graph $G=(V,E)$, for the maximum $k$ such that there is a subset of vertices $S$ of size $k$ in which every vertex $v$ of the induced subgraph $G[S]$ has degree exactly $1$. In this paper, we design an exact algorithm running in $O(1.2630^n)$ time and polynomial space that solves the Maximum Induced Matching problem on graphs of maximum degree $3$. Prior work solved the problem by finding a maximum independent set in the line graph $L(G^2)$ using polynomial space; this method takes $O(1.3139^n)$ time.
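To make the objective concrete, the following exponential brute force (for intuition only, nothing like the paper's branching algorithm) checks the defining condition directly: $S$ is feasible iff $G[S]$ is a perfect matching on $S$.

```python
from itertools import combinations

def max_induced_matching_vertices(vertices, edges):
    """Size of a largest S such that G[S] has all degrees exactly 1.

    The returned |S| is twice the number of edges in the induced matching.
    """
    edge_set = {frozenset(e) for e in edges}
    vertices = list(vertices)
    for k in range(len(vertices) - (len(vertices) % 2), 1, -2):  # |S| is even
        for S in combinations(vertices, k):
            deg = {v: 0 for v in S}
            for u, v in combinations(S, 2):
                if frozenset((u, v)) in edge_set:
                    deg[u] += 1
                    deg[v] += 1
            if all(d == 1 for d in deg.values()):
                return k
    return 0

# Path on 5 vertices: S = {0,1,3,4} induces the matching {01, 34}.
print(max_induced_matching_vertices(range(5), [(0, 1), (1, 2), (2, 3), (3, 4)]))  # 4
```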
The FEAST eigensolver is extended to the computation of the singular triplets of a large matrix $A$ with singular values in a given interval. The method is subspace iteration in nature, applied to an approximate spectral projector associated with the cross-product matrix $A^TA$; it constructs approximate left and right singular subspaces corresponding to the desired singular values, onto which $A$ is projected to obtain approximations to the desired singular triplets. The approximate spectral projectors are constructed using the Chebyshev--Jackson series expansion rather than contour integration and quadrature rules, and they are proven to be always symmetric positive semi-definite with eigenvalues in $[0,1]$. Compact estimates are established for the pointwise approximation errors of a specific step function that corresponds to the exact spectral projector, for the accuracy of the approximate spectral projector, for the number of desired singular triplets, for the distance between the desired right singular subspace and the subspace generated at each iteration, and for the convergence of the FEAST SVDsolver. Practical selection strategies are proposed for the series degree and the subspace dimension. Numerical experiments illustrate that the FEAST SVDsolver is robust and efficient.
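At the heart of such a construction is a polynomial approximation of the step function that equals $1$ exactly on the wanted part of the spectrum of $A^TA$ (mapped into $[-1,1]$). The numpy sketch below shows the plain Chebyshev part of that idea; the Jackson damping the paper uses (which keeps the filter's values in $[0,1]$) and everything solver-specific are omitted, so this is our illustration, not the paper's code.

```python
import numpy as np

def chebyshev_step_filter(a, b, degree):
    """Degree-`degree` Chebyshev series for the indicator of [a, b] in [-1, 1].

    Coefficients are obtained by interpolation at Chebyshev points.
    """
    m = degree + 1
    theta = np.pi * (np.arange(m) + 0.5) / m          # Chebyshev angles
    f = ((np.cos(theta) >= a) & (np.cos(theta) <= b)).astype(float)
    c = np.array([(2.0 / m) * np.dot(f, np.cos(k * theta)) for k in range(m)])
    c[0] /= 2.0
    return np.polynomial.chebyshev.Chebyshev(c)

p = chebyshev_step_filter(-0.2, 0.4, 80)
print(np.round(p(np.array([-0.8, 0.1, 0.8])), 2))  # roughly [0. 1. 0.]
```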
We present a $(1- \varepsilon)$-approximation algorithms for maximum cardinality matchings in disk intersection graphs -- all with near linear running time. We also present estimation algorithm that returns $(1\pm \varepsilon)$-approximation to the size of such matchings -- this algorithms run in linear time for unit disks, and $O(n \log n)$ for general disks (as long as the density is relatively small).
The suffix array $SA[1..n]$ of a text $T$ of length $n$ is a permutation of $\{1,\ldots,n\}$ describing the lexicographical ordering of suffixes of $T$, and it is considered to be among the most important data structures in string algorithms, with dozens of applications in data compression, bioinformatics, and information retrieval. One of the biggest drawbacks of the suffix array is that it is very difficult to maintain under text updates: even a single character substitution can completely change the contents of the suffix array. Thus, the suffix array of a dynamic text is modelled using suffix array queries, which return the value $SA[i]$ given any $i\in[1..n]$. Prior to this work, the fastest dynamic suffix array implementations were by Amir and Boneh. At ISAAC 2020, they showed how to answer suffix array queries in $\tilde{O}(k)$ time, where $k\in[1..n]$ is a trade-off parameter, with $\tilde{O}(\frac{n}{k})$-time text updates. In a very recent preprint [2021], they also provided a solution with $O(\log^5 n)$-time queries and $\tilde{O}(n^{2/3})$-time updates. We propose the first data structure that supports both suffix array queries and text updates in $O({\rm polylog}\,n)$ time (achieving $O(\log^4 n)$ and $O(\log^{3+o(1)} n)$ time, respectively). Our data structure is deterministic and the running times for all operations are worst-case. In addition to the standard single-character edits (character insertions, deletions, and substitutions), we support (also in $O(\log^{3+o(1)} n)$ time) the "cut-paste" operation that moves any (arbitrarily long) substring of $T$ to any place in $T$. We complement our structure with a hardness result: unless the Online Matrix-Vector Multiplication (OMv) Conjecture fails, no data structure with $O({\rm polylog}\,n)$-time suffix array queries can support the "copy-paste" operation in $O(n^{1-\epsilon})$ time for any $\epsilon>0$.
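The fragility under updates is easy to see concretely. The naive snippet below (illustration only, nothing like the paper's data structure) shows a single character substitution reversing the entire suffix array:

```python
def suffix_array(T):
    """Naive construction by sorting suffixes; 1-indexed starting positions."""
    return sorted(range(1, len(T) + 1), key=lambda i: T[i - 1:])

# Substituting only the last character reverses every entry of SA:
print(suffix_array("aaaaab"))  # [1, 2, 3, 4, 5, 6]
print(suffix_array("aaaaaa"))  # [6, 5, 4, 3, 2, 1]
```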
Inspired by real-world applications such as the assignment of pupils to schools or the allocation of social housing, the one-sided matching problem studies how a set of agents can be assigned to a set of objects when the agents have preferences over the objects, but not vice versa. For fairness reasons, most mechanisms use randomness, and therefore result in a probabilistic assignment. We study the problem of decomposing these probabilistic assignments into a weighted sum of ex-post (Pareto-)efficient matchings, while maximizing the worst-case number of assigned agents. This decomposition preserves all of the assignment's desirable properties, most notably strategy-proofness. For a specific class of probabilistic assignments, including the assignment produced by the Probabilistic Serial mechanism, we propose a polynomial-time algorithm for this problem that obtains a decomposition in which every matching assigns at least the expected number of agents assigned by the probabilistic assignment, rounded down, thus achieving the best possible theoretical guarantee. For general probabilistic assignments, the problem becomes NP-hard. For the Random Serial Dictatorship mechanism, we show that the worst-case number of assigned agents is at least half of the optimum, and that this bound is asymptotically tight. Lastly, we propose a column generation framework for the introduced problem, which we evaluate both on randomly generated data and on real-world school choice data from the Belgian cities of Antwerp and Ghent.
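For intuition, in the special case of a square bistochastic assignment matrix, the decomposition step is the classical Birkhoff-von Neumann peeling. The sketch below is that generic textbook procedure (not the paper's algorithm; it ignores Pareto-efficiency and the worst-case objective), using scipy's assignment solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decomposition(P, tol=1e-9):
    """Peel a bistochastic matrix into weighted permutation matrices.

    Each permutation is a deterministic matching; Birkhoff's theorem
    guarantees an all-positive permutation exists at every step.
    """
    P = np.array(P, dtype=float)
    parts = []
    while P.max() > tol:
        cost = (P <= tol).astype(float)           # forbid exhausted entries
        rows, cols = linear_sum_assignment(cost)  # zero cost = all positive
        w = P[rows, cols].min()
        parts.append((w, list(zip(rows.tolist(), cols.tolist()))))
        P[rows, cols] -= w
    return parts

for w, matching in birkhoff_decomposition([[0.5, 0.5], [0.5, 0.5]]):
    print(w, matching)  # two matchings, weight 0.5 each
```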
We consider the message complexity of State Machine Replication protocols dealing with Byzantine failures in the partial synchrony model. A result of Dolev and Reischuk gives a quadratic lower bound for the message complexity, but it was unknown whether this lower bound is tight, with the most efficient known protocols giving worst-case message complexity $O(n^3)$. We describe a protocol that meets Dolev and Reischuk's quadratic lower bound, while also satisfying other desirable properties. To specify these properties, suppose that we have $n$ replicas, $f$ of which display Byzantine faults (with $n\geq 3f+1$). Suppose that $\Delta$ is an upper bound on message delay, i.e., if a message is sent at time $t$, then it is received by time $\max\{t, \mathit{GST}\}+\Delta$. We describe a deterministic protocol that simultaneously achieves $O(n^2)$ worst-case message complexity, optimistic responsiveness, $O(f\Delta)$ time to first confirmation after $\mathit{GST}$, and $O(n)$ mean message complexity.