In this paper we present improved bounds for approximating maximum matchings in bipartite graphs in the streaming model. First, we consider the question of how well maximum matching can be approximated in a single pass over the input using $\tilde O(n)$ space, where $n$ is the number of vertices in the input graph. Two natural variants of this problem have been considered in the literature: (1) the edge arrival setting, where edges arrive in the stream and (2) the vertex arrival setting, where vertices on one side of the graph arrive in the stream together with all their incident edges. The latter setting has also been studied extensively in the context of online algorithms, where each arriving vertex has to either be matched irrevocably or discarded upon arrival. In the online setting, the celebrated algorithm of Karp-Vazirani-Vazirani achieves a $1-1/e$ approximation. Despite the fact that the streaming model is less restrictive in that the algorithm is not constrained to match vertices irrevocably upon arrival, the best known approximation in the streaming model with vertex arrivals and $\tilde O(n)$ space is the same factor of $1-1/e$. We show that no single pass streaming algorithm that uses $\tilde O(n)$ space can achieve a better than $1-1/e$ approximation to maximum matching, even in the vertex arrival setting. This leads to the striking conclusion that no single pass streaming algorithm can do better than online algorithms unless it uses significantly more than $\tilde O(n)$ space. Additionally, our bound yields the best known impossibility result for approximating matchings in the edge arrival model. We also give a simple algorithm that achieves approximation ratio $1-e^{-k}k^{k-1}/(k-1)!=1-\frac1{\sqrt{2\pi k}}+o(1/k)$ in $k$ passes in the vertex arrival model using linear space, improving upon previously best known convergence.
We consider the problem of statistical inference for a class of partially-observed diffusion processes, with discretely-observed data and finite-dimensional parameters. We construct unbiased estimators of the score function, i.e. the gradient of the log-likelihood function with respect to parameters, with no time-discretization bias. These estimators can be straightforwardly employed within stochastic gradient methods to perform maximum likelihood estimation or Bayesian inference. As our proposed methodology only requires access to a time-discretization scheme such as the Euler-Maruyama method, it is applicable to a wide class of diffusion processes and observation models. Our approach is based on a representation of the score as a smoothing expectation using Girsanov theorem, and a novel adaptation of the randomization schemes developed in Mcleish [2011], Rhee and Glynn [2015], Jacob et al. [2020a]. This allows one to remove the time-discretization bias and burn-in bias when computing smoothing expectations using the conditional particle filter of Andrieu et al. [2010]. Central to our approach is the development of new couplings of multiple conditional particle filters. We prove under assumptions that our estimators are unbiased and have finite variance. The methodology is illustrated on several challenging applications from population ecology and neuroscience.
Streaming codes are a class of packet-level erasure codes that ensure packet recovery over a sliding window channel which allows either a burst erasure of size $b$ or $a$ random erasures within any window of size $(\tau+1)$ time units, under a strict decoding-delay constraint $\tau$. The field size over which streaming codes are constructed is an important factor determining the complexity of implementation. The best known explicit rate-optimal streaming code requires a field size of $q^2$ where $q \ge \tau+b-a$ is a prime power. In this work, we present an explicit rate-optimal streaming code, for all possible $\{a,b,\tau\}$ parameters, over a field of size $q^2$ for prime power $q \ge \tau$. This is the smallest-known field size of a general explicit rate-optimal construction that covers all $\{a,b,\tau\}$ parameter sets. We achieve this by modifying the non-explicit code construction due to Krishnan et al. to make it explicit, without change in field size.
A fringe subtree of a rooted tree is a subtree induced by one of the vertices and all its descendants. We consider the problem of estimating the number of distinct fringe subtrees in two types of random trees: simply generated trees and families of increasing trees (recursive trees, $d$-ary increasing trees and generalized plane-oriented recursive trees). We prove that the order of magnitude of the number of distinct fringe subtrees (under rather mild assumptions on what `distinct' means) in random trees with $n$ vertices is $n/\sqrt{\log n}$ for simply generated trees and $n/\log n$ for increasing trees.
Enumerating matchings is a classical problem in the field of enumeration algorithms. There are polynomial-delay enumeration algorithms for several settings, such as enumerating perfect matchings, maximal matchings, and (weighted) matchings in specific orders. In this paper, we present polynomial-delay enumeration algorithms for maximal matchings with cardinality at least given threshold $t$. Our algorithm enumerates all such matchings in $O(nm)$ delay with exponential space, where $n$ and $m$ are the number of vertices and edges of an input graph, respectively. We also present a polynomial-delay and polynomial-space enumeration algorithm for this problem. As a variant of this algorithm, we give an algorithm that enumerates all maximal matchings in non-decreasing order of its cardinality and runs in $O(nm)$ delay.
We study the classical problem of moment estimation of an underlying vector whose $n$ coordinates are implicitly defined through a series of updates in a data stream. We show that if the updates to the vector arrive in the random-order insertion-only model, then there exist space efficient algorithms with improved dependencies on the approximation parameter $\varepsilon$. In particular, for any real $p > 2$, we first obtain an algorithm for $F_p$ moment estimation using $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4/p}}\cdot n^{1-2/p}\right)$ bits of memory. Our techniques also give algorithms for $F_p$ moment estimation with $p>2$ on arbitrary order insertion-only and turnstile streams, using $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4/p}}\cdot n^{1-2/p}\right)$ bits of space and two passes, which is the first optimal multi-pass $F_p$ estimation algorithm up to $\log n$ factors. Finally, we give an improved lower bound of $\Omega\left(\frac{1}{\varepsilon^2}\cdot n^{1-2/p}\right)$ for one-pass insertion-only streams. Our results separate the complexity of this problem both between random and non-random orders, as well as one-pass and multi-pass streams.
Given a graph $G=(V,E)$ and for each vertex $v \in V$ a subset $B(v)$ of the set $\{0,1,\ldots, d_G(v)\}$ a $B$-matching of $G$ is any set $F \subseteq E$ such that $d_F(v) \in B(v)$ for each vertex $v$. The general matching problem asks the existence of a $B$-matching in a given graph. A set $B(v)$ is said to have a {\em gap of length} $p$ if there exists a number $k \in B(v)$ such that $k+1, \ldots, k+p \notin B(v)$ and $k+p+1 \in B(v)$. Without any restrictions the general matching problem is NP-complete. However, if no set $B(v)$ contains a gap of length greater than $1$, then the problem can be solved in polynomial time and Cornuejols \cite{Cor} presented an algorithm for finding a $B$-matching, if it exists. In this paper we consider a version of the general matching problem, in which we are interested in finding a $B$-matching having a maximum (or minimum) number of edges. We present the first polynomial time algorithm for the maximum weight $B$-matching for the case when no set $B(v)$ contains a gap of length greater than $1$.
We present a new algorithm for approximating the number of triangles in a graph $G$ whose edges arrive as an arbitrary order stream. If $m$ is the number of edges in $G$, $T$ the number of triangles, $\Delta_E$ the maximum number of triangles which share a single edge, and $\Delta_V$ the maximum number of triangles which share a single vertex, then our algorithm requires space: \[ \widetilde{O}\left(\frac{m}{T}\cdot \left(\Delta_E + \sqrt{\Delta_V}\right)\right) \] Taken with the $\Omega\left(\frac{m \Delta_E}{T}\right)$ lower bound of Braverman, Ostrovsky, and Vilenchik (ICALP 2013), and the $\Omega\left( \frac{m \sqrt{\Delta_V}}{T}\right)$ lower bound of Kallaugher and Price (SODA 2017), our algorithm is optimal up to log factors, resolving the complexity of a classic problem in graph streaming.
An ordering constraint satisfaction problem (OCSP) is given by a positive integer $k$ and a constraint predicate $\Pi$ mapping permutations on $\{1,\ldots,k\}$ to $\{0,1\}$. Given an instance of OCSP$(\Pi)$ on $n$ variables and $m$ constraints, the goal is to find an ordering of the $n$ variables that maximizes the number of constraints that are satisfied, where a constraint specifies a sequence of $k$ distinct variables and the constraint is satisfied by an ordering on the $n$ variables if the ordering induced on the $k$ variables in the constraint satisfies $\Pi$. Natural ordering constraint satisfaction problems include "Maximum acyclic subgraph (MAS)" and "Betweenness". In this work we consider the task of approximating the maximum number of satisfiable constraints in the (single-pass) streaming setting, where an instance is presented as a stream of constraints. We show that for every $\Pi$, algorithms using $o(\sqrt{n})$ space cannot distinguish streams where almost every constraint is satisfiable from streams where no ordering beats the random ordering by a noticeable amount. In the case of MAS our result shows that for every $\epsilon>0$, MAS is not $1/2+\epsilon$-approximable. The previous best inapproximability result only ruled out a $3/4$ approximation. Our results build on a recent work of Chou, Golovnev, Sudan, and Velusamy who show tight inapproximability results for some constraint satisfaction problems over arbitrary (finite) alphabets. We show that the hard instances from this earlier work have the property that in every partition of the hypergraph formed by the constraints into small blocks, most of the hyperedges are incident on vertices from distinct blocks. By exploiting this combinatorial property, in combination with a natural reduction from CSPs over large finite alphabets to OCSPs, we give optimal inapproximability results for all OCSPs.
This paper considers the data association problem for multi-target tracking. Multiple hypothesis tracking is a popular algorithm for solving this problem but it is NP-hard and is is quite complicated for a large number of targets or for tracking maneuvering targets. To improve tracking performance and enhance robustness, we propose a randomized multiple model multiple hypothesis tracking method, which has three distinctive advantages. First, it yields a randomized data association solution which maximizes the expectation of the logarithm of the posterior probability and can be solved efficiently by linear programming. Next, the state estimation performance is improved by the random coefficient matrices Kalman filter, which mitigates the difficulty introduced by randomized data association, i.e., where the coefficient matrices of the dynamic system are random. Third, the probability that the target follows a specific dynamic model is derived by jointly optimizing the multiple possible models and data association hypotheses, and it does not require prior mode transition probabilities. Thus, it is more robust for tracking multiple maneuvering targets. Simulations demonstrate the efficiency and superior results of the proposed algorithm over interacting multiple model multiple hypothesis tracking.
A constraint satisfaction problem (CSP), Max-CSP$({\cal F})$, is specified by a finite set of constraints ${\cal F} \subseteq \{[q]^k \to \{0,1\}\}$ for positive integers $q$ and $k$. An instance of the problem on $n$ variables is given by $m$ applications of constraints from ${\cal F}$ to subsequences of the $n$ variables, and the goal is to find an assignment to the variables that satisfies the maximum number of constraints. In the $(\gamma,\beta)$-approximation version of the problem for parameters $0 \leq \beta < \gamma \leq 1$, the goal is to distinguish instances where at least $\gamma$ fraction of the constraints can be satisfied from instances where at most $\beta$ fraction of the constraints can be satisfied. In this work we consider the approximability of this problem in the context of streaming algorithms and give a dichotomy result in the dynamic setting, where constraints can be inserted or deleted. Specifically, for every family ${\cal F}$ and every $\beta < \gamma$, we show that either the approximation problem is solvable with polylogarithmic space in the dynamic setting, or not solvable with $o(\sqrt{n})$ space. We also establish tight inapproximability results for a broad subclass in the streaming insertion-only setting. Our work builds on, and significantly extends previous work by the authors who consider the special case of Boolean variables ($q=2$), singleton families ($|{\cal F}| = 1$) and where constraints may be placed on variables or their negations. Our framework extends non-trivially the previous work allowing us to appeal to richer norm estimation algorithms to get our algorithmic results. For our negative results we introduce new variants of the communication problems studied in the previous work, build new reductions for these problems, and extend the technical parts of previous works.