We study the graph alignment problem over two independent Erd\H{o}s-R\'enyi graphs on $n$ vertices, with edge density $p$ falling into two regimes separated by the critical window around $p_c=\sqrt{\log n/n}$. Our result reveals an algorithmic phase transition for this random optimization problem: polynomial-time approximation schemes exist in the sparse regime, while a statistical-computational gap emerges in the dense regime. Additionally, we establish a sharp transition in the performance of online algorithms for this problem when $p$ lies in the dense regime, resulting in a $\sqrt{8/9}$ multiplicative constant factor gap between achievable and optimal solutions.
A $k$-stack layout (or $k$-page book embedding) of a graph consists of a total order of the vertices, and a partition of the edges into $k$ sets of non-crossing edges with respect to the vertex order. The stack number of a graph is the minimum $k$ such that it admits a $k$-stack layout. In this paper we study a long-standing problem regarding the stack number of planar directed acyclic graphs (DAGs), for which the vertex order has to respect the orientation of the edges. We investigate upper and lower bounds on the stack number of several families of planar graphs: We improve the constant upper bounds on the stack number of single-source and monotone outerplanar DAGs and of outerpath DAGs, and improve the constant upper bound for upward planar 3-trees. Further, we provide computer-aided lower bounds for upward (outer-) planar DAGs.
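As an illustration of the definition above, the following minimal sketch checks whether a given vertex order and edge partition form a valid stack layout. The function name `is_stack_layout` and the `directed` flag are assumptions introduced here for illustration; the check is a brute-force pairwise test, not an algorithm from the paper.

```python
from itertools import combinations


def is_stack_layout(order, pages, directed=False):
    """Check a candidate stack layout (illustrative helper, not from the paper).

    order    -- list of vertices giving the total order
    pages    -- list of pages, each a list of edges (u, v)
    directed -- if True, also require every edge (u, v) to point forward
                in the order, as needed for DAGs
    """
    pos = {v: i for i, v in enumerate(order)}
    for page in pages:
        if directed and any(pos[u] >= pos[v] for u, v in page):
            return False  # the order violates an edge orientation
        # Represent each edge by its (left, right) positions in the order.
        spans = [tuple(sorted((pos[u], pos[v]))) for u, v in page]
        for (a, b), (c, d) in combinations(spans, 2):
            # Two edges on one page cross when their endpoints interleave.
            if a < c < b < d or c < a < d < b:
                return False
    return True
```

For example, with the order `[0, 1, 2, 3]` the edges `(0, 2)` and `(1, 3)` interleave, so they must be placed on different pages.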
Composed image retrieval, a task involving the search for a target image using a reference image and a complementary text as the query, has witnessed significant advancements owing to the progress made in cross-modal modeling. Unlike the general image-text retrieval problem with only one alignment relation, i.e., image-text, we argue for the existence of two types of relations in composed image retrieval. The explicit relation pertains to the reference image & complementary text-target image, which is commonly exploited by existing methods. Besides this intuitive relation, the observations during our practice have uncovered another implicit yet crucial relation, i.e., reference image & target image-complementary text, since we found that the complementary text can be inferred by studying the relation between the target image and the reference image. Regrettably, existing methods largely focus on leveraging the explicit relation to learn their networks, while overlooking the implicit relation. In response to this weakness, we propose a new framework for composed image retrieval, termed dual relation alignment, which integrates both explicit and implicit relations to fully exploit the correlations among the triplets. Specifically, we design a vision compositor that first fuses the reference image and the target image; the resulting representation then serves two roles: (1) counterpart for semantic alignment with the complementary text and (2) compensation for the complementary text to boost the explicit relation modeling, thereby implanting the implicit relation into the alignment learning. Our method is evaluated on two popular datasets, CIRR and FashionIQ, through extensive experiments. The results confirm the effectiveness of our dual-relation learning in substantially enhancing composed image retrieval performance.
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern $P[1 .. m]$ on a large repetitive text collection $T[1 .. n]$, which is represented as a (hopefully much smaller) run-length context-free grammar of size $g_{rl}$. We show that the problem can be solved in time $O(m^2 \log^\epsilon n)$, for any constant $\epsilon > 0$, on a data structure of size $O(g_{rl})$. Further, on a locally consistent grammar of size $O(\delta\log\frac{n}{\delta})$, the time decreases to $O(m\log m(\log m + \log^\epsilon n))$. The value $\delta$ is a function of the substring complexity of $T$ and $\Omega(\delta\log\frac{n}{\delta})$ is a tight lower bound on the compressibility of repetitive texts $T$, so our structure has optimal size in terms of $n$ and $\delta$. We extend our results to several related problems, such as finding $k$-MEMs, MUMs, rare MEMs, and applications.
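To make the notion of a MEM concrete, the sketch below computes MEMs by brute force on an uncompressed text. It assumes the common definition that a MEM is an occurrence-bearing substring of $P$ that cannot be extended to the left or right and still occur in $T$; the function names are hypothetical, indices are 0-based, and this is not the grammar-based method of the paper (it runs in time far above the stated bounds).

```python
def matching_statistics(pattern, text):
    """ms[i] = length of the longest prefix of pattern[i:] occurring in text."""
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def maximal_exact_matches(pattern, text):
    """Return MEMs as (start, length) pairs over 0-based pattern positions."""
    ms = matching_statistics(pattern, text)
    mems = []
    for i, length in enumerate(ms):
        # ms[i] is right-maximal by construction; it is also left-maximal
        # exactly when the match starting at i - 1 does not cover it.
        if length > 0 and (i == 0 or ms[i - 1] <= length):
            mems.append((i, length))
    return mems
```

With `text = "abracadabra"` and `pattern = "cadxabr"`, the MEMs are `cad` at position 0 and `abr` at position 4.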
Broadcast protocols enable a set of $n$ parties to agree on the input of a designated sender, even facing attacks by malicious parties. In the honest-majority setting, randomization and cryptography were harnessed to achieve low-communication broadcast with sub-quadratic total communication and balanced sub-linear cost per party. However, comparatively little is known in the dishonest-majority setting. Here, the most communication-efficient constructions are based on Dolev and Strong (SICOMP '83), and sub-quadratic broadcast has not been achieved. On the other hand, the only nontrivial $\omega(n)$ communication lower bounds are restricted to deterministic protocols, or against strong adaptive adversaries that can perform "after the fact" removal of messages. We provide new communication lower bounds in this space, which hold against arbitrary cryptography and setup assumptions, as well as a simple protocol showing near tightness of our first bound. 1) We demonstrate a tradeoff between resiliency and communication for protocols secure against $n-o(n)$ static corruptions. For example, $\Omega(n\cdot {\sf polylog}(n))$ messages are needed when the number of honest parties is $n/{\sf polylog}(n)$; $\Omega(n\sqrt{n})$ messages are needed for $O(\sqrt{n})$ honest parties; and $\Omega(n^2)$ messages are needed for $O(1)$ honest parties. Complementarily, we demonstrate broadcast with $O(n\cdot{\sf polylog}(n))$ total communication facing any constant fraction of static corruptions. 2) Our second bound considers $n/2 + k$ corruptions and a weakly adaptive adversary that cannot remove messages "after the fact." We show that any broadcast protocol within this setting can be attacked to force an arbitrary party to send messages to $k$ other parties. This rules out, for example, broadcast facing 51% corruptions in which all non-sender parties have sublinear communication locality.
A pair $\langle G_0, G_1 \rangle$ of graphs admits a mutual witness proximity drawing $\langle \Gamma_0, \Gamma_1 \rangle$ when: (i) $\Gamma_i$ represents $G_i$, and (ii) there is an edge $(u,v)$ in $\Gamma_i$ if and only if there is no vertex $w$ in $\Gamma_{1-i}$ that is ``too close'' to both $u$ and $v$ ($i=0,1$). In this paper, we consider infinitely many definitions of closeness by adopting the $\beta$-proximity rule for any $\beta \in [1,\infty]$ and study pairs of isomorphic trees that admit a mutual witness $\beta$-proximity drawing. Specifically, we show that every two isomorphic trees admit a mutual witness $\beta$-proximity drawing for any $\beta \in [1,\infty]$. The constructive technique can be made ``robust'': For some tree pairs we can suitably prune linearly many leaves from one of the two trees and still retain their mutual witness $\beta$-proximity drawability. Notably, in the special case of isomorphic caterpillars and $\beta=1$, we construct linearly separable mutual witness Gabriel drawings.
We study the differential privacy (DP) of a core ML problem, linear ordinary least squares (OLS), a.k.a. $\ell_2$-regression. Our key result is that the approximate LS algorithm (ALS) (Sarlos, 2006), a randomized solution to the OLS problem primarily used to improve performance on large datasets, also preserves privacy. ALS achieves a better privacy/utility tradeoff, without modifications or further noising, when compared to alternative private OLS algorithms which modify and/or noise OLS. We give the first {\em tight} DP-analysis for the ALS algorithm and the standard Gaussian mechanism (Dwork et al., 2014) applied to OLS. Our methodology directly improves the privacy analysis of (Blocki et al., 2012) and (Sheffet, 2019) and introduces new tools which may be of independent interest: (1) the exact spectrum of $(\epsilon, \delta)$-DP parameters (``DP spectrum'') for mechanisms whose output is a $d$-dimensional Gaussian, and (2) an improved DP spectrum for random projection (compared to (Blocki et al., 2012) and (Sheffet, 2019)). All methods for private OLS (including ours) assume, often implicitly, restrictions on the input database, such as bounds on leverage and residuals. We prove that such restrictions are necessary. Hence, computing the privacy of mechanisms such as ALS requires estimating these database parameters, which can be infeasible for big datasets. For more complex ML models, DP bounds may not even be tractable. There is a need for blackbox DP-estimators (Lu et al., 2022) which empirically estimate data-dependent privacy. We demonstrate the effectiveness of such a DP-estimator by empirically recovering a DP-spectrum that matches our theory for OLS. This validates the DP-estimator in a nontrivial ML application, opening the door to its use in more complex nonlinear ML settings where theory is unavailable.
We consider the problem of dynamically maintaining the convex hull of a set $S$ of points in the plane under the following special sequence of insertions and deletions (called {\em window-sliding updates}): insert a point to the right of all points of $S$ and delete the leftmost point of $S$. We propose an $O(|S|)$-space data structure that can handle each update in $O(1)$ amortized time, such that standard binary-search-based queries on the convex hull of $S$ can be answered in $O(\log h)$ time, where $h$ is the number of vertices of the convex hull of $S$, and the convex hull itself can be output in $O(h)$ time.
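The window-sliding update model can be illustrated with a naive baseline that simply recomputes the hull on demand. This sketch is not the data structure proposed above: it uses Andrew's monotone chain, costs time linear in $|S|$ per hull query, and the class name `NaiveSlidingHull` and its methods are assumptions made for illustration. It exploits only the fact that right-insertions keep the points sorted by $x$-coordinate.

```python
from collections import deque


def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(pts):
    """Monotone chain on points already sorted by x; counterclockwise output."""
    if len(pts) <= 2:
        return list(pts)
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


class NaiveSlidingHull:
    """Baseline for window-sliding updates (illustrative, not the paper's structure)."""

    def __init__(self):
        self.points = deque()  # kept sorted by x because inserts go to the right

    def insert_right(self, p):
        if self.points and p[0] <= self.points[-1][0]:
            raise ValueError("new point must lie to the right of all points")
        self.points.append(p)

    def delete_left(self):
        return self.points.popleft()

    def hull(self):
        return convex_hull(list(self.points))
```

Both updates take constant time here, but every hull query rescans the whole window, which is the cost the amortized structure is designed to avoid.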
We derive large-sample and other limiting distributions of the ``frequency of frequencies'' vector, ${\bf M_n}$, together with the number of species, $K_n$, in a Poisson-Dirichlet or generalised Poisson-Dirichlet gene or species sampling model. Models analysed include those constructed from gamma and $\alpha$-stable subordinators by Kingman, the two-parameter extension by Pitman and Yor, and another two-parameter version constructed by omitting large jumps from an $\alpha$-stable subordinator. In the Poisson-Dirichlet case ${\bf M_n}$ and $K_n$ turn out to be asymptotically independent, and notable, especially for statistical applications, is that in other cases the conditional limiting distribution of ${\bf M_n}$, given $K_n$, is normal, after certain centering and norming.
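For concreteness, the two statistics can be computed from an observed sample as in the short sketch below. It assumes the usual convention that the $j$-th entry of ${\bf M_n}$ counts the species represented exactly $j$ times among the $n$ sampled individuals and that $K_n$ is the number of distinct species; the function name is hypothetical.

```python
from collections import Counter


def frequency_of_frequencies(sample):
    """Return (M_n, K_n) for a sample given as a sequence of species labels.

    M_n is a list of length n whose entry at index j - 1 is the number of
    species seen exactly j times; K_n is the number of distinct species.
    """
    n = len(sample)
    species_counts = Counter(sample)              # species -> abundance
    k_n = len(species_counts)
    count_of_counts = Counter(species_counts.values())  # abundance -> #species
    m_n = [count_of_counts.get(j, 0) for j in range(1, n + 1)]
    return m_n, k_n
```

The entries satisfy $\sum_j j\,M_j = n$ and $\sum_j M_j = K_n$, which gives a quick sanity check on any computed vector.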
Given $k$ input graphs $G_1, \dots ,G_k$, where each pair $G_i$, $G_j$ with $i \neq j$ shares the same graph $G$, the problem Simultaneous Embedding With Fixed Edges (SEFE) asks whether there exists a planar drawing for each input graph such that all drawings coincide on $G$. While SEFE is still open for the case of two input graphs, the problem is NP-complete for $k \geq 3$ [Schaefer, JGAA 13]. In this work, we explore the parameterized complexity of SEFE. We show that SEFE is FPT with respect to $k$ plus the vertex cover number or the feedback edge set number of the union graph $G^\cup = G_1 \cup \dots \cup G_k$. Regarding the shared graph $G$, we show that SEFE is NP-complete, even if $G$ is a tree with maximum degree 4. Together with a known NP-hardness reduction [Angelini et al., TCS 15], this allows us to conclude that several parameters of $G$, including the maximum degree, the maximum number of degree-1 neighbors, the vertex cover number, and the number of cutvertices are intractable. We also settle the tractability of all pairs of these parameters. We give FPT algorithms for the vertex cover number plus either of the first two parameters and for the number of cutvertices plus the maximum degree, whereas we prove all remaining combinations to be intractable.
In 2013, Pak and Panova proved the strict unimodality property of $q$-binomial coefficients $\binom{\ell+m}{m}_q$ (as polynomials in $q$) based on the combinatorics of Young tableaux and the semigroup property of Kronecker coefficients. They showed it to be true for all $\ell,m\geq 8$ and a few other cases. We propose a different approach to this problem based on computer algebra, where we establish a closed form for the coefficients of these polynomials and then use cylindrical algebraic decomposition to identify exactly the range of coefficients where strict unimodality holds. This strategy allows us to tackle generalizations of the problem, e.g., to show unimodality with larger gaps or unimodality of related sequences. In particular, we present proofs of two additional cases of a conjecture by Stanley and Zanello.
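The property in question can be explored numerically with the following sketch, which expands $\binom{n}{k}_q$ via the $q$-Pascal recurrence $\binom{n}{k}_q = \binom{n-1}{k-1}_q + q^k\binom{n-1}{k}_q$ and then tests strictness. It is a plain numerical check, not the closed-form or cylindrical-algebraic-decomposition approach described above. The strictness convention used here is an assumption: coefficients $a_0,\dots,a_N$ with $N=\ell m$ must satisfy $a_{k-1} < a_k$ for $2 \le k \le \lfloor N/2 \rfloor$ (the first two coefficients always equal 1), and the other half follows from the symmetry of the coefficients.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def q_binomial(n, k):
    """Coefficients of the Gaussian binomial [n choose k]_q, lowest degree first."""
    if k < 0 or k > n:
        return (0,)
    if k == 0 or k == n:
        return (1,)
    first = q_binomial(n - 1, k - 1)
    second = q_binomial(n - 1, k)       # this term is multiplied by q^k
    out = [0] * (k * (n - k) + 1)       # the degree of the result is k(n - k)
    for i, c in enumerate(first):
        out[i] += c
    for i, c in enumerate(second):
        out[i + k] += c
    return tuple(out)


def is_strictly_unimodal(coeffs):
    """Assumed convention: a_{k-1} < a_k for 2 <= k <= floor(N / 2).

    Only the first half is checked, so the input is assumed symmetric,
    which holds for q-binomial coefficients.
    """
    top = len(coeffs) - 1
    return all(coeffs[k - 1] < coeffs[k] for k in range(2, top // 2 + 1))
```

For instance, `q_binomial(4, 2)` yields the coefficients of $1+q+2q^2+q^3+q^4$, and `is_strictly_unimodal(q_binomial(l + m, m))` tests a given pair $(\ell, m)$ under the convention stated above.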