Given an undirected graph $G=(V,E)$, a vertex $v\in V$ is edge-vertex (ev) dominated by an edge $e\in E$ if $v$ is either incident to $e$ or incident to an edge adjacent to $e$. A set $S^{ev}\subseteq E$ is an edge-vertex dominating set (an \textit{ev}-dominating set, or \textit{EVDS} for short) of $G$ if every vertex of $G$ is \textit{ev}-dominated by at least one edge of $S^{ev}$. The minimum cardinality of an \textit{ev}-dominating set is the \textit{ev}-domination number. The edge-vertex dominating set problem is to find an \textit{ev}-dominating set of minimum cardinality. In this paper, we prove that the \textit{ev}-dominating set problem is {\tt NP-hard} on unit disk graphs. We also prove that this problem admits a polynomial-time approximation scheme on unit disk graphs. Finally, we give a simple 5-factor linear-time approximation algorithm.
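The definition of \textit{ev}-domination can be made concrete with a small Python sketch. It is only an illustration of the definition, not the approximation algorithm or the PTAS of the paper: the function names are hypothetical, the graph is assumed to be given as a vertex list and a list of edge tuples, and the search for the \textit{ev}-domination number is a brute force that is exponential in $|E|$.

```python
from itertools import combinations


def ev_dominated_vertices(edge, edges):
    """Vertices ev-dominated by `edge`: its endpoints plus the endpoints
    of every edge that shares an endpoint with it."""
    u, v = edge
    dominated = {u, v}
    for a, b in edges:
        if a in (u, v) or b in (u, v):
            dominated.update((a, b))
    return dominated


def is_ev_dominating(vertices, edges, chosen):
    """True if every vertex is ev-dominated by some edge in `chosen`."""
    covered = set()
    for e in chosen:
        covered |= ev_dominated_vertices(e, edges)
    return covered >= set(vertices)


def ev_domination_number(vertices, edges):
    """Brute-force ev-domination number (tiny graphs only).

    Returns None when no ev-dominating set exists, which happens
    exactly when the graph has an isolated vertex.
    """
    for size in range(len(edges) + 1):
        for chosen in combinations(edges, size):
            if is_ev_dominating(vertices, edges, chosen):
                return size
    return None


# Paths on four and five vertices.
p4 = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
p5 = ([0, 1, 2, 3, 4], [(0, 1), (1, 2), (2, 3), (3, 4)])
print(ev_domination_number(*p4))  # 1: the middle edge dominates all four vertices
print(ev_domination_number(*p5))  # 2: no single edge reaches both end vertices
```

On the path with five vertices, the edge $(1,2)$ dominates vertices $0$ through $3$ but not vertex $4$, which is why two edges are needed.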
Given a continuous definable function $f: S \to \mathbb{R}$ on a definable set $S$, we study sublevel sets of the form $S^f_t = \{x \in S: f(x) \leq t\}$ for all $t \in \mathbb{R}$. Using o-minimal structures, we prove that the Euler characteristic of $S^f_t$ is right continuous with respect to $t$. Furthermore, when $S$ is compact, we show that $S^f_{t+\delta}$ deformation retracts to $S^f_t$ for all sufficiently small $\delta > 0$. Applying these results, we also characterize the relationship between the concepts of Euler characteristic transform and smooth Euler characteristic transform in topological data analysis.
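As a rough combinatorial analogue of the right-continuity statement, the following Python sketch computes the Euler characteristic of sublevel sets of a finite simplicial complex. It is an assumption-laden toy and not the o-minimal setting of the paper: the complex is assumed to be given as a list of simplices (tuples of vertices), each simplex is assumed to enter the sublevel set at the maximum of its vertex values, and the name `euler_characteristic_curve` is hypothetical.

```python
def euler_characteristic_curve(vertex_values, simplices):
    """Return a function t -> chi(S_t) for a filtered simplicial complex.

    A simplex belongs to the sublevel set S_t when the largest value
    among its vertices is <= t; it contributes (-1)^dim to chi.
    """
    def chi(t):
        total = 0
        for simplex in simplices:
            if max(vertex_values[v] for v in simplex) <= t:
                total += (-1) ** (len(simplex) - 1)
        return total
    return chi


# Hollow triangle (a combinatorial circle) with vertex heights 0, 1, 2.
values = {"a": 0, "b": 1, "c": 2}
complex_ = [("a",), ("b",), ("c",), ("a", "b"), ("a", "c"), ("b", "c")]
chi = euler_characteristic_curve(values, complex_)

print([chi(t) for t in (-1, 0, 1, 1.5, 2, 2.5)])  # [0, 1, 1, 1, 0, 0]
```

Because membership uses $\leq t$, the resulting step function takes its new value exactly at each jump: here $\chi$ drops from $1$ to $0$ at $t=2$ and stays at $0$ just after, which is the right-continuous behaviour.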
Let $\gamma$ be a generic closed curve in the plane. Samuel Blank, in his 1967 Ph.D. thesis, determined whether $\gamma$ is self-overlapping by geometrically constructing a combinatorial word from $\gamma$. More recently, Zipei Nie, in an unpublished manuscript, computed the minimum homotopy area of $\gamma$ by constructing a combinatorial word algebraically. We provide a unified framework for working with both words and determine the settings under which Blank's word and Nie's word are equivalent. Using this equivalence, we give a new geometric proof for the correctness of Nie's algorithm. Unlike previous work, our proof is constructive, which allows us to naturally compute the actual homotopy that realizes the minimum area. Furthermore, we contribute to the theory of self-overlapping curves by providing the first polynomial-time algorithm to compute a self-overlapping decomposition of any closed curve $\gamma$ with minimum area.
Consider bivariate observations $(X_1,Y_1), \ldots, (X_n,Y_n) \in \mathbb{R}\times \mathbb{R}$ with unknown conditional distributions $Q_x$ of $Y$, given that $X = x$. The goal is to estimate these distributions under the sole assumption that $Q_x$ is isotonic in $x$ with respect to likelihood ratio order. If the observations are identically distributed, a related goal is to estimate the joint distribution $\mathcal{L}(X,Y)$ under the sole assumption that it is totally positive of order two in a certain sense. An algorithm is developed which estimates the unknown family of distributions $(Q_x)_x$ via empirical likelihood. The benefit of the stronger regularization imposed by likelihood ratio order over the usual stochastic order is evaluated in terms of estimation and predictive performances on simulated as well as real data.
The boxicity of a graph is the smallest dimension $d$ allowing a representation of it as the intersection graph of a set of $d$-dimensional axis-parallel boxes. We present a simple general approach to determining the boxicity of a graph based on studying its ``interval-order subgraphs''. The power of the method is first tested on the boxicity of some popular graphs that have resisted previous attempts: the boxicity of the Petersen graph is $3$, and more generally, that of the Kneser graphs $K(n,2)$ is $n-2$ if $n\ge 5$, confirming a conjecture of Caoduro and Lichev [Discrete Mathematics, Vol. 346, 5, 2023]. Since every line graph is an induced subgraph of the complement of $K(n,2)$, the developed tools show furthermore that line graphs have only a polynomial number of edge-maximal interval-order subgraphs. This opens the way to polynomial-time algorithms for problems that are in general $\mathcal{NP}$-hard: for the existence and optimization of interval-order subgraphs of line graphs, or of interval-completions of their complement.
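To illustrate what a box representation is, the following Python sketch checks whether a given family of axis-parallel boxes has a prescribed graph as its intersection graph. It only verifies a representation and does not implement the interval-order method of the paper; the function names and the input format (each box is a list of closed intervals `(lo, hi)`, one per dimension) are assumptions made for the example.

```python
from itertools import combinations


def boxes_intersect(box_a, box_b):
    """Two axis-parallel boxes meet iff their closed intervals overlap
    in every coordinate."""
    return all(lo_a <= hi_b and lo_b <= hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b))


def intersection_graph(boxes):
    """Edge set of the intersection graph of `boxes` (dict: vertex -> box)."""
    return {frozenset((u, v))
            for u, v in combinations(boxes, 2)
            if boxes_intersect(boxes[u], boxes[v])}


def represents(boxes, edges):
    """True if the boxes realize exactly the given edge set."""
    return intersection_graph(boxes) == {frozenset(e) for e in edges}


# The 4-cycle 0-1-2-3-0 drawn with four bars forming a square frame.
c4_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
c4_boxes = {
    0: [(0, 1), (0, 3)],  # left vertical bar
    1: [(0, 3), (0, 1)],  # bottom horizontal bar
    2: [(2, 3), (0, 3)],  # right vertical bar
    3: [(0, 3), (2, 3)],  # top horizontal bar
}
print(represents(c4_boxes, c4_edges))  # True
```

The example shows a $2$-dimensional representation of the $4$-cycle; since the $4$-cycle is not an interval graph, no $1$-dimensional representation exists, so its boxicity is $2$.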
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern $P[1 .. m]$ on a large repetitive text collection $T[1 .. n]$, which is represented as a (hopefully much smaller) run-length context-free grammar of size $g_{rl}$. We show that the problem can be solved in time $O(m^2 \log^\epsilon n)$, for any constant $\epsilon > 0$, on a data structure of size $O(g_{rl})$. Further, on a locally consistent grammar of size $O(\delta\log\frac{n}{\delta})$, the time decreases to $O(m\log m(\log m + \log^\epsilon n))$. The value $\delta$ is a function of the substring complexity of $T$ and $\Omega(\delta\log\frac{n}{\delta})$ is a tight lower bound on the compressibility of repetitive texts $T$, so our structure has optimal size in terms of $n$ and $\delta$. We extend our results to several related problems, such as finding $k$-MEMs, MUMs, rare MEMs, and applications.
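For reference, the MEM problem itself can be stated in a few lines of Python on uncompressed strings. This is a naive baseline under stated assumptions, not the grammar-based structure of the paper: it uses repeated substring tests on the plain text, positions are 0-indexed (the abstract uses 1-indexing), and the name `naive_mems` is hypothetical.

```python
def naive_mems(pattern, text):
    """Return the MEMs of `pattern` in `text` as (start, length) pairs.

    A MEM is a substring of the pattern that occurs in the text and
    cannot be extended to the left or to the right while still occurring.
    """
    m = len(pattern)
    # longest[i] = length of the longest prefix of pattern[i:] occurring in text
    longest = []
    for i in range(m):
        length = 0
        while i + length < m and pattern[i:i + length + 1] in text:
            length += 1
        longest.append(length)

    mems = []
    for i, length in enumerate(longest):
        if length == 0:
            continue
        # pattern[i:i+length] is right-maximal by construction; it is
        # left-extendable exactly when the match starting at i-1 covers it.
        if i > 0 and longest[i - 1] >= length + 1:
            continue
        mems.append((i, length))
    return mems


print(naive_mems("cadxabra", "abracadabra"))  # [(0, 3), (4, 4)]
```

In the example, the MEMs are `cad` (starting at position 0) and `abra` (starting at position 4); the character `x` does not occur in the text and separates them.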
A pair $\langle G_0, G_1 \rangle$ of graphs admits a mutual witness proximity drawing $\langle \Gamma_0, \Gamma_1 \rangle$ when: (i) $\Gamma_i$ represents $G_i$, and (ii) there is an edge $(u,v)$ in $\Gamma_i$ if and only if there is no vertex $w$ in $\Gamma_{1-i}$ that is ``too close'' to both $u$ and $v$ ($i=0,1$). In this paper, we consider infinitely many definitions of closeness by adopting the $\beta$-proximity rule for any $\beta \in [1,\infty]$ and study pairs of isomorphic trees that admit a mutual witness $\beta$-proximity drawing. Specifically, we show that every two isomorphic trees admit a mutual witness $\beta$-proximity drawing for any $\beta \in [1,\infty]$. The constructive technique can be made ``robust'': For some tree pairs we can suitably prune linearly many leaves from one of the two trees and still retain their mutual witness $\beta$-proximity drawability. Notably, in the special case of isomorphic caterpillars and $\beta=1$, we construct linearly separable mutual witness Gabriel drawings.
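For the Gabriel case $\beta=1$, condition (ii) can be checked directly. The Python sketch below assumes the closed-disk convention, i.e. a witness $w$ is ``too close'' to $u$ and $v$ when it lies in the closed disk whose diameter is the segment $uv$; this convention, the function names, and the representation of vertices as coordinate tuples are assumptions of the example rather than definitions taken from the paper.

```python
from itertools import combinations


def in_diameter_disk(u, v, w):
    """True if w lies in the closed disk with diameter uv.

    By Thales' theorem this holds iff the angle u-w-v is at least 90
    degrees, i.e. iff the dot product (u - w) . (v - w) is <= 0.
    """
    return ((u[0] - w[0]) * (v[0] - w[0])
            + (u[1] - w[1]) * (v[1] - w[1])) <= 0


def witness_gabriel_edges(points, witnesses):
    """Pairs of points whose diameter disk contains no witness."""
    return {(u, v) for u, v in combinations(points, 2)
            if not any(in_diameter_disk(u, v, w) for w in witnesses)}


def mutual_witness_gabriel(points_0, points_1):
    """Edge sets of the two drawings, each using the other as witnesses."""
    return (witness_gabriel_edges(points_0, points_1),
            witness_gabriel_edges(points_1, points_0))


gamma_0 = [(0, 0), (2, 0)]
gamma_1 = [(1, 3), (1, 5)]
edges_0, edges_1 = mutual_witness_gabriel(gamma_0, gamma_1)
print(edges_0)  # {((0, 0), (2, 0))}
print(edges_1)  # {((1, 3), (1, 5))}
```

Moving a witness to $(1,0)$, the midpoint of the first pair, would place it inside the diameter disk and remove the edge of $\Gamma_0$.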
We study the differential privacy (DP) of a core ML problem, linear ordinary least squares (OLS), a.k.a. $\ell_2$-regression. Our key result is that the approximate LS algorithm (ALS) (Sarlos, 2006), a randomized solution to the OLS problem primarily used to improve performance on large datasets, also preserves privacy. ALS achieves a better privacy/utility tradeoff, without modifications or further noising, when compared to alternative private OLS algorithms which modify and/or noise OLS. We give the first {\em tight} DP-analysis for the ALS algorithm and the standard Gaussian mechanism (Dwork et al., 2014) applied to OLS. Our methodology directly improves the privacy analysis of (Blocki et al., 2012) and (Sheffet, 2019) and introduces new tools which may be of independent interest: (1) the exact spectrum of $(\epsilon, \delta)$-DP parameters (``DP spectrum'') for mechanisms whose output is a $d$-dimensional Gaussian, and (2) an improved DP spectrum for random projection (compared to (Blocki et al., 2012) and (Sheffet, 2019)). All methods for private OLS (including ours) assume, often implicitly, restrictions on the input database, such as bounds on leverage and residuals. We prove that such restrictions are necessary. Hence, computing the privacy of mechanisms such as ALS must estimate these database parameters, which can be infeasible in big datasets. For more complex ML models, DP bounds may not even be tractable. There is a need for blackbox DP-estimators (Lu et al., 2022) which empirically estimate a data-dependent privacy. We demonstrate the effectiveness of such a DP-estimator by empirically recovering a DP spectrum that matches our theory for OLS. This validates the DP-estimator in a nontrivial ML application, opening the door to its use in more complex nonlinear ML settings where theory is unavailable.
The $\Sigma$-QMAC problem is introduced, involving $S$ servers, $K$ classical ($\mathbb{F}_d$) data streams, and $T$ independent quantum systems. Data stream ${\sf W}_k, k\in[K]$ is replicated at a subset of servers $\mathcal{W}(k)\subset[S]$, and quantum system $\mathcal{Q}_t, t\in[T]$ is distributed among a subset of servers $\mathcal{E}(t)\subset[S]$ such that Server $s\in\mathcal{E}(t)$ receives subsystem $\mathcal{Q}_{t,s}$ of $\mathcal{Q}_t=(\mathcal{Q}_{t,s})_{s\in\mathcal{E}(t)}$. Servers manipulate their quantum subsystems according to their data and send the subsystems to a receiver. The total download cost is $\sum_{t\in[T]}\sum_{s\in\mathcal{E}(t)}\log_d|\mathcal{Q}_{t,s}|$ qudits, where $|\mathcal{Q}|$ is the dimension of $\mathcal{Q}$. The states and measurements of $(\mathcal{Q}_t)_{t\in[T]}$ are required to be separable across $t\in[T]$ throughout, but for each $t\in[T]$, the subsystems of $\mathcal{Q}_{t}$ can be prepared initially in an arbitrary (independent of data) entangled state, manipulated arbitrarily by the respective servers, and measured jointly by the receiver. From the measurements, the receiver must recover the sum of all data streams. Rate is defined as the number of dits ($\mathbb{F}_d$ symbols) of the desired sum computed per qudit of download. The capacity of $\Sigma$-QMAC, i.e., the supremum of achievable rates, is characterized for arbitrary data replication and entanglement distribution maps $\mathcal{W}, \mathcal{E}$. Coding based on the $N$-sum box abstraction is optimal in every case. Notably, for every $S\neq 3$ there exists an instance of the $\Sigma$-QMAC where $S$-party entanglement is necessary to achieve the fully entangled capacity.
We derive large-sample and other limiting distributions of the ``frequency of frequencies'' vector, ${\bf M_n}$, together with the number of species, $K_n$, in a Poisson-Dirichlet or generalised Poisson-Dirichlet gene or species sampling model. Models analysed include those constructed from gamma and $\alpha$-stable subordinators by Kingman, the two-parameter extension by Pitman and Yor, and another two-parameter version constructed by omitting large jumps from an $\alpha$-stable subordinator. In the Poisson-Dirichlet case ${\bf M_n}$ and $K_n$ turn out to be asymptotically independent, and notable, especially for statistical applications, is that in other cases the conditional limiting distribution of ${\bf M_n}$, given $K_n$, is normal, after certain centering and norming.
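The two statistics studied here are easy to compute from a sample. The following Python sketch assumes a sample is given as a sequence of species labels and that the $j$-th entry of ${\bf M_n}$ counts the species observed exactly $j$ times; the function name is hypothetical, and the sketch says nothing about the limiting distributions themselves.

```python
from collections import Counter


def frequency_of_frequencies(sample):
    """Return (M_n, K_n) for a sample of species labels.

    M_n[j - 1] is the number of species seen exactly j times
    (j = 1, ..., n), and K_n is the number of distinct species.
    """
    n = len(sample)
    species_counts = Counter(sample)          # species -> multiplicity
    k_n = len(species_counts)
    count_of_counts = Counter(species_counts.values())
    m_n = [count_of_counts.get(j, 0) for j in range(1, n + 1)]
    return m_n, k_n


m_n, k_n = frequency_of_frequencies(list("aabbbcd"))
print(m_n, k_n)  # [2, 1, 1, 0, 0, 0, 0] 4
```

Two identities tie the quantities together and serve as a quick sanity check: the entries of ${\bf M_n}$ sum to $K_n$, and $\sum_j j\,M_{j}$ equals the sample size $n$.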
Given $k$ input graphs $G_1, \dots ,G_k$, where each pair $G_i$, $G_j$ with $i \neq j$ shares the same graph $G$, the problem Simultaneous Embedding With Fixed Edges (SEFE) asks whether there exists a planar drawing for each input graph such that all drawings coincide on $G$. While SEFE is still open for the case of two input graphs, the problem is NP-complete for $k \geq 3$ [Schaefer, JGAA 13]. In this work, we explore the parameterized complexity of SEFE. We show that SEFE is FPT with respect to $k$ plus the vertex cover number or the feedback edge set number of the union graph $G^\cup = G_1 \cup \dots \cup G_k$. Regarding the shared graph $G$, we show that SEFE is NP-complete, even if $G$ is a tree with maximum degree 4. Together with a known NP-hardness reduction [Angelini et al., TCS 15], this allows us to conclude that several parameters of $G$, including the maximum degree, the maximum number of degree-1 neighbors, the vertex cover number, and the number of cutvertices are intractable. We also settle the tractability of all pairs of these parameters. We give FPT algorithms for the vertex cover number plus either of the first two parameters and for the number of cutvertices plus the maximum degree, whereas we prove all remaining combinations to be intractable.