A conflict-free open neighborhood coloring of a graph is an assignment of colors to the vertices such that for every vertex there is a color that appears exactly once in its open neighborhood. For a graph $G$, the smallest number of colors required for such a coloring is called the conflict-free open neighborhood (CFON) chromatic number, denoted by $\chi_{ON}(G)$. Analogously, we define conflict-free closed neighborhood (CFCN) coloring and the CFCN chromatic number, denoted by $\chi_{CN}(G)$. First studied in 2002, this problem has received considerable attention. We study the CFON and CFCN coloring problems and show the following results. In what follows, $\Delta$ denotes the maximum degree of the graph.
1. For a $K_{1,k}$-free graph $G$, we show that $\chi_{ON}(G) = O(k \ln \Delta)$. This improves the bound of $O(k^2 \ln \Delta)$ from [Bhyravarapu, Kalyanasundaram, Mathew, MFCS 2022]. Since $\chi_{CN}(G) \leq 2\chi_{ON}(G)$, our result implies an upper bound on $\chi_{CN}(G)$ as well. It is known that there exist separate classes of graphs with $\chi_{ON}(G) = \Omega(\ln \Delta)$ and $\chi_{ON}(G) = \Omega(k)$.
2. Let $f(\delta) = \max\{\chi_{CN}(G) : G \text{ is a graph with minimum degree } \delta\}$. It is easy to see that $f(\delta') \geq f(\delta)$ when $\delta' < \delta$. It is known [Debski and Przybylo, JGT 2021] that $f(c\Delta) = \Theta(\log \Delta)$ for any positive constant $c$. In this paper, we show that $f(c\Delta^{1-\epsilon}) = \Omega(\ln^2 \Delta)$, where $c, \epsilon$ are positive constants such that $\epsilon < 0.75$. Together with the known upper bound $\chi_{CN}(G) = O(\ln^2 \Delta)$, this implies that $f(c\Delta^{1-\epsilon}) = \Theta(\ln^2 \Delta)$.
3. For a $K_{1,k}$-free graph $G$ on $n$ vertices, we show that $\chi_{CN}(G) = O(\ln k \ln n)$. This bound is asymptotically tight.
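To make the CFON definition concrete, here is a minimal brute-force checker (a Python sketch of ours; the 4-cycle and its colorings are illustrative examples, not taken from the paper):

    from collections import Counter

    def is_cfon(adj, coloring):
        # A coloring is CFON if every vertex sees some color exactly once
        # in its open neighborhood (the vertex itself is excluded).
        for v in adj:
            counts = Counter(coloring[u] for u in adj[v])
            if 1 not in counts.values():
                return False
        return True

    # 4-cycle 0-1-2-3-0: the coloring below is conflict-free, since each
    # vertex's two neighbors receive distinct colors.
    c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(is_cfon(c4, {0: 'a', 1: 'a', 2: 'b', 3: 'b'}))  # True
    print(is_cfon(c4, {0: 'a', 1: 'a', 2: 'a', 3: 'a'}))  # False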
A growing line of work shows how learned predictions can be used to break through worst-case barriers and improve the running time of an algorithm. However, incorporating predictions into data structures with strong theoretical guarantees remains underdeveloped. This paper takes a step in this direction by showing that predictions can be leveraged in the fundamental online list labeling problem. In the problem, n items arrive over time and must be stored in sorted order in an array of size $\Theta(n)$. The array slot of an element is its label, and the goal is to maintain sorted order while minimizing the total number of elements moved (i.e., relabeled). We design a new list labeling data structure and bound its performance in two models. In the worst-case learning-augmented model, we give guarantees in terms of the error in the predictions. Our data structure provides strong guarantees: it is optimal for any prediction error and guarantees the best-known worst-case bound even when the predictions are entirely erroneous. We also consider a stochastic error model and bound the performance in terms of the expectation and variance of the error. Finally, the theoretical results are demonstrated empirically. In particular, we show that our data structure has strong performance on real temporal data sets where predictions are constructed from elements that arrived in the past, as is typical in practice.
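As a point of reference for what "labels" and "moves" mean here, the deliberately naive baseline below (a Python sketch of ours, not the paper's data structure) relabels every stored element on each arrival, for a quadratic total number of moves; classical list labeling structures bring this down to O(log^2 n) amortized moves per insertion, and the paper's structure uses predictions to do better when they are accurate:

    def naive_list_labeling(items, slack=4):
        # After each arrival, respace all stored items evenly over an
        # array of size slack*n, so every element gets a fresh label
        # (i.e., is "moved"). Total moves: 1 + 2 + ... + n = Theta(n^2).
        stored, moves, labels = [], 0, {}
        for x in items:
            stored.append(x)
            stored.sort()
            size = slack * len(stored)
            labels = {y: i * size // len(stored) for i, y in enumerate(stored)}
            moves += len(stored)
        return labels, moves

    labels, moves = naive_list_labeling([5, 1, 3, 2, 4])
    print(moves)  # 15 moves for 5 insertions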
We introduce the Observation Route Problem ($\textsf{ORP}$) defined as follows: Given a set of $n$ pairwise disjoint compact regions in the plane, find a shortest tour (route) such that an observer walking along this tour can see (observe) some point in each region from some point of the tour. The observer does \emph{not} need to see the entire boundary of an object. The tour is \emph{not} allowed to intersect the interior of any region (i.e., the regions are obstacles and therefore out of bounds). The problem exhibits similarity to both the Traveling Salesman Problem with Neighborhoods ($\textsf{TSPN}$) and the External Watchman Route Problem ($\textsf{EWRP}$). We distinguish two variants: the range of visibility is either limited to a bounding rectangle, or unlimited. We obtain the following results: (I) Given a family of $n$ disjoint convex bodies in the plane, computing a shortest observation route does not admit a $(c\log n)$-approximation for an absolute constant $c>0$ unless $\textsf{P} = \textsf{NP}$. (This holds for both limited and unlimited vision.) (II) Given a family of disjoint convex bodies in the plane, computing a shortest external watchman route is $\textsf{NP}$-hard. (This holds for both limited and unlimited vision; and even for families of axis-aligned squares.) (III) Given a family of $n$ disjoint fat convex polygons, an observation tour whose length is at most $O(\log{n})$ times the optimum can be computed in polynomial time. (This holds for limited vision.) (IV) For every $n \geq 5$, there exists a convex polygon with $n$ sides and all angles obtuse such that its perimeter is \emph{not} a shortest external watchman route. This refutes a conjecture by Absar and Whitesides (2006).
This paper studies the online vector bin packing (OVBP) problem and the related problem of online hypergraph coloring (OHC). Firstly, we use a double counting argument to prove an upper bound on the competitive ratio of $FirstFit$ for OVBP. Our proof is conceptually simple, and strengthens the result of Azar et al. by removing the dependency on the bin size parameter. Secondly, we introduce the notion of an online incidence matrix, defined for every instance of OHC. Using this notion, we provide a reduction from OHC to OVBP, which allows us to carry known lower bounds on the competitive ratio of algorithms for OHC over to OVBP. Our approach significantly simplifies the previous argument of Azar et al., which relied on intricate graph structures. In addition, we slightly improve their lower bounds. Lastly, we establish a tight bound on the competitive ratio of algorithms for OHC when the input is restricted to be a hypertree, thus resolving a conjecture of Nagy-Gyorgy et al. The crux of this proof lies in solving a certain combinatorial partition problem about a multi-family of subsets, which might be of independent interest.
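For orientation, FirstFit for online vector bin packing is the natural greedy rule: place each arriving d-dimensional item into the first open bin where it fits in every coordinate, opening a new bin otherwise. A minimal sketch (Python, assuming the usual unit-capacity convention; this is the standard algorithm, not the paper's analysis):

    def first_fit_vector(items, d):
        # bins[j] is the current load vector of bin j.
        bins, assignment = [], []
        for item in items:
            for j, load in enumerate(bins):
                # The item fits if every coordinate sum stays <= 1.
                if all(load[k] + item[k] <= 1.0 for k in range(d)):
                    bins[j] = [load[k] + item[k] for k in range(d)]
                    assignment.append(j)
                    break
            else:
                bins.append(list(item))  # no bin fits: open a new one
                assignment.append(len(bins) - 1)
        return assignment, bins

    # Example in d=2: the third item does not fit into bin 0.
    print(first_fit_vector([(0.6, 0.2), (0.3, 0.3), (0.2, 0.6)], d=2))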
Machine learning (ML) models are costly to train as they can require a significant amount of data, computational resources and technical expertise. Thus, they constitute valuable intellectual property that needs protection from adversaries wanting to steal them. Ownership verification techniques allow the victims of model stealing attacks to demonstrate that a suspect model was in fact stolen from theirs. Although a number of ownership verification techniques based on watermarking or fingerprinting have been proposed, most of them fall short either in terms of security guarantees (well-equipped adversaries can evade verification) or computational cost. A fingerprinting technique, Dataset Inference (DI), has been shown to offer better robustness and efficiency than prior methods. The authors of DI provided a correctness proof for linear (suspect) models. However, in the same setting, we prove that DI suffers from high false positives (FPs) -- it can incorrectly identify an independent model trained with non-overlapping data from the same distribution as stolen. We further prove that DI also triggers FPs in realistic, non-linear suspect models, and we confirm empirically that DI in the black-box setting leads to FPs with high confidence. Next, we show that DI also suffers from false negatives (FNs) -- an adversary can fool DI (at the cost of incurring some accuracy loss) by regularising a stolen model's decision boundaries using adversarial training. To this end, we demonstrate that black-box DI fails to identify a model adversarially trained from a stolen dataset -- the setting where DI is the hardest to evade. Finally, we discuss the implications of our findings, the viability of fingerprinting-based ownership verification in general, and suggest directions for future work.
In epidemiological studies, the capture-recapture (CRC) method is a powerful tool that can be used to estimate the number of diseased cases, or potentially disease prevalence, based on data from overlapping surveillance systems. Estimators derived from log-linear models are widely applied by epidemiologists when analyzing CRC data. The popularity of the log-linear model framework is largely due to its accessibility and the fact that interaction terms can allow for certain types of dependency among data streams. In this work, we shed new light on significant pitfalls associated with the log-linear model framework in the context of CRC using real data examples and simulation studies. First, we demonstrate that the log-linear model paradigm is highly exclusionary: by design, it can rule out many possible estimates that are potentially consistent with the observed data. Second, we clarify the ways in which commonly used model selection metrics (e.g., information criteria) are fundamentally misleading when used to select a best model in this setting. By focusing attention on these important cautionary points and on the fundamental untestable dependency assumption made when fitting a log-linear model to CRC data, we hope to improve the quality and transparency of subsequent surveillance-based CRC estimates of case counts.
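As a minimal illustration of the untestable dependency assumption, consider the simplest two-stream case: the independence (no-interaction) log-linear model fills in the unobserved cell as n00 = n10*n01/n11, the classical Lincoln-Petersen estimator, and no observed cell can validate that independence assumption. A sketch of ours with hypothetical counts:

    def two_stream_crc_estimate(n11, n10, n01):
        # n11: cases seen by both streams; n10/n01: cases seen by only
        # one stream. Under independence, the never-observed cell is
        # estimated as n10*n01/n11; the independence assumption itself
        # cannot be checked from the three observed cells.
        n00_hat = n10 * n01 / n11
        return n11 + n10 + n01 + n00_hat

    # Hypothetical counts: 50 cases seen by both streams, 150 only by
    # stream 1, 100 only by stream 2 -> estimated total of 600 cases.
    print(two_stream_crc_estimate(50, 150, 100))  # 600.0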
The elliptical distribution is a basic assumption underlying many multivariate statistical methods. For example, in sufficient dimension reduction and statistical graphical models, this assumption is routinely imposed to simplify the data dependence structure. Before applying such methods, we need to decide whether the data are elliptically distributed. Existing tests either focus exclusively on spherical distributions, rely on the bootstrap to determine the null distribution, or require specific forms of the alternative distribution. In this paper, we introduce a general nonparametric test for elliptical distributions based on a kernel embedding of the probability measure that embodies the two properties characterizing an elliptical distribution: after centering and rescaling, (1) the direction and length of the random vector are independent, and (2) the directional vector is uniformly distributed on the unit sphere. We derive the null asymptotic distribution of the test statistic via a von Mises expansion, develop the sample-level procedure to determine the rejection region, and establish the consistency and validity of the proposed test. We apply our test to the SENIC dataset, with and without a transformation aimed at achieving ellipticity.
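A quick numerical illustration of the two characterizing properties (only a Monte Carlo sanity check on Gaussian data, which is elliptical by construction; this is not the paper's kernel-embedding test statistic):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.5], [0.5, 1.0]], size=5000)

    # Center and rescale: whiten with an inverse square root of the covariance.
    mu = X.mean(axis=0)
    L = np.linalg.cholesky(np.linalg.inv(np.cov(X, rowvar=False)))
    Z = (X - mu) @ L

    r = np.linalg.norm(Z, axis=1)   # length
    U = Z / r[:, None]              # direction on the unit sphere

    # Property (1): r and U independent; Property (2): U uniform on the sphere.
    print(np.corrcoef(r, U[:, 0])[0, 1])  # crude check, approximately 0
    print(U.mean(axis=0))                 # approximately [0, 0]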
We describe a simple parallel-friendly lightweight graph reordering algorithm for COO graphs (edge lists). Our ``Batched Order By Attachment'' (BOBA) algorithm is linear in the number of edges for reads and linear in the number of vertices for writes to main memory. It is highly parallelizable on GPUs\@. We show that, compared to a randomized baseline, the ordering produced gives improved locality of reference in sparse matrix-vector multiplication (SpMV) as well as other graph algorithms. Moreover, it can substantially speed up the conversion from a COO representation to the compressed format CSR, a very common workflow. Thus, it can give \emph{end-to-end} speedups even in SpMV\@. Unlike other lightweight approaches, this reordering does not rely on explicitly knowing the degrees of the vertices, and indeed its runtime is comparable to that of computing degrees. Instead, it uses the structure and edge distribution inherent in the input edge list, making it a candidate for default use in a pragmatic graph creation pipeline. The algorithm is suitable for road-type networks as well as scale-free ones. It improves cache locality on both CPUs and GPUs, achieving hit rates similar to those of heavyweight techniques (e.g., for SpMV, 7--52\% and 11--67\% in the L1 and L2 caches, respectively). Compared to randomly labeled graphs, BOBA-reordered graphs achieve end-to-end speedups of up to 3.45$\times$. The reordering time is approximately one order of magnitude faster than that of existing lightweight techniques and up to 2.5 orders of magnitude faster than that of heavyweight techniques.
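For context, the COO-to-CSR conversion that BOBA accelerates is essentially a counting sort on row indices, and its scatter writes are exactly where a locality-friendly vertex ordering pays off. A plain sketch (Python/NumPy; this is the standard conversion, not the BOBA reordering itself):

    import numpy as np

    def coo_to_csr(rows, cols, vals, n):
        # Histogram of per-row edge counts, then prefix sum -> row offsets.
        row_ptr = np.zeros(n + 1, dtype=np.int64)
        np.add.at(row_ptr, rows + 1, 1)
        np.cumsum(row_ptr, out=row_ptr)
        # Scatter each edge into its row's segment. With a good vertex
        # ordering, consecutive edges hit nearby slots (better locality).
        col_idx = np.empty(len(cols), dtype=cols.dtype)
        data = np.empty(len(vals), dtype=vals.dtype)
        next_slot = row_ptr[:-1].copy()
        for r, c, v in zip(rows, cols, vals):
            col_idx[next_slot[r]] = c
            data[next_slot[r]] = v
            next_slot[r] += 1
        return row_ptr, col_idx, data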
A set $D \subseteq V$ of a graph $G=(V, E)$ is a dominating set of $G$ if every vertex $v\in V\setminus D$ is adjacent to at least one vertex in $D$. A set $S \subseteq V$ is a co-secure dominating set (CSDS) of a graph $G$ if $S$ is a dominating set of $G$ and for each vertex $u \in S$ there exists a vertex $v \in V\setminus S$ such that $uv \in E$ and $(S\setminus \{u\}) \cup \{v\}$ is a dominating set of $G$. The minimum cardinality of a co-secure dominating set of $G$ is the co-secure domination number, denoted by $\gamma_{cs}(G)$. Given a graph $G=(V, E)$, the minimum co-secure dominating set problem (Min Co-secure Dom) is to find a co-secure dominating set of minimum cardinality. In this paper, we strengthen the inapproximability result for Min Co-secure Dom on general graphs by showing that the problem cannot be approximated within a factor of $(1- \epsilon)\ln |V|$ for any $\epsilon > 0$, even for perfect elimination bipartite graphs and star convex bipartite graphs, unless P=NP. On the positive side, we show that Min Co-secure Dom can be approximated within a factor of $O(\ln |V|)$ for any graph $G$ with $\delta(G)\geq 2$. For $3$-regular and $4$-regular graphs, we show that Min Co-secure Dom is approximable within factors of $\dfrac{8}{3}$ and $\dfrac{10}{3}$, respectively. Furthermore, we prove that Min Co-secure Dom is APX-complete for $3$-regular graphs.
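A brute-force checker makes the swap condition in the definition concrete (a Python sketch of ours; the 5-cycle example is illustrative, not from the paper):

    def dominates(adj, S):
        # S is dominating if every vertex outside S has a neighbor in S.
        return all(v in S or adj[v] & S for v in adj)

    def is_co_secure_dominating(adj, S):
        # Each u in S must have a neighbor v outside S whose swap for u
        # still leaves a dominating set.
        return dominates(adj, S) and all(
            any(dominates(adj, (S - {u}) | {v}) for v in adj[u] - S)
            for u in S
        )

    # 5-cycle 0-1-2-3-4-0: {0, 2} is a co-secure dominating set
    # (0 can be replaced by 4, and 2 can be replaced by 3).
    c5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
    print(is_co_secure_dominating(c5, {0, 2}))  # True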
In recent years, online social networks have been the target of adversaries who seek to introduce discord into societies, to undermine democracies and to destabilize communities. Often the goal is not to favor a certain side of a conflict but to increase disagreement and polarization. To get a mathematical understanding of such attacks, researchers use opinion-formation models from sociology, such as the Friedkin--Johnsen model, and formally study how much discord the adversary can produce when altering the opinions of only a small set of users. In this line of work, it is commonly assumed that the adversary has full knowledge of the network topology and the opinions of all users. However, the latter assumption is often unrealistic in practice, where user opinions are unavailable or difficult to estimate accurately. To address this concern, we raise the following question: can an attacker sow discord in a social network even when only the network topology is known? We answer this question affirmatively. We present approximation algorithms for detecting a small set of users who are highly influential for the disagreement and polarization in the network. We show that if the adversary radicalizes these users and the initial disagreement/polarization in the network is not very high, then our method gives a constant-factor approximation relative to the setting in which the user opinions are known. To find the set of influential users, we provide a novel approximation algorithm for a variant of MaxCut in graphs with positive and negative edge weights. We experimentally evaluate our methods, which have access only to the network topology, and find that they perform similarly to methods that have access to both the network topology and all user opinions. We further present an NP-hardness proof, resolving an open question of Chen and Racz [IEEE Trans. Netw. Sci. Eng., 2021].
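For orientation on the MaxCut subroutine, the textbook single-flip local search below is the standard baseline (a Python sketch of ours; it is a 1/2-approximation only for nonnegative weights, so the signed-weight variant addressed in the paper needs the authors' new algorithm):

    def local_search_maxcut(n, edges):
        # edges: list of (u, v, w) triples; side[v] in {0, 1} encodes the cut.
        side = [0] * n
        def gain(v):
            # Cut-weight change from flipping v: edges to same-side
            # neighbors become cut (+w), cut edges become uncut (-w).
            g = 0.0
            for a, b, w in edges:
                if v in (a, b):
                    u = b if a == v else a
                    g += w if side[u] == side[v] else -w
            return g
        improved = True
        while improved:  # terminates: cut weight strictly increases
            improved = False
            for v in range(n):
                if gain(v) > 0:
                    side[v] ^= 1
                    improved = True
        return side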
Despite the growing interest in parallel-in-time methods as an approach to accelerate numerical simulations in atmospheric modelling, improving their stability and convergence remains a substantial challenge for their application to operational models. In this work, we study the temporal parallelization of the shallow water equations on the rotating sphere combined with time-stepping schemes commonly used in atmospheric modelling due to their stability properties, namely an Eulerian implicit-explicit (IMEX) method and a semi-Lagrangian semi-implicit method (SL-SI-SETTLS). The main goal is to investigate the performance of parallel-in-time methods, namely Parareal and Multigrid Reduction in Time (MGRIT), when these well-established schemes are used on the coarse discretization levels, and to provide insight into how they can be improved for better performance. We begin with an analytical study of the stability of Parareal and MGRIT applied to a linearized ordinary differential equation, as a function of the choice of coarse scheme. Next, we perform numerical simulations of two standard tests to evaluate the stability, convergence and speedup provided by the parallel-in-time methods relative to a fine reference solution computed serially. We also conduct a detailed investigation of the influence of artificial viscosity and hyperviscosity approaches, applied on the coarse discretization levels, on the performance of the temporal parallelization. Both the analytical stability study and the numerical simulations indicate poorer stability behaviour when SL-SI-SETTLS is used on the coarse levels compared to the IMEX scheme. With the IMEX scheme, a better trade-off between convergence, stability and speedup over serial simulations can be obtained under suitable parameter and artificial viscosity choices, opening up the prospect of these methods becoming competitive for realistic models.
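To fix ideas, here is a minimal Parareal iteration for a linear test equation y' = λy of the kind used in such analytical stability studies (a Python sketch of ours; the implicit-Euler coarse and fine propagators are stand-ins, not the paper's IMEX or SL-SI-SETTLS schemes):

    import numpy as np

    def parareal(lam, y0, T, n_coarse, fine_steps, n_iter):
        dt = T / n_coarse
        def G(y):  # coarse propagator: one implicit Euler step per slice
            return y / (1.0 - lam * dt)
        def F(y):  # fine propagator: many implicit Euler substeps per slice
            h = dt / fine_steps
            for _ in range(fine_steps):
                y = y / (1.0 - lam * h)
            return y
        U = np.empty(n_coarse + 1, dtype=complex)
        U[0] = y0
        for n in range(n_coarse):  # initial serial coarse sweep
            U[n + 1] = G(U[n])
        for _ in range(n_iter):
            # Fine solves use the previous iterate and are the part that
            # runs in parallel across time slices in practice.
            Fu = [F(U[n]) for n in range(n_coarse)]
            Gu_old = [G(U[n]) for n in range(n_coarse)]
            for n in range(n_coarse):  # serial correction sweep
                U[n + 1] = G(U[n]) + Fu[n] - Gu_old[n]
        return U

    # Example: decaying mode, 10 time slices, 3 Parareal iterations.
    print(parareal(lam=-1.0, y0=1.0, T=1.0, n_coarse=10, fine_steps=100, n_iter=3)[-1])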