Pattern matching on graphs has been widely studied lately due to its importance in genomics applications. Unfortunately, even the simplest problem of deciding if a string appears as a subpath of a graph admits a quadratic lower bound under the Orthogonal Vectors Hypothesis (Equi et al. ICALP 2019, SOFSEM 2021). To avoid this bottleneck, the research has shifted towards more specific graph classes, e.g. those induced from multiple sequence alignments (MSAs). Consider segmenting $\mathsf{MSA}[1..m,1..n]$ into $b$ blocks $\mathsf{MSA}[1..m,1..j_1]$, $\mathsf{MSA}[1..m,j_1+1..j_2]$, $\ldots$, $\mathsf{MSA}[1..m,j_{b-1}+1..n]$. The distinct strings in the rows of the blocks, after the removal of gap symbols, form the nodes of an elastic founder graph (EFG) where the edges represent the original connections observed in the MSA. An EFG is called indexable if a node label occurs as a prefix of only those paths that start from a node of the same block. Equi et al. (ISAAC 2021) showed that such EFGs support fast pattern matching and gave an $O(mn \log m)$-time algorithm for preprocessing the MSA in a way that allows the construction of indexable EFGs maximizing the number of blocks and, alternatively, minimizing the maximum length of a block, in $O(n)$ and $O(n \log\log n)$ time respectively. Using the suffix tree and solving a novel ancestor problem on trees, we improve the preprocessing to $O(mn)$ time and the $O(n \log \log n)$-time EFG construction to $O(n)$ time, thus showing that both types of indexable EFGs can be constructed in time linear in the input size.
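The construction of nodes and edges from a segmented MSA described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the function name `build_founder_graph`, the parameter `block_ends` (exclusive, 0-based end columns of the blocks) and the gap symbol `-` are hypothetical choices, not taken from the paper. The sketch builds the graph for a given segmentation; it neither checks indexability nor searches for an optimal segmentation.

```python
from typing import List, Set, Tuple

Node = Tuple[int, str]  # (block index, gap-free label)


def build_founder_graph(msa: List[str], block_ends: List[int], gap: str = "-"):
    """Build node and edge sets of a founder graph from a segmented MSA.

    block_ends (hypothetical parameter) lists the exclusive 0-based end column
    of each block in increasing order; the last entry must be the MSA width.
    """
    if not msa or not block_ends or block_ends[-1] != len(msa[0]):
        raise ValueError("block_ends must end at the MSA width")
    nodes: List[Set[str]] = []           # distinct labels, one set per block
    edges: Set[Tuple[Node, Node]] = set()
    previous: List[str] = []             # per-row labels of the previous block
    start = 0
    for block, end in enumerate(block_ends):
        # Row labels of this block with gap symbols removed.
        labels = [row[start:end].replace(gap, "") for row in msa]
        if any(label == "" for label in labels):
            # Assumption of this sketch: no row is all gaps inside a block.
            raise ValueError(f"block {block} has a row made only of gaps")
        nodes.append(set(labels))
        # An edge records that some row passes through both labels.
        for left, right in zip(previous, labels):
            edges.add(((block - 1, left), (block, right)))
        previous = labels
        start = end
    return nodes, edges


msa = ["AC-GT", "ACTGT", "A--GA"]
nodes, edges = build_founder_graph(msa, [2, 5])
```

Here the three rows collapse to two distinct labels in the first block, so the graph is smaller than the MSA while still recording which labels were adjacent in some row.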
The approximate uniform sampling of graph realizations with a given degree sequence is an everyday task in many projects in social science, computer science, engineering, and other fields. One approach uses Markov chains. The best currently available result about the well-studied switch Markov chain is that it is rapidly mixing on P-stable degree sequences (see DOI:10.1016/j.ejc.2021.103421). The switch Markov chain never changes the degree sequence of the graph. However, there are cases where degree intervals are specified rather than a single degree sequence. (A natural scenario where this problem arises is hypothesis testing on social networks that are only partially observed.) In 2018, Rechner, Strowick, and M\"uller-Hannemann introduced the notion of the degree interval Markov chain, which uses three (separately well-studied) local operations (switch, hinge-flip and toggle), and applied it to realizations of degree sequences in which any two sequences under scrutiny have very small coordinate-wise distance. Recently Amanatidis and Kleer published a beautiful paper (arXiv:2110.09068), showing that the degree interval Markov chain is rapidly mixing if the sequences come from a system of very thin intervals that are centered not far from a regular degree sequence. In this paper we substantially extend their result, showing that the degree interval Markov chain is rapidly mixing if the intervals are centered at P-stable degree sequences.
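To make the switch operation concrete, the following minimal sketch performs one lazy switch step on a simple undirected graph and checks that the degrees stay fixed. It is an illustration only: the names `switch_step` and `degrees` are hypothetical, the proposal probabilities are a simple choice rather than the exact ones analyzed in the cited papers, and the hinge-flip and toggle operations of the degree interval chain (which do change degrees) are not covered.

```python
import random
from collections import Counter
from typing import FrozenSet, Set

Edge = FrozenSet[int]


def degrees(edges: Set[Edge]) -> Counter:
    """Degree of every vertex that has at least one incident edge."""
    deg = Counter()
    for edge in edges:
        for v in edge:
            deg[v] += 1
    return deg


def switch_step(edges: Set[Edge], rng: random.Random) -> Set[Edge]:
    """Propose replacing edges ab, cd by ad, cb; stay put if not allowed."""
    if len(edges) < 2:
        return edges
    ordered = sorted(tuple(sorted(e)) for e in edges)
    (a, b), e2 = rng.sample(ordered, 2)
    # Orient the second edge at random so both possible switches can occur.
    c, d = e2 if rng.random() < 0.5 else e2[::-1]
    if len({a, b, c, d}) < 4:
        return edges                      # edges share a vertex: reject
    new1, new2 = frozenset((a, d)), frozenset((c, b))
    if new1 in edges or new2 in edges:
        return edges                      # would create a multi-edge: reject
    return (edges - {frozenset((a, b)), frozenset((c, d))}) | {new1, new2}


# Usage: run the chain on a 6-cycle; every vertex keeps degree 2.
start = {frozenset((i, (i + 1) % 6)) for i in range(6)}
rng = random.Random(0)
graph = start
for _ in range(200):
    graph = switch_step(graph, rng)
assert degrees(graph) == degrees(start)
```

Rejected proposals leave the graph unchanged, which is the usual way to keep such a chain lazy and symmetric.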
Graph Convolutional Networks (GCNs) are one of the most popular architectures that are used to solve classification problems accompanied by graphical information. We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks. We study these effects through the node classification problem of a non-linearly separable Gaussian mixture model coupled with a stochastic block model. First, we show that a single graph convolution expands the regime of the distance between the means where multi-layer networks can classify the data by a factor of at least $1/\sqrt[4]{\mathbb{E}{\rm deg}}$, where $\mathbb{E}{\rm deg}$ denotes the expected degree of a node. Second, we show that with a slightly stronger graph density, two graph convolutions improve this factor to at least $1/\sqrt[4]{n}$, where $n$ is the number of nodes in the graph. Finally, we provide both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a network, concluding that the performance is mutually similar for all combinations of the placement. We present extensive experiments on both synthetic and real-world data that illustrate our results.
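As a small illustration of what a single graph convolution does to node features, the sketch below replaces each node's feature vector by the average over the node and its neighbours. The normalization used here, $D^{-1}(A+I)X$ with self-loops, is an assumption of this sketch; the abstract does not state which normalization the paper uses, and the function name `graph_convolution` is hypothetical.

```python
from typing import List


def graph_convolution(adj: List[List[int]], features: List[List[float]]) -> List[List[float]]:
    """One convolution step: average features over each closed neighbourhood.

    adj is a symmetric 0/1 adjacency matrix; a self-loop is added for every
    node (assumption of this sketch), so no node has an empty neighbourhood.
    """
    n = len(adj)
    dim = len(features[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        out.append([sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)])
    return out


# Path graph 0 - 1 - 2 with one scalar feature per node.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
smoothed = graph_convolution(path, [[0.0], [3.0], [6.0]])
```

Averaging over neighbourhoods reduces the variance of the features within a community, which is the intuition behind the improved classification thresholds; applying the function twice gives the two-convolution case.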
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be parameterized by a linear function over the current state, both players' actions and the next state. In particular, we assume that we can control both players and aim to find the Nash Equilibrium by minimizing the duality gap. We propose an algorithm Nash-UCRL based on the principle "Optimism-in-Face-of-Uncertainty". Our algorithm only needs to find a Coarse Correlated Equilibrium (CCE), which is computationally efficient. Specifically, we show that Nash-UCRL can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension, $H$ is the length of the game and $T$ is the total number of steps in the game. To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret. Our upper bound matches the lower bound up to logarithmic factors, which suggests the optimality of our algorithm.
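The duality gap that the algorithm minimizes can be illustrated in the simplest special case, a single-stage zero-sum matrix game (one state, $H=1$). This is a simplification chosen for illustration and not the Markov-game quantity analyzed in the paper; the function name `duality_gap` and the convention that the row player maximizes are assumptions of this sketch.

```python
from typing import List


def duality_gap(payoff: List[List[float]], x: List[float], y: List[float]) -> float:
    """Duality gap of mixed strategies (x, y) in a zero-sum matrix game.

    The row player maximizes x^T A y and the column player minimizes it.
    The gap is (best response value against y) - (best response value
    against x); it is nonnegative and zero exactly at a Nash equilibrium.
    """
    rows, cols = len(payoff), len(payoff[0])
    # Value of each pure row against the column strategy y.
    row_values = [sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows)]
    # Value of each pure column against the row strategy x.
    col_values = [sum(x[i] * payoff[i][j] for i in range(rows)) for j in range(cols)]
    return max(row_values) - min(col_values)


matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
gap_at_equilibrium = duality_gap(matching_pennies, [0.5, 0.5], [0.5, 0.5])
gap_at_pure_pair = duality_gap(matching_pennies, [1.0, 0.0], [1.0, 0.0])
```

In matching pennies the uniform strategies form the equilibrium, so the first gap vanishes, while the pure pair leaves each player a profitable deviation and therefore has a positive gap.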
A partial orientation $\vec{H}$ of a graph $G$ is a weak $r$-guidance system if for any two vertices at distance at most $r$ in $G$, there exists a shortest path $P$ between them such that $\vec{H}$ directs all but one edge in $P$ towards this edge. In case $\vec{H}$ has bounded maximum outdegree, this gives an efficient representation of shortest paths of length at most $r$ in $G$. We show that graphs from many natural graph classes admit such weak guidance systems, and study the algorithmic aspects of this notion.
We describe a polynomial-time algorithm which, given a graph $G$ with treewidth $t$, approximates the pathwidth of $G$ to within a ratio of $O(t\sqrt{\log t})$. This is the first algorithm to achieve an $f(t)$-approximation for some function $f$. Our approach builds on the following key insight: every graph with large pathwidth has large treewidth or contains a subdivision of a large complete binary tree. Specifically, we show that every graph with pathwidth at least $th+2$ has treewidth at least $t$ or contains a subdivision of a complete binary tree of height $h+1$. The bound $th+2$ is best possible up to a multiplicative constant. This result was motivated by, and implies (with $c=2$), the following conjecture of Kawarabayashi and Rossman (SODA'18): there exists a universal constant $c$ such that every graph with pathwidth $\Omega(k^c)$ has treewidth at least $k$ or contains a subdivision of a complete binary tree of height $k$. Our main technical algorithm takes a graph $G$ and some (not necessarily optimal) tree decomposition of $G$ of width $t'$ in the input, and it computes in polynomial time an integer $h$, a certificate that $G$ has pathwidth at least $h$, and a path decomposition of $G$ of width at most $(t'+1)h+1$. The certificate is closely related to (and implies) the existence of a subdivision of a complete binary tree of height $h$. The approximation algorithm for pathwidth is then obtained by combining this algorithm with the approximation algorithm of Feige, Hajiaghayi, and Lee (STOC'05) for treewidth.
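Since the algorithm outputs a path decomposition together with a width guarantee, it may help to see how such an output could be checked. The sketch below validates a sequence of bags against the standard definition of a path decomposition and returns its width; the helper `width_upper_bound` just evaluates the bound $(t'+1)h+1$ quoted above. Both function names are hypothetical, and the sketch is a verifier only, not the approximation algorithm itself.

```python
from typing import Iterable, List, Set, Tuple


def path_decomposition_width(vertices: Iterable[int],
                             edges: Iterable[Tuple[int, int]],
                             bags: List[Set[int]]) -> int:
    """Return the width of a valid path decomposition, else raise ValueError."""
    bags = [set(bag) for bag in bags]
    for v in vertices:
        positions = [i for i, bag in enumerate(bags) if v in bag]
        if not positions:
            raise ValueError(f"vertex {v} is in no bag")
        # The bags containing v must form a contiguous interval.
        if positions[-1] - positions[0] + 1 != len(positions):
            raise ValueError(f"bags containing vertex {v} are not contiguous")
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags):
            raise ValueError(f"edge ({u}, {v}) is covered by no bag")
    return max(len(bag) for bag in bags) - 1


def width_upper_bound(t_prime: int, h: int) -> int:
    """Width guarantee (t'+1)h+1 stated in the abstract."""
    return (t_prime + 1) * h + 1


# A path on four vertices has a path decomposition of width 1.
width = path_decomposition_width(range(4), [(0, 1), (1, 2), (2, 3)],
                                 [{0, 1}, {1, 2}, {2, 3}])
```

A returned width can then be compared with `width_upper_bound(t_prime, h)` for the values of $t'$ and $h$ reported by the algorithm.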
While algorithms for planar graphs have received a lot of attention, few papers have focused on the additional power that one gets from assuming that an embedding of the graph is available. In the classic sequential setting, this assumption gives no additional power (as a planar graph can be embedded in linear time), but we show that this is far from being the case in other settings. We assume that the embedding is straight-line, but our methods also generalize to non-straight-line embeddings. Specifically, we focus on sublinear-time computation and massively parallel computation (MPC). Our main technical contribution is a sublinear-time algorithm for computing a relaxed version of an $r$-division. We then show how this can be used to estimate Lipschitz additive graph parameters. Examples include the maximum matching, the maximum independent set, and the minimum dominating set. We also show how this can be used to solve some property testing problems with respect to the vertex edit distance. In the second part of our paper, we show an MPC algorithm that computes an $r$-division of the input graph. We show how this can be used to solve various classical graph problems using $O(n^{2/3+\epsilon})$ space per machine, for some $\epsilon>0$, and $O(1)$ rounds. Examples include approximate shortest paths and the minimum spanning tree. Our results also imply an improved MPC algorithm for the Euclidean minimum spanning tree.
We present a sheaf-theoretic construction of shape space -- the space of all shapes. We do this by describing a homotopy sheaf on the poset category of constructible sets, where each set is mapped to its Persistent Homology Transform (PHT). Recent results that build on fundamental work of Schapira have shown that this transform is injective, thus making the PHT a good summary object for each shape. Our homotopy sheaf result allows us to "glue" PHTs of different shapes together to build up the PHT of a larger shape. In the case where our shape is a polyhedron we prove a generalized nerve lemma for the PHT. Finally, by re-examining the sampling result of Smale-Niyogi-Weinberger, we show that we can reliably approximate the PHT of a manifold by a polyhedron up to arbitrary precision.
Category theory can be used to state formulas in First-Order Logic without using set membership. Several notable results in logic, such as the proof of the independence of the continuum hypothesis, can be elegantly rewritten in category theory. In this paper we propose a reformulation of the usual set-theoretical semantics of the description logic $\mathcal{ALC}$ using categorical language. In this setting, $\mathcal{ALC}$ concepts are represented as objects, concept subsumptions as arrows, and memberships as logical quantifiers over objects and arrows of categories. Such a category-theoretical semantics provides a more modular representation of the semantics of $\mathcal{ALC}$ and a new way to design algorithms for reasoning.
For any small positive real $\varepsilon$ and integer $t > \frac{1}{\varepsilon}$, we build a graph with a vertex deletion set of size $t$ to a tree, and twin-width greater than $2^{(1-\varepsilon) t}$. In particular, this shows that the twin-width is sometimes exponential in the treewidth, in the so-called oriented twin-width and grid number, and that adding an apex may multiply the twin-width by at least $2-\varepsilon$. Except for the one in oriented twin-width, these lower bounds are essentially tight.
In a sports competition, a team might lose a powerful incentive to exert full effort if its final rank does not depend on the outcome of the matches still to be played. Therefore, the organiser should reduce the probability of such a situation to the extent possible. Our paper provides a classification scheme to identify these weakly (where one team is indifferent) or strongly (where both teams are indifferent) stakeless games. A statistical model is estimated to simulate the UEFA Champions League groups and compare the candidate schedules used in the 2021/22 season according to the competitiveness of the matches played in the last round(s). The option followed in four of the eight groups is found to be optimal under a wide set of parameters. Minimising the number of strongly stakeless matches is verified to be a likely goal in the computer draw of the fixture that remains hidden from the public.
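The weak/strong classification can be illustrated with a deliberately simplified model. The sketch below enumerates all results of the remaining matches and calls a team indifferent if its rank is the same in every scenario. It assumes 3 points for a win and 1 for a draw, and it conservatively treats any tie on points as leaving the rank undecided, because the actual UEFA tie-breaking rules are not modelled. The function names and the example standings are hypothetical and are not data from the paper.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

OUTCOMES = [(3, 0), (1, 1), (0, 3)]  # home win, draw, away win


def indifferent_teams(points: Dict[str, int],
                      matches: List[Tuple[str, str]]) -> Set[str]:
    """Teams whose final rank is identical in every scenario (ties excluded)."""
    ranks = {team: set() for team in points}
    for results in product(OUTCOMES, repeat=len(matches)):
        final = dict(points)
        for (home, away), (home_pts, away_pts) in zip(matches, results):
            final[home] += home_pts
            final[away] += away_pts
        for team in points:
            others = [final[o] for o in points if o != team]
            if final[team] in others:
                ranks[team].add(None)   # tie on points: rank left undecided
            else:
                ranks[team].add(1 + sum(p > final[team] for p in others))
    return {t for t, seen in ranks.items() if len(seen) == 1 and None not in seen}


def classify(match: Tuple[str, str], points: Dict[str, int],
             matches: List[Tuple[str, str]]) -> str:
    """Label a match by how many of its two teams are indifferent."""
    fixed = indifferent_teams(points, matches)
    count = sum(team in fixed for team in match)
    return ["competitive", "weakly stakeless", "strongly stakeless"][count]


# Hypothetical standings before the last round of a four-team group.
standings = {"A": 15, "B": 9, "C": 3, "D": 3}
last_round = [("A", "B"), ("C", "D")]
label = classify(("A", "B"), standings, last_round)
```

In this example A and B can no longer change places with anyone, so their match is strongly stakeless, whereas C and D still play for third place.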