Recently, the second and third authors showed that complete geometric graphs on $2n$ vertices in general cannot be partitioned into $n$ plane spanning trees. Building on this work, we initiate the study of partitioning into beyond-planar subgraphs, namely into $k$-planar and $k$-quasi-planar subgraphs, and obtain first bounds on the number of subgraphs required in this setting.
A homomorphism from a graph $G$ to a graph $H$ is an edge-preserving mapping from $V(G)$ to $V(H)$. Let $H$ be a fixed graph with possible loops. In the list homomorphism problem, denoted by \textsc{LHom}($H$), the instance is a graph $G$ in which every vertex is equipped with a subset of $V(H)$, called its list. We ask whether there exists a homomorphism from $G$ to $H$ such that every vertex of $G$ is mapped to a vertex from its list. We study the complexity of the \textsc{LHom}($H$) problem in intersection graphs of various geometric objects. In particular, we ask for which graphs $H$ and for which types of geometric objects the \textsc{LHom}($H$) problem can be solved in time subexponential in the number of vertices of the instance. We fully resolve this question for string graphs, i.e., intersection graphs of continuous curves in the plane. Quite surprisingly, it turns out that the dichotomy exactly coincides with the analogous dichotomy for graphs excluding a fixed path as an induced subgraph [Okrasa, Rz\k{a}\.zewski, STACS 2021]. Then we turn our attention to subclasses of string graphs, defined as intersection graphs of fat objects. We observe that the (non)existence of subexponential-time algorithms in such classes is closely related to the size $\mathrm{mrc}(H)$ of a maximum reflexive clique in $H$, i.e., the maximum number of pairwise adjacent vertices, each of which has a loop. We study the maximum value of $\mathrm{mrc}(H)$ that guarantees the existence of a subexponential-time algorithm for \textsc{LHom}($H$) in intersection graphs of (i) convex fat objects, (ii) fat similarly-sized objects, and (iii) disks. In the first two cases we obtain optimal results by giving matching algorithms and lower bounds. Finally, we discuss possible extensions of our results to weighted generalizations of \textsc{LHom}($H$).
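To make the definition of \textsc{LHom}($H$) concrete, the following minimal Python sketch checks whether a given mapping is a list homomorphism and searches for one by brute force. It is illustrative only and is not the algorithm of the abstract above: the function names, the edge-list input format, and the exhaustive search are all assumptions made for this example, and the search takes exponential time.

```python
from itertools import product


def is_list_homomorphism(g_edges, h_edges, lists, mapping):
    """Check that `mapping` sends every edge of G to an edge (or loop) of H
    and every vertex of G to a vertex from its list."""
    # A loop (a, a) in H becomes the one-element set frozenset({a}).
    h = {frozenset(e) for e in h_edges}
    respects_lists = all(mapping[v] in lists[v] for v in lists)
    preserves_edges = all(
        frozenset((mapping[u], mapping[v])) in h for u, v in g_edges
    )
    return respects_lists and preserves_edges


def solve_lhom_brute_force(g_edges, h_edges, lists):
    """Try every assignment allowed by the lists (exponential time).
    Returns a valid mapping, or None if no list homomorphism exists."""
    vertices = sorted(lists)
    for choice in product(*(sorted(lists[v]) for v in vertices)):
        mapping = dict(zip(vertices, choice))
        if is_list_homomorphism(g_edges, h_edges, lists, mapping):
            return mapping
    return None


# G is a triangle; every vertex may be mapped to "a" or "b".
triangle = [(0, 1), (1, 2), (0, 2)]
lists = {0: {"a", "b"}, 1: {"a", "b"}, 2: {"a", "b"}}

# H is a single edge a-b without loops: a triangle has no homomorphism to it.
no_solution = solve_lhom_brute_force(triangle, [("a", "b")], lists)

# Adding a loop at "a" makes the instance solvable.
solution = solve_lhom_brute_force(triangle, [("a", "b"), ("a", "a")], lists)
```

In the second instance the loop at "a" is what allows two adjacent vertices of $G$ to share an image, which is why loops in $H$ matter for the problem.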
Listing dense subgraphs in large graphs is a key task in a variety of network analysis applications, such as community detection. The clique, as the densest model, has been widely investigated. However, in practice, communities rarely form cliques for various reasons, e.g., data noise. Therefore, the $k$-plex, a graph in which each vertex is adjacent to all but at most $k$ vertices, was introduced as a relaxation of the clique. Often, to better model cohesive communities, an emphasis is placed on connected $k$-plexes with small $k$. In this paper, we continue the line of research on listing all maximal $k$-plexes and maximal $k$-plexes of prescribed size. Our first contribution is the algorithm \emph{ListPlex}, which lists all maximal $k$-plexes in $O^*(\gamma^D)$ time for each constant $k$, where $\gamma$ is a value related to $k$ but strictly smaller than 2, and $D$ is the degeneracy of the graph, which is far smaller than the number of vertices $n$ in real-world graphs. Compared to the trivial bound of $2^n$, the improvement is significant, and our bound is better than all previously known results. In practice, we further use several techniques to accelerate listing $k$-plexes of a given size, such as structure-based pruning rules, cache-efficient data structures, and parallelization. Together, these result in a very practical algorithm. Empirical results show that our approach outperforms the state-of-the-art solutions by up to orders of magnitude.
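The $k$-plex definition above can be illustrated with a short Python check. This is a minimal sketch and not the \emph{ListPlex} algorithm: the function names and the adjacency-dictionary input are assumptions for the example, and it adopts the common convention that a vertex counts itself among its at most $k$ non-neighbours, so every vertex of a $k$-plex $S$ needs at least $|S| - k$ neighbours inside $S$.

```python
def is_k_plex(adj, subset, k):
    """Return True if `subset` induces a k-plex in the graph `adj`.

    `adj` maps each vertex to the set of its neighbours. Assumed convention:
    every vertex of the subset has at least |subset| - k neighbours inside it.
    """
    s = set(subset)
    return all(len(adj[v] & s) >= len(s) - k for v in s)


def is_maximal_k_plex(adj, subset, k):
    """Return True if `subset` is a k-plex that no single extra vertex extends.

    Checking single-vertex extensions suffices because every subset of a
    k-plex is itself a k-plex under the convention used here.
    """
    s = set(subset)
    if not is_k_plex(adj, s, k):
        return False
    return not any(is_k_plex(adj, s | {v}, k) for v in adj if v not in s)


# A 4-cycle 0-1-2-3-0: each vertex misses exactly one other vertex.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}

whole_cycle_is_2_plex = is_k_plex(cycle, {0, 1, 2, 3}, 2)
whole_cycle_is_1_plex = is_k_plex(cycle, {0, 1, 2, 3}, 1)
edge_is_maximal_clique = is_maximal_k_plex(cycle, {0, 1}, 1)
```

With $k = 1$ the check reduces to the clique test, which matches the view of a $k$-plex as a relaxed clique.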
Beame et al. [ITCS 2018 & TALG 2021] introduced Bipartite Independent Set (BIS) and Independent Set (IS) oracle access to an unknown, simple, unweighted and undirected graph, and used it to solve the edge estimation problem. The introduction of these oracles set off a series of works in a short span of time that either solved open questions mentioned by Beame et al. or generalized their work, as in Dell and Lapinskas [STOC 2018], Dell, Lapinskas and Meeks [SODA 2020], Bhattacharya et al. [ISAAC 2019 & Theory Comput. Syst. 2021], and Chen et al. [SODA 2020]. Edge estimation using BIS can be done with polylogarithmically many queries, while IS needs a sub-linear but more than polylogarithmic number of queries. Chen et al. improved Beame et al.'s upper bound for edge estimation using IS and also showed an almost matching lower bound. In their introductory work, Beame et al. posed several open questions, one of which concerned estimating structures of higher order than edges, such as triangles and cliques, using BIS queries. Motivated by this question, we completely resolve the query complexity of estimating triangles using the BIS oracle. While doing so, we prove a lower bound for an even stronger query oracle called the Edge Emptiness (EE) oracle, recently introduced by Assadi, Chakrabarty and Khanna [ESA 2021] to test graph connectivity.
The generalized coloring numbers of Kierstead and Yang offer an algorithmically useful characterization of graph classes with bounded expansion. In this work, we consider the hardness and approximability of these parameters. First, we complete the work of Grohe et al. by showing that computing the weak 2-coloring number is NP-hard. Our approach further establishes that determining the weak $r$-coloring number is APX-hard for all $r \geq 2$. We adapt this to the $r$-coloring number as well, proving APX-hardness for all $r \geq 2$. Our reductions also imply that for every fixed $r \geq 2$, no XP algorithm (runtime $O(n^{f(k)})$) exists for testing if either generalized coloring number is at most $k$. Finally, we give an approximation algorithm for the $r$-coloring number which improves both the runtime and approximation factor of the existing approach of Dvo\v{r}\'{a}k. Our algorithm greedily orders vertices with small enough $\ell$-reach for every $\ell \leq r$ and achieves an $O(C_{r-1} k^{r-1})$-approximation, where $C_i$ is the $i$th Catalan number.
Graph machine learning has been extensively studied in both academia and industry. Although booming with a vast number of emerging methods and techniques, most of the literature is built on the I.I.D. hypothesis, i.e., testing and training graph data are independent and identically distributed. However, this I.I.D. hypothesis can hardly be satisfied in many real-world graph scenarios where the model performance substantially degrades when there exist distribution shifts between testing and training graph data. To solve this critical problem, out-of-distribution (OOD) generalization on graphs, which goes beyond the I.I.D. hypothesis, has made great progress and attracted ever-increasing attention from the research community. In this paper, we comprehensively survey OOD generalization on graphs and present a detailed review of recent advances in this area. First, we provide a formal problem definition of OOD generalization on graphs. Second, we categorize existing methods into three classes from conceptually different perspectives, i.e., data, model, and learning strategy, based on their positions in the graph machine learning pipeline, followed by detailed discussions for each category. We also review the theories related to OOD generalization on graphs and introduce the commonly used graph datasets for thorough evaluations. Last but not least, we share our insights on future research directions. This paper is the first systematic and comprehensive review of OOD generalization on graphs, to the best of our knowledge.
We consider the problem of partitioning a graph into a non-fixed number of non-overlapping subgraphs of maximum density. The density of a partition is the sum of the densities of the subgraphs, where the density of a subgraph is half its average degree, that is, the ratio of its number of edges to its number of vertices. This problem, called Dense Graph Partition, is known to be NP-hard on general graphs, polynomial-time solvable on trees, and polynomial-time 2-approximable. In this paper we study the restriction of Dense Graph Partition to particular sparse and dense graph classes. In particular, we prove that it is NP-hard on dense bipartite graphs as well as on cubic graphs. On dense graphs on $n$ vertices, it is polynomial-time solvable on graphs with minimum degree $n-3$ and NP-hard on $(n-4)$-regular graphs. We prove that it is polynomial-time $4/3$-approximable on cubic graphs and admits an efficient polynomial-time approximation scheme on graphs of minimum degree $n-t$ for any constant $t\geq 4$.
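The objective of Dense Graph Partition can be made concrete with a small Python sketch that evaluates the density of a given partition, taking the density of a part to be its number of internal edges divided by its number of vertices. The function names and the edge-list input format are assumptions for this example; the code only evaluates a partition and does not search for an optimal one.

```python
from fractions import Fraction


def subgraph_density(edges, part):
    """Edges with both endpoints in `part`, divided by the size of `part`.
    `part` is assumed to be non-empty."""
    part = set(part)
    inside = sum(1 for u, v in edges if u in part and v in part)
    return Fraction(inside, len(part))


def partition_density(edges, parts):
    """Sum of the densities of the parts of a vertex partition."""
    return sum((subgraph_density(edges, p) for p in parts), Fraction(0))


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

split = partition_density(edges, [{0, 1, 2}, {3, 4, 5}])   # 3/3 + 3/3
whole = partition_density(edges, [{0, 1, 2, 3, 4, 5}])     # 7/6
```

Here splitting into the two triangles gives density $2$, while keeping the graph as a single part gives $7/6$, so discarding the connecting edge pays off in this instance.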
Recently, geometric hypergraphs that can be defined by intersections of pseudohalfplanes with a finite point set were defined in a purely combinatorial way. This led to extensions of earlier results about points and halfplanes to pseudohalfplanes, including polychromatic colorings and discrete Helly-type theorems about pseudohalfplanes. Here we continue this line of research and introduce the notion of convex sets of such pseudohalfplane hypergraphs. In this context we prove several results corresponding to classical results about convexity, namely Helly's Theorem, Carath\'eodory's Theorem, Kirchberger's Theorem, the Separation Theorem, Radon's Theorem and the Cup-Cap Theorem. These results imply the respective results about pseudoconvex sets in the plane defined using pseudohalfplanes. It turns out that most of our results can also be proved using oriented matroids and topological affine planes (TAPs), but our approach is different from both of them. Compared to oriented matroids, our theory is based on a linear ordering of the vertex set, which makes our definitions and proofs quite different and perhaps more elementary. Compared to TAPs, which are continuous objects, our proofs are purely combinatorial and again quite different in flavor. Altogether, we believe that our new approach can further our understanding of these fundamental convexity results.
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, which favored a (block) LU decomposition, that is, the factorization of a matrix into the product of lower and upper triangular matrices. Today, matrix decomposition has become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we clearly realize that we cannot cover all the useful and interesting results concerning matrix decomposition within this limited scope; for example, we omit the separate analysis of Euclidean space, Hermitian space, Hilbert space, and the complex domain. We refer the reader to the linear algebra literature for a more detailed introduction to these related fields.
The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on the one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provides a principled way to build future architectures yet to be invented.
Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remedy this issue, we propose the Geometric Graph Convolutional Network (geo-GCN), which uses spatial features to efficiently learn from graphs that can be naturally located in space. Our contribution is threefold: we propose a GCN-inspired architecture which (i) leverages node positions, (ii) is a proper generalisation of both GCNs and Convolutional Neural Networks (CNNs), and (iii) benefits from augmentation, which further improves the performance and assures invariance with respect to the desired properties. Empirically, geo-GCN outperforms state-of-the-art graph-based methods on image classification and chemical tasks.