Motivated by the planarization of 2-layered straight-line drawings, we consider the problem of modifying a graph such that the resulting graph has pathwidth at most 1. The problem Pathwidth-One Vertex Explosion (POVE) asks whether such a graph can be obtained using at most $k$ vertex explosions, where a vertex explosion replaces a vertex $v$ by $\deg(v)$ degree-1 vertices, each incident to exactly one edge that was originally incident to $v$. For POVE, we give an FPT algorithm with running time $O(4^k \cdot m)$ and an $O(k^2)$ kernel, thereby improving over the $O(k^6)$-kernel by Ahmed et al. [GD 22] in a more general setting. Similarly, a vertex split replaces a vertex $v$ by two distinct vertices $v_1$ and $v_2$ and distributes the edges originally incident to $v$ arbitrarily to $v_1$ and $v_2$. Analogously to POVE, we define the problem variant Pathwidth-One Vertex Splitting (POVS) that uses the split operation instead of vertex explosions. Here we obtain a linear kernel and an algorithm with running time $O((6k+12)^k \cdot m)$. This answers an open question by Ahmed et al. [GD 22]. Finally, we consider the problem $\Pi$ Vertex Splitting ($\Pi$-VS), which generalizes the problem POVS and asks whether a given graph can be turned into a graph of a specific graph class $\Pi$ using at most $k$ vertex splits. For graph classes $\Pi$ that can be tested in monadic second-order graph logic (MSO$_2$), we show that the problem $\Pi$-VS can be expressed as an MSO$_2$ formula, resulting in an FPT algorithm for $\Pi$-VS parameterized by $k$ if $\Pi$ additionally has bounded treewidth. We obtain the same result for the problem variant using vertex explosions.
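To make the two operations concrete, here is a minimal Python sketch (our own illustration; the networkx library, the names \texttt{explode} and \texttt{split}, and the tuple-based naming of the new vertices are assumptions, not part of the paper):
\begin{verbatim}
import networkx as nx

def explode(G, v):
    # Replace v by deg(v) degree-1 vertices, each keeping exactly one
    # edge that was originally incident to v.
    H = G.copy()
    for i, u in enumerate(G.neighbors(v)):
        H.add_edge((v, i), u)   # (v, i) is a fresh degree-1 copy of v
    H.remove_node(v)
    return H

def split(G, v, to_first):
    # Replace v by two vertices and distribute its edges between them;
    # neighbors listed in to_first attach to the first copy.
    H = G.copy()
    for u in G.neighbors(v):
        H.add_edge((v, 1) if u in to_first else (v, 2), u)
    H.remove_node(v)
    return H
\end{verbatim}
(Recall that a graph has pathwidth at most 1 if and only if every connected component is a caterpillar, so both problems ask for a budget of operations that turns the input into a caterpillar forest.)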
Motivated by polynomial approximations of differential forms, we study analytical and numerical properties of a polynomial interpolation problem that relies on function averages over interval segments. The use of segment data gives rise to new theoretical and practical aspects that distinguish this problem considerably from classical nodal interpolation. We analyse fundamental mathematical properties of this problem, such as the existence, uniqueness, and numerical conditioning of its solution. We provide concrete conditions for unisolvence, explicit Lagrange-type basis systems for its representation, and a numerical method for its solution. To study the numerical conditioning, we provide concrete bounds on the Lebesgue constant in a few distinguished cases.
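In schematic form (our notation, not necessarily the paper's): given segments $[a_i,b_i]$, $i=0,\dots,n$, and data $c_i$, one seeks $p \in \mathcal{P}_n$ with
\[
\frac{1}{b_i-a_i}\int_{a_i}^{b_i} p(x)\,dx = c_i, \qquad i=0,\dots,n.
\]
In the monomial basis $p(x)=\sum_{j=0}^{n}\alpha_j x^j$ this is a linear system with matrix entries $\bigl(b_i^{\,j+1}-a_i^{\,j+1}\bigr)/\bigl((j+1)(b_i-a_i)\bigr)$, and unisolvence amounts to the nonsingularity of this matrix.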
We present a linear algebra formulation of backpropagation which allows the calculation of gradients by applying a generically written ``backslash'' or Gaussian elimination to triangular systems of equations. Generally, the matrix elements are operators. This paper makes three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left-acting) operator-theoretic, graph-based approach; (ii) operators can readily be placed in matrices as an implementation option in programming languages such as Julia; (iii) we introduce a novel ``transpose dot'' operator ``$\{\}^{T_\bullet}$'' that allows for the reversal of operators. We further demonstrate the elegance of the operator approach in a programming language with generic linear algebra, such as Julia \cite{bezanson2017julia}, and show that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra allows operators as elements of matrices. In contrast to ``operator overloading'', where backslash would normally have to be rewritten to take advantage of operators, with ``generic programming'' there is no such need.
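As a small worked illustration in our own simplified notation (not the paper's general operator setting): for a chain $x_2=f_1(x_1)$, $x_3=f_2(x_2)$ with loss $\ell(x_3)$ and Jacobians $J_1, J_2$, the adjoints $\bar{x}_i = \partial\ell/\partial x_i$ satisfy the upper-triangular system
\[
\begin{pmatrix} I & -J_1^{T} & 0 \\ 0 & I & -J_2^{T} \\ 0 & 0 & I \end{pmatrix}
\begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \bar{x}_3 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \nabla_{x_3}\ell \end{pmatrix},
\]
so back substitution, i.e. a generic backslash, reproduces reverse-mode differentiation: $\bar{x}_3 = \nabla_{x_3}\ell$, $\bar{x}_2 = J_2^{T}\bar{x}_3$, $\bar{x}_1 = J_1^{T}\bar{x}_2$.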
We study methods to manipulate weights in stress-graph embeddings to improve convex straight-line planar drawings of 3-connected planar graphs. Stress-graph embeddings are weighted versions of Tutte embeddings, where solving a linear system places vertices at a minimum-energy configuration for a system of springs. A major drawback of the unweighted Tutte embedding is that it often results in drawings with exponential area. We present a number of approaches for choosing better weights. One approach constructs weights (in linear time) that uniformly spread all vertices in a chosen direction, such as parallel to the $x$- or $y$-axis. A second approach morphs $x$- and $y$-spread drawings to produce a more aesthetically pleasing and uncluttered drawing. We further explore a ``kaleidoscope'' paradigm for this $xy$-morph approach, where we rotate the coordinate axes so as to find the best spreads and morphs. A third approach chooses the weight of each edge according to its depth in a spanning tree rooted at the outer vertices, such as a Schnyder wood or BFS tree, in order to pull vertices closer to the boundary.
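The core computation is a single sparse linear solve. Below is a minimal numpy sketch in our own notation (the function name and interface are ours; the paper's contribution lies in how the weights are chosen, which this sketch simply takes as input):
\begin{verbatim}
import numpy as np

def stress_embedding(n, edges, weights, boundary_pos):
    # boundary_pos: dict {vertex: (x, y)} pinning the outer face.
    # Each interior vertex ends up at the weighted average of its
    # neighbors, the minimum-energy state of the spring system.
    L = np.zeros((n, n))                      # weighted graph Laplacian
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    outer = sorted(boundary_pos)
    inner = [i for i in range(n) if i not in boundary_pos]
    B = np.array([boundary_pos[i] for i in outer])
    P = np.zeros((n, 2))
    P[outer] = B
    P[inner] = np.linalg.solve(L[np.ix_(inner, inner)],
                               -L[np.ix_(inner, outer)] @ B)
    return P
\end{verbatim}
For a 3-connected planar graph with the outer face fixed to a convex polygon and positive weights, this yields a convex planar straight-line drawing (Tutte's theorem and its weighted generalization); the weight schemes in the paper additionally aim to control area and vertex spread.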
Graph clustering, which aims to divide the nodes of a graph into several distinct clusters, is a fundamental yet challenging task. In recent years, deep graph clustering methods have been proposed in increasing numbers and have achieved promising performance. However, survey papers on this topic are scarce, and a summary of the field is urgently needed. Motivated by this, this paper presents the first comprehensive survey of deep graph clustering. First, we introduce a detailed definition of deep graph clustering and the important baseline methods. Second, we propose a taxonomy of deep graph clustering methods based on four criteria: graph type, network architecture, learning paradigm, and clustering method. Third, through a careful analysis of existing work, we summarize the challenges and opportunities from five perspectives. Finally, we present the applications of deep graph clustering in four domains. It is worth mentioning that a collection of state-of-the-art deep graph clustering methods, including papers, codes, and datasets, is available on GitHub. We hope this work serves as a quick guide and helps researchers overcome challenges in this vibrant field.
Recently, graph neural networks (GNNs) have been gaining attention for simulating dynamical systems due to their inductive nature, which leads to zero-shot generalizability. Similarly, physics-informed inductive biases in deep-learning frameworks have been shown to give superior performance in learning the dynamics of physical systems. There is a growing volume of literature that attempts to combine these two approaches. Here, we evaluate the performance of thirteen different graph neural networks, namely Hamiltonian and Lagrangian graph neural networks, graph neural ODEs, and their variants with explicit constraints and different architectures. We briefly explain the theoretical formulations, highlighting the similarities and differences in the inductive biases and graph architectures of these systems. We evaluate these models on spring, pendulum, gravitational, and 3D deformable solid systems to compare their performance in terms of rollout error, conserved quantities such as energy and momentum, and generalizability to unseen system sizes. Our study demonstrates that GNNs with additional inductive biases, such as explicit constraints and the decoupling of kinetic and potential energies, exhibit significantly enhanced performance. Further, all the physics-informed GNNs exhibit zero-shot generalizability to system sizes an order of magnitude larger than the training systems, thus providing a promising route to simulating large-scale realistic systems.
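For intuition, a hedged schematic of one such inductive bias (our notation, not a specific model from the study): a Hamiltonian graph neural network parameterizes the Hamiltonian $H_\theta(q,p)$ with a GNN and derives the dynamics from Hamilton's equations
\[
\dot{q} = \frac{\partial H_\theta}{\partial p}, \qquad \dot{p} = -\frac{\partial H_\theta}{\partial q},
\]
so energy conservation is built into the functional form; Lagrangian variants instead parameterize $L_\theta(q,\dot{q})$ and obtain the accelerations from the Euler--Lagrange equations.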
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and at test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviates catastrophic forgetting issues during adaptation.
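For reference, the framework mentioned above can be stated in one line (standard notation, our addition): distributionally robust optimization replaces the usual expected risk with a worst case over an uncertainty set $\mathcal{Q}$ of distributions,
\[
\min_{\theta} \; \sup_{Q \in \mathcal{Q}} \; \mathbb{E}_{(x,y)\sim Q}\bigl[\ell(\theta; x, y)\bigr],
\]
and, as we read the abstract, the parametric reformulations restrict this inner problem to a parametric family of distributions to make it tractable.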
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and that explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than existing methods.
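Schematically, in our own notation (the paper's precise bound may differ): with $\nabla\mathcal{L}$ the full-batch gradient, $\hat{g}$ the mini-batch gradient computed with exact embeddings, and $\tilde{g}$ the one computed with sampled (approximate) embeddings,
\[
\mathbb{E}\,\bigl\|\tilde{g} - \nabla\mathcal{L}\bigr\|^2 \;\le\; 2\,\underbrace{\mathbb{E}\,\bigl\|\tilde{g} - \hat{g}\bigr\|^2}_{\text{embedding approximation variance}} \;+\; 2\,\underbrace{\mathbb{E}\,\bigl\|\hat{g} - \nabla\mathcal{L}\bigr\|^2}_{\text{stochastic gradient variance}},
\]
which is why reducing only one of the two terms cannot by itself guarantee a fast convergence rate.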
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, one at the image level and one at the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers at different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach on multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our approach for robust object detection in various domain shift scenarios.
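The adversarial ingredient can be sketched in a few lines of PyTorch (our illustrative code, assuming a gradient reversal layer in the style of Ganin and Lempitsky; this is not the authors' released implementation):
\begin{verbatim}
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; negates (and scales) gradients in
    # the backward pass, so the domain classifier learns to tell
    # source from target while the backbone learns to fool it.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class ImageLevelDomainClassifier(nn.Module):
    # Patch-wise source-vs-target logits on backbone feature maps.
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1), nn.ReLU(),
            nn.Conv2d(256, 1, kernel_size=1))

    def forward(self, features, lam=1.0):
        return self.net(GradReverse.apply(features, lam))
\end{verbatim}
The logits would be trained against binary domain labels with a BCE-with-logits loss; the instance-level component applies the same reversal idea to per-ROI features.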