The class of Galled-Tree Explainable (GaTEx) graphs has recently been discovered as a natural generalization of cographs. Cographs are precisely those graphs that can be uniquely represented by a rooted tree whose leaves correspond to the vertices of the graph. As a generalization, GaTEx graphs are precisely those graphs that can be uniquely represented by a particular rooted directed acyclic graph (called a galled-tree). We consider here four prominent problems that are, in general, NP-hard: computing the size $\omega(G)$ of a maximum clique, the number $\chi(G)$ of colors in an optimal vertex-coloring, and the size $\alpha(G)$ of a maximum independent set of a given graph $G$, as well as determining whether a graph is perfectly orderable. We show that $\omega(G)$, $\chi(G)$, and $\alpha(G)$ can be computed in linear time for GaTEx graphs $G$. The crucial idea behind the linear-time algorithms is to avoid working on the GaTEx graph $G$ directly and instead to use the galled-tree that explains $G$ as a guide for computing these invariants. In particular, we first show how to employ the galled-tree structure to compute a perfect ordering of a GaTEx graph in linear time, which is then used to determine $\omega(G)$, $\chi(G)$, and $\alpha(G)$.
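To give a feel for such tree-guided computations, the following sketch (our illustration, not the paper's algorithm) performs the analogous bottom-up dynamic program on a cotree, the special case of a galled-tree that explains a cograph; the general GaTEx case additionally has to handle the galls of the galled-tree.

```python
# Illustration only (not the paper's algorithm): bottom-up dynamic programming
# on a cotree, i.e. the special case of a galled-tree that explains a cograph.
# A node is ('leaf', v) for a vertex v, or (op, children) with op in
# {'union', 'join'}.

def clique_and_independent(node):
    """Return (omega, alpha) for the cograph explained by this cotree."""
    if node[0] == 'leaf':
        return 1, 1
    op, children = node
    vals = [clique_and_independent(c) for c in children]
    omegas, alphas = zip(*vals)
    if op == 'union':                      # disjoint union of the parts
        return max(omegas), sum(alphas)
    return sum(omegas), max(alphas)        # 'join': all edges between parts

# Star K_{1,2}: join of c with the disjoint union of a and b.
cotree = ('join', [('leaf', 'c'), ('union', [('leaf', 'a'), ('leaf', 'b')])])
print(clique_and_independent(cotree))      # (2, 2); here chi = omega, as cographs are perfect
```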
Modern ML prediction models are surprisingly accurate in practice, and incorporating their power into algorithms has led to a new research direction. Algorithms with predictions have already been used to improve on worst-case optimal bounds for online problems and for static graph problems. With this work, we initiate the study of the complexity of {\em data structures with predictions}, with an emphasis on dynamic graph problems. Unlike the independent work of v.d.~Brand et al.~[arXiv:2307.09961], which aims at upper bounds, our investigation focuses on establishing conditional fine-grained lower bounds for various notions of predictions. Our lower bounds are conditioned on the Online Matrix Vector (OMv) hypothesis. First, we show that a prediction-based algorithm for OMv provides a smooth transition between the known bounds for the offline and the online setting, and then show that this algorithm is essentially optimal under the OMv hypothesis. Further, we introduce and study four different kinds of predictions. (1) For {\em $\varepsilon$-accurate predictions}, where $\varepsilon \in (0,1)$, we show that any lower bound from the non-prediction setting carries over, reduced by a factor of $1-\varepsilon$. (2) For {\em $L$-list accurate predictions}, we show that one can efficiently compute a $(1/L)$-accurate prediction from an $L$-list accurate prediction. (3) For {\em bounded delay predictions} and {\em bounded delay predictions with outliers}, we show that a lower bound from the non-prediction setting carries over if the reduction fulfills a certain reordering condition (which is fulfilled by many reductions from OMv for dynamic graph problems). We demonstrate this by showing lower and almost tight upper bounds for a concrete dynamic graph problem, called $\# s \textrm{-} \triangle$, in which the number of triangles that contain a fixed vertex $s$ must be reported.
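For readers unfamiliar with OMv, the following minimal sketch shows the online setting underlying the hypothesis; the naive algorithm and its names are our own illustration.

```python
import numpy as np

def omv_naive(M, vector_stream):
    """Online Matrix-Vector (OMv): M is a fixed n x n Boolean matrix and the
    Boolean vectors arrive one by one; each answer M*v (over the Boolean
    semiring) must be output before the next vector is revealed. This naive
    algorithm spends O(n^2) per vector, O(n^3) in total; the OMv hypothesis
    asserts that no algorithm achieves total time O(n^{3-eps})."""
    for v in vector_stream:
        yield (M @ v > 0).astype(int)      # Boolean product: OR of ANDs

M = np.array([[1, 0], [1, 1]])
print(list(omv_naive(M, [np.array([0, 1]), np.array([1, 0])])))
```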
Lattice-linearity was introduced as a way of modelling problems using predicates that induce a lattice among the global states (Garg, SPAA 2020). Such modelling permits asynchronous execution in multiprocessor systems. A key property of \textit{the predicate} representing such problems is that it induces \textit{one} lattice in the state space, which guarantees that the execution is correct even if nodes execute asynchronously. However, many interesting problems do not exhibit lattice-linearity. This issue was alleviated by the introduction of eventually lattice-linear algorithms (Gupta and Kulkarni, SSS 2021), which induce \textit{single or multiple} lattices in a subset of the state space even when the problem cannot be defined by a predicate under which the states form a lattice. In this paper, we focus on analyzing and differentiating between lattice-linear problems and algorithms. We introduce a new class of algorithms called \textit{fully lattice-linear algorithms}, which partition the \textit{entire} reachable state space into \textit{one or more lattices}. For illustration, we present lattice-linear self-stabilizing algorithms for the minimal dominating set (MDS) and graph colouring (GC) problems, and a parallel-processing lattice-linear 2-approximation algorithm for vertex cover (VC). The algorithms for MDS and GC converge in {\boldmath $n$} and {\boldmath $n+2m$} moves, respectively, and preserve this time complexity while allowing the nodes to execute asynchronously, where nodes may act on old or inconsistent information about their neighbours. The algorithm for VC is the first lattice-linear approximation algorithm for an NP-hard problem; it converges in {\boldmath $n$} moves.
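For context, the sketch below shows the classic matching-based 2-approximation for vertex cover; note that this is a plainly different, sequential algorithm, not the paper's lattice-linear one, which additionally tolerates nodes acting on old or inconsistent neighbour information.

```python
def vertex_cover_2approx(edges):
    """Classic matching-based 2-approximation for vertex cover, shown only to
    fix ideas. Both endpoints of every greedily matched edge enter the cover,
    and any cover must hit each matched edge, so the result is at most twice
    optimal."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered: match it
            cover.update((u, v))
    return cover

# Path a - b - c - d: returns {'a','b','c','d'} (size 4 <= 2 * OPT, OPT = {'b','c'}).
print(vertex_cover_2approx([('a', 'b'), ('b', 'c'), ('c', 'd')]))
```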
If $X$ is a subset of vertices of a graph $G$, then vertices $u$ and $v$ are $X$-visible if there exists a shortest $u,v$-path $P$ such that $V(P)\cap X \subseteq \{u,v\}$. If every two vertices from $X$ are $X$-visible, then $X$ is a mutual-visibility set. The mutual-visibility number of $G$ is the cardinality of a largest mutual-visibility set of $G$; it has already been investigated. In this paper a variety of mutual-visibility problems is introduced, based on which natural pairs of vertices are required to be $X$-visible. This yields the total, the dual, and the outer mutual-visibility numbers. We first show that these graph invariants are related to each other and to the classical mutual-visibility number, and then we prove that the three newly introduced mutual-visibility problems are computationally difficult. In view of this result, we compute or bound their values for several graph classes, including grid graphs and tori. We conclude the study by comparing the values of these parameters across the specific families for which we computed them.
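The $X$-visibility condition is directly checkable: a shortest $u,v$-path avoiding $X$ internally exists iff removing $X \setminus \{u,v\}$ does not increase the $u,v$-distance. A minimal sketch, assuming an adjacency-dictionary representation:

```python
from collections import deque

def bfs_dist(adj, src, allowed):
    """BFS distances from src inside the vertex set `allowed`."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in allowed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def x_visible(adj, X, u, v):
    """u, v are X-visible iff some shortest u,v-path avoids X internally,
    i.e. the distance with X \\ {u, v} removed equals d(u, v)."""
    full = bfs_dist(adj, u, set(adj))
    restricted = bfs_dist(adj, u, (set(adj) - set(X)) | {u, v})
    return v in full and restricted.get(v) == full[v]

# Path a - b - c with X = {'b'}: a and c are not X-visible.
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
print(x_visible(adj, {'b'}, 'a', 'c'))   # False
```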
We consider parametrized linear-quadratic optimal control problems and provide their online-efficient solutions by combining greedy reduced basis methods and machine learning algorithms. To this end, we first extend the greedy control algorithm, which builds a reduced basis for the manifold of optimal final time adjoint states, to the setting where the objective functional consists of a penalty term measuring the deviation from a desired state and a term describing the control energy. Afterwards, we apply machine learning surrogates to accelerate the online evaluation of the reduced model. The error estimates proven for the greedy procedure are further transferred to the machine learning models and thus allow for efficient a posteriori error certification. We discuss the computational costs of all considered methods in detail and show by means of two numerical examples the tremendous potential of the proposed methodology.
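As a rough illustration of the greedy ingredient, the following sketch shows a generic weak-greedy basis loop; the callbacks solve_hf and error_estimate are hypothetical placeholders for the high-fidelity solve (here, of an optimal final-time adjoint state) and the a posteriori error estimator, and the step details are ours, not the paper's.

```python
import numpy as np

def weak_greedy(train_params, solve_hf, error_estimate, tol=1e-6, max_dim=50):
    """Generic weak-greedy reduced-basis construction (illustrative sketch).
    solve_hf(mu) returns a high-fidelity snapshot vector; error_estimate(basis, mu)
    returns an a posteriori error bound for parameter mu w.r.t. the basis."""
    basis = []                                     # orthonormal snapshot basis
    while len(basis) < max_dim:
        errs = [error_estimate(basis, mu) for mu in train_params]
        worst = int(np.argmax(errs))               # worst-approximated parameter
        if errs[worst] <= tol:
            break                                  # certified tolerance reached
        u = solve_hf(train_params[worst])
        for b in basis:                            # Gram-Schmidt against basis
            u = u - (b @ u) * b
        norm = np.linalg.norm(u)
        if norm < 1e-12:
            break                                  # snapshot already captured
        basis.append(u / norm)
    return basis
```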
The majority of fault-tolerant distributed algorithms are designed assuming a nominal corruption model, in which at most a fraction $f_n$ of parties can be corrupted by the adversary. However, due to the infamous Sybil attack, nominal models are not sufficient to express the trust assumptions in open (i.e., permissionless) settings. Instead, permissionless systems typically operate in a weighted model, where each participant is associated with a weight and the adversary can corrupt a set of parties holding at most a fraction $f_w$ of the total weight. In this paper, we suggest a simple way to transform a large class of protocols designed for the nominal model into the weighted model. To this end, we formalize and solve three novel optimization problems, which we collectively call the weight reduction problems, that allow us to map large real weights into small integer weights while preserving the properties necessary for the correctness of the protocols. In all cases, we manage to keep the sum of the integer weights at most linear in the number of parties, resulting in extremely efficient protocols for the weighted model. Moreover, we demonstrate that, on weight distributions that emerge in practice, the sum of the integer weights tends to be far from the theoretical worst case and is often even smaller than the number of participants. While for some protocols our transformation requires an arbitrarily small reduction in resilience (i.e., $f_w = f_n - \epsilon$), surprisingly, for many important problems we manage to obtain weighted solutions with the same resilience ($f_w = f_n$) as nominal ones. Notable examples include asynchronous consensus, verifiable secret sharing, erasure-coded distributed storage, and broadcast protocols.
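To make the goal concrete, here is a naive proportional rounding that already keeps the integer-weight sum at most $2n$; this is only a strawman of our own, since the paper's weight reduction problems additionally have to preserve the resilience properties that such naive rounding can destroy.

```python
import math

def round_weights(weights):
    """Naive proportional rounding (for intuition only): scale real weights so
    they sum to about n, then round up. Each ceiling adds less than 1, so the
    integer weights sum to at most 2n. Unlike the paper's weight reduction,
    this makes no guarantee that an adversary holding < f_w of real weight
    also holds < f_n of integer weight."""
    n, total = len(weights), sum(weights)
    return [math.ceil(w * n / total) for w in weights]

print(round_weights([1000.0, 3.5, 3.5, 1.0]))  # [4, 1, 1, 1], sum 7 <= 2n = 8
```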
In this paper we give the first efficient algorithms for the $k$-center problem on dynamic graphs undergoing edge updates. In this problem, the goal is to partition the input into $k$ sets by choosing $k$ centers such that the maximum distance from any data point to its closest center is minimized. It is known to be NP-hard to obtain a better-than-$2$ approximation for this problem. While in many applications the input may naturally be modeled as a graph, all prior work on the $k$-center problem in dynamic settings concerns metric inputs. Here, we give a deterministic decremental $(2+\epsilon)$-approximation algorithm and a randomized incremental $(4+\epsilon)$-approximation algorithm, both with amortized update time $kn^{o(1)}$ for weighted graphs. Moreover, we show a reduction that leads to a fully dynamic $(2+\epsilon)$-approximation algorithm for the $k$-center problem, with worst-case update time that is within a factor $k$ of the state-of-the-art upper bound for maintaining $(1+\epsilon)$-approximate single-source distances in graphs. Matching this bound is a natural goalpost because the approximate distances of each vertex to its center can be used to maintain a $(2+\epsilon)$-approximation of the graph diameter, and the fastest known algorithms for such a diameter approximation also rely on maintaining approximate single-source distances.
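For reference, the static baseline that these dynamic algorithms compete with is Gonzalez's classic greedy 2-approximation, sketched below (our illustration, stated for a generic distance oracle).

```python
def gonzalez_k_center(points, dist, k):
    """Gonzalez's classic static 2-approximation for k-center: repeatedly add
    the point farthest from the centers chosen so far. In the graph setting,
    dist would be realized by (approximate) shortest-path distances."""
    centers = [points[0]]
    d = {p: dist(p, centers[0]) for p in points}   # distance to nearest center
    for _ in range(k - 1):
        far = max(points, key=d.get)               # farthest point so far
        centers.append(far)
        for p in points:
            d[p] = min(d[p], dist(p, far))
    return centers

# Example on the line: for k = 2 the centers [1, 10] cover {1, 2, 9, 10} with radius 1.
print(gonzalez_k_center([1, 2, 9, 10], lambda a, b: abs(a - b), 2))  # [1, 10]
```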
An NP-complete graph decision problem, the "Multi-stage graph Simple Path" (abbr. MSP) problem, is introduced. The main contribution of this paper is a poly-time algorithm named the ZH algorithm for the problem, together with the proof of its correctness, which implies NP=P. (1) A crucial structural property is discovered, whereby all MSP instances are arranged into the sequence $G_{0}$,$G_{1}$,$G_{2}$,... ($G_{k}$ essentially stands for a group of graphs, for all $k\geq 0$). For each $G_{j}$ ($j>0$) in the sequence, there is a graph $G_{i}$ ($0\leq i<j$) "mathematically homomorphic" to $G_{j}$ that agrees completely with $G_{j}$ on the existence of global solutions. This naturally opens the door to proving the algorithm correct by mathematical induction. In previous attempts, algorithms used for making global decisions were mostly guided by heuristics and intuition; the ZH algorithm, in contrast, is designed specifically to comply with the proposed proof framework of mathematical induction. (2) Although the ZH algorithm deals with paths, it always regards paths as collections of edge sets, which is the key to avoiding exponential complexity. (3) Any poly-time algorithm that seeks global information can hardly avoid the errors caused by localized computation. In the ZH algorithm, the proposed reachable-path edge-set $R(e)$ and the information computed for it carry all necessary contextual information, which can be utilized to summarize the "history" and to probe the "future" when searching for global solutions. (4) The relation between local strategies and global strategies is discovered and established, wherein preceding decisions can pose constraints on subsequent decisions (and vice versa). This interplay resembles the paradigm of dynamic programming, though it is much more convoluted. Nevertheless, the computation is always straightforward and decreases monotonically.
Linear inverse problems arise in diverse engineering fields, especially in signal and image reconstruction. The development of computational methods for sparse linear inverse problems is one of the recent trends in this field. The so-called optimal $k$-thresholding is a newly introduced method for sparse optimization and linear inverse problems. Compared to other sparsity-aware algorithms, the advantage of the optimal $k$-thresholding method is that it performs thresholding and error-metric reduction simultaneously and thus works stably and robustly for medium-sized linear inverse problems. However, the runtime of this method is generally high when the problem size is large. The purpose of this paper is to propose an acceleration strategy for this method. Specifically, we propose a heavy-ball-based optimal $k$-thresholding (HBOT) algorithm and its relaxed variants for sparse linear inverse problems. The convergence of these algorithms is shown under the restricted isometry property. In addition, the numerical performance of the heavy-ball-based relaxed optimal $k$-thresholding pursuit (HBROTP) has been evaluated, and simulations indicate that HBROTP is robust for signal and image reconstruction even in noisy environments.
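To illustrate the heavy-ball ingredient, the sketch below combines the momentum term with plain hard thresholding; the paper's HBOT instead solves an optimal $k$-thresholding subproblem in each iteration, and the step sizes here are illustrative only, not tuned or certified.

```python
import numpy as np

def heavy_ball_iht(A, y, k, alpha=0.2, beta=0.5, iters=200):
    """Heavy-ball iterative *hard* thresholding: a simplified cousin of the
    paper's HBOT. Seeks a k-sparse x with A x ~ y."""
    def hard_threshold(z):
        out = np.zeros_like(z)
        keep = np.argsort(np.abs(z))[-k:]          # indices of k largest entries
        out[keep] = z[keep]
        return out

    x_prev = x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)                   # gradient of 0.5*||Ax - y||^2
        momentum = beta * (x - x_prev)             # the 'heavy ball' term
        x, x_prev = hard_threshold(x - alpha * grad + momentum), x
    return x
```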
Data profiling is an essential process in modern data-driven industries. One of its critical components is the discovery and validation of complex statistics, including functional dependencies, data constraints, association rules, and others. However, most existing data profiling systems that focus on complex statistics do not provide proper integration with the tools used by contemporary data scientists. This creates a significant barrier to the adoption of these tools in industry. Moreover, existing systems were not created with industrial-grade workloads in mind. Finally, they do not aim to provide descriptive explanations, i.e., of why a given pattern is not found. This is a significant issue, as understanding the underlying reasons for a specific pattern's absence is essential for making informed decisions based on the data. As a result, these patterns are effectively left hanging in the air: their application scope is rather limited and they are rarely used by the broader public. At the same time, as we demonstrate in this presentation, complex statistics can be used efficiently to solve many classic data quality problems. Desbordante is an open-source data profiler that aims to close this gap. It is built with an emphasis on industrial application: it is efficient, scalable, resilient to crashes, and provides explanations. Furthermore, it provides seamless Python integration by offloading various costly operations, not only mining, to the C++ core. In this demonstration, we show several scenarios that allow end users to solve different data quality problems. Namely, we showcase typo detection, data deduplication, and data anomaly detection scenarios.
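To make the kind of pattern concrete without reproducing Desbordante's actual Python API (which we do not attempt here), the following self-contained check tests a single functional dependency and shows how a typo surfaces as a violation:

```python
def fd_holds(rows, lhs, rhs):
    """Naive check whether the functional dependency lhs -> rhs holds in a
    table given as a list of dicts (illustration only; Desbordante discovers
    and validates such dependencies at scale and explains violations)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False                  # two rows agree on lhs but differ on rhs
    return True

rows = [{'zip': '10001', 'city': 'NYC'},
        {'zip': '10001', 'city': 'NYC '}]   # trailing space: a typo
print(fd_holds(rows, ['zip'], ['city']))    # False, pointing at the dirty value
```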
Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts, leading to more robust, better-performing, more interpretable, more computationally efficient, and more data-efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable, so that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
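As one concrete flavor of such relaxations, the sketch below smooths the conditional-swap primitive of a sorting network with a logistic sigmoid so that gradients can flow through the sort; the thesis derives its relaxations more generally, by perturbing variables and computing expectations in closed form.

```python
import numpy as np

def soft_cmp_swap(a, b, tau=1.0):
    """Continuously relaxed conditional swap, the building block of a
    differentiable sorting network: as tau -> 0 it recovers (min, max), while
    for tau > 0 it is smooth in a and b, so it admits useful gradients."""
    s = 1.0 / (1.0 + np.exp(-(b - a) / tau))   # sigmoid: ~1 when a < b
    soft_min = s * a + (1.0 - s) * b
    soft_max = s * b + (1.0 - s) * a
    return soft_min, soft_max

print(soft_cmp_swap(2.0, 1.0, tau=0.1))        # approximately (1.0, 2.0)
```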