A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ evaluation (called the PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near-optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
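To make the PG-approach concrete, the following is a minimal sketch in Python (with illustrative names; this is not the paper's implementation). It indexes the graph database and an NFA for q, explores the product graph by one BFS per potential source node, and reports a pair whenever a product state with an accepting NFA component is reached.

from collections import deque

def evaluate_rpq(nodes, edges, nfa_delta, q0, finals):
    """Evaluate an RPQ by searching the product of the database and an NFA.

    nodes:     iterable of database nodes
    edges:     labelled edges of the database, as triples (u, label, v)
    nfa_delta: NFA transitions, as triples (state, label, state)
    q0:        initial NFA state
    finals:    set of accepting NFA states
    Returns every pair (s, t) connected by a path labelled with a word in L(q).
    """
    db_out = {}                           # u -> list of (label, v)
    for u, a, v in edges:
        db_out.setdefault(u, []).append((a, v))
    nfa_out = {}                          # (state, label) -> successor states
    for p, a, q in nfa_delta:
        nfa_out.setdefault((p, a), []).append(q)

    answers = set()
    for s in nodes:                       # one BFS over the product graph per source
        seen = {(s, q0)}
        queue = deque([(s, q0)])
        while queue:
            u, p = queue.popleft()
            if p in finals:
                answers.add((s, u))
            for a, v in db_out.get(u, []):
                for q in nfa_out.get((p, a), []):
                    if (v, q) not in seen:
                        seen.add((v, q))
                        queue.append((v, q))
    return answers

# Example: the query a*b over a tiny graph database
nodes = {1, 2, 3}
edges = {(1, "a", 2), (2, "a", 2), (2, "b", 3)}
nfa = {("s", "a", "s"), ("s", "b", "f")}
print(sorted(evaluate_rpq(nodes, edges, nfa, "s", {"f"})))  # [(1, 3), (2, 3)]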
We revisit parallel-innermost term rewriting as a model of parallel computation on inductive data structures and provide a corresponding notion of runtime complexity parametric in the size of the start term. We propose automatic techniques to derive both upper and lower bounds on the parallel complexity of rewriting; these techniques enable a direct reuse of existing techniques for sequential complexity. Our approach to finding lower bounds requires confluence of the parallel-innermost rewrite relation, so we also provide effective sufficient criteria for proving confluence. The applicability and the precision of the method are demonstrated by the relatively light effort required to extend the program analysis tool AProVE and by experiments on numerous benchmarks from the literature.
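As a standard illustration of the model (our example, not necessarily one of the paper's benchmarks), consider a rewrite system computing the size of a binary tree:
\begin{align*}
\mathsf{size}(\mathsf{leaf}) &\to \mathsf{0} &
\mathsf{plus}(\mathsf{0}, y) &\to y \\
\mathsf{size}(\mathsf{node}(x, y)) &\to \mathsf{s}(\mathsf{plus}(\mathsf{size}(x), \mathsf{size}(y))) &
\mathsf{plus}(\mathsf{s}(x), y) &\to \mathsf{s}(\mathsf{plus}(x, y))
\end{align*}
Under parallel-innermost rewriting, all innermost redexes are contracted in the same step, so the recursive calls $\mathsf{size}(x)$ and $\mathsf{size}(y)$ in sibling subterms are evaluated simultaneously rather than one after the other. Parallel runtime complexity then measures the length of such parallel-innermost derivations, which can be considerably smaller than the sequential innermost runtime whenever independent recursive calls overlap in this way.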
This paper marries two state-of-the-art controller synthesis methods for partially observable Markov decision processes (POMDPs), a prominent model in sequential decision making under uncertainty. A central issue is to find a POMDP controller, which decides based solely on the observations seen so far, that achieves a total expected reward objective. As finding optimal controllers is undecidable, we concentrate on synthesising good finite-state controllers (FSCs). We do so by tightly integrating two modern, orthogonal methods for POMDP controller synthesis: a belief-based and an inductive approach. The former obtains an FSC from a finite fragment of the so-called belief MDP, an MDP that keeps track of the probabilities of equally observable POMDP states. The latter is an inductive search technique over a set of FSCs, e.g., controllers with a fixed memory size. The key result of this paper is a symbiotic anytime algorithm that tightly integrates both approaches such that each profits from the controllers constructed by the other. Experimental results indicate a substantial improvement in the value of the controllers while significantly reducing the synthesis time and memory footprint.
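For concreteness, the states of the belief MDP mentioned above are the reachable beliefs $b$, i.e., probability distributions over equally observable POMDP states. In standard notation (ours, not necessarily the paper's), taking action $a$ in belief $b$ and receiving observation $z$ leads to the updated belief
\[
b_{a,z}(s') \;=\; \frac{\sum_{s} b(s)\, P(s' \mid s, a)\,\mathbf{1}[O(s') = z]}{\sum_{s''}\sum_{s} b(s)\, P(s'' \mid s, a)\,\mathbf{1}[O(s'') = z]},
\]
where $P$ is the POMDP's transition function and $O$ assigns observations to states; a policy on a finite fragment of this (generally infinite) MDP then induces an FSC.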
We propose a unifying framework for smoothed analysis of combinatorial local optimization problems and show how a diverse selection of problems within the complexity class PLS can be cast within this model. This abstraction allows us to identify key structural properties, and corresponding parameters, that determine the smoothed running time of local search dynamics. We formalize this via a black-box tool that provides concrete bounds on the expected maximum number of steps needed until local search reaches an exact local optimum. This bound is particularly strong, in the sense that it holds for any starting feasible solution and any choice of pivoting rule, and it does not rely on the choice of specific noise distributions applied to the input, but is parameterized by just a global upper bound $\phi$ on the probability density. We then demonstrate the power of this tool by instantiating it for various PLS-hard problems to derive efficient smoothed running times. This not only unifies, and greatly simplifies, prior positive results, but also allows us to extend or improve them. Notable problems on which we provide such a contribution are Max-Cut, the Travelling Salesman problem, and Network Coordination Games. Additionally, we propose novel smoothed analysis formulations, and prove polynomial smoothed running times, for important local optimization problems that have not previously been studied from this perspective. Importantly, we provide an extensive study of the problem of finding pure Nash equilibria in general and Network Congestion Games under various representation models, including explicit, step-function, and polynomial latencies. We show that all the problems we study can be solved by their standard local search algorithms in polynomial smoothed time on PLS-hard instances on which these algorithms have exponential worst-case running time.
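In the standard smoothed-analysis setup referred to above (stated here in our notation; the paper's exact formulation may differ slightly), an adversary fixes the combinatorial structure of the instance, while each of the $m$ numerical parameters $c_i$ is drawn independently from an adversarial density $f_i$ on $[0,1]$ with $\|f_i\|_\infty \le \phi$. The black-box bound then controls
\[
\max_{f_1,\dots,f_m:\ \|f_i\|_\infty \le \phi}\ \ \mathbb{E}_{c_i \sim f_i}\Big[\max_{\text{start solution, pivot rule}} \#\{\text{improving moves until a local optimum}\}\Big],
\]
and the instantiations in the paper show that, for the problems listed above, this quantity is bounded by a polynomial in the instance size and $\phi$.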
We consider a supervised learning setup in which the goal is to predict an outcome from a sample of irregularly sampled time series using Neural Controlled Differential Equations (Kidger, Morrill, et al. 2020). In our framework, the time series is a discretization of an unobserved continuous path, and the outcome depends on this path through a controlled differential equation with unknown vector field. Learning with discrete data thus induces a discretization bias, which we precisely quantify. Using theoretical results on the continuity of the flow of controlled differential equations, we show that the approximation bias is directly related to the error made when approximating the Lipschitz function defining the generative model by a shallow neural network. By combining these results with recent work linking the Lipschitz constant of neural networks to their generalization capacities, we upper bound the generalization gap between the expected loss attained by the empirical risk minimizer and the expected loss of the true predictor.
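In symbols (our notation), the generative model described above takes the form
\[
\mathrm{d}Y_t \;=\; f(Y_t)\,\mathrm{d}X_t, \qquad Y_0 = y_0,
\]
with the outcome determined by the terminal value $Y_1$, while the learner only observes the driving path $X$ at discrete, possibly irregular, times and replaces it by an interpolation $\widetilde X$. Continuity of the flow of the controlled differential equation bounds the resulting discretization bias in terms of the distance between $X$ and $\widetilde X$ and the Lipschitz constant of the vector field, which is where the approximation of $f$ by a shallow neural network with controlled Lipschitz constant enters the analysis.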
The prefix aggregation operation (also called scan), with prefix summation as its particular case, is an important parallel primitive that enjoys a lot of attention in the research literature. It is also used as a step in many algorithms. Aggregation over dominated points in $\mathbb{R}^m$ is a multidimensional generalisation of prefix aggregation. It, too, is intensively researched, both as a parallel primitive and as a practical problem encountered in computational geometry, spatial databases and data warehouses. In this paper we show that, for a constant dimension $m$, aggregation over dominated points in $\mathbb{R}^m$ can be computed by $O(1)$ basic operations that include sorting the whole dataset, zipping sorted lists of elements, computing prefix aggregations of lists of elements, and flat maps, which expand the data size from the initial $n$ to $n\log^{m-1}n$. Thereby we establish that prefix aggregation suffices to express aggregation over dominated points in more dimensions, even though the latter is a far-reaching generalisation of the former. Many problems known to be expressible by aggregation over dominated points become expressible by prefix aggregation, too. We rely on a small set of primitive operations which guarantee an easy transfer to various distributed architectures and some desired properties of the implementation.
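As a point of reference for the reduction above, the one-dimensional base case already shows the pattern: after sorting, aggregation over dominated points is exactly a prefix aggregation. Below is a minimal sketch in Python (illustrative only; the paper's construction composes $O(1)$ such sort/zip/scan/flat-map passes to reach higher dimensions).

from itertools import accumulate
import operator

def dominated_aggregation_1d(points, values, op=operator.add):
    """For each point p, aggregate the values of all points q with q <= p.

    One-dimensional base case: sort by coordinate, then a single prefix
    aggregation (scan) with the associative operator `op` gives the answer.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    prefix = list(accumulate((values[i] for i in order), op))
    result = [None] * len(points)
    for rank, i in enumerate(order):
        result[i] = prefix[rank]
    return result

# Example: sum of values of all points at or below each coordinate
print(dominated_aggregation_1d([3.0, 1.0, 2.0], [10, 1, 5]))  # [16, 1, 6]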
The paper revisits the robust $s$-$t$ path problem, one of the most fundamental problems in robust optimization. In this problem, we are given a directed graph with $n$ vertices and $k$ distinct cost functions (scenarios) defined over the edges, and we aim to choose an $s$-$t$ path whose total cost remains small no matter which scenario is realized. Viewing each cost function as being associated with an agent, our goal is to find a common $s$-$t$ path minimizing the maximum objective among all agents, and thus to create a fair solution for them. The problem is hard to approximate within a factor of $o(\log k)$ by any quasi-polynomial time algorithm unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\mathrm{poly}\log n})$, and the best approximation ratio known to date is $\widetilde{O}(\sqrt{n})$, which is based on the natural flow linear program. A longstanding open question is whether we can achieve a polylogarithmic approximation even when a quasi-polynomial running time is allowed. We give the first polylogarithmic approximation for robust $s$-$t$ path since the problem was proposed more than two decades ago. In particular, we introduce an $O(\log n \log k)$-approximation algorithm running in quasi-polynomial time. The algorithm is built on a novel linear program formulation for a decision-tree-type structure which enables us to get rid of the $\Omega(\max\{k,\sqrt{n}\})$ integrality gap of the natural flow LP. Further, we also consider some well-known graph classes, e.g., graphs with bounded treewidth, and show that the polylogarithmic approximation can be achieved in polynomial time on these graphs. We hope the newly proposed techniques in the paper can offer new insights into the robust $s$-$t$ path problem and related problems in robust optimization.
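Written out, the min-max objective discussed above is (in our notation)
\[
\min_{P \in \mathcal{P}_{s,t}} \ \max_{1 \le i \le k} \ \sum_{e \in P} c_i(e),
\]
where $\mathcal{P}_{s,t}$ denotes the set of $s$-$t$ paths and $c_1, \dots, c_k$ are the scenario cost functions; the $\widetilde{O}(\sqrt{n})$ guarantee mentioned above is obtained from the natural flow LP relaxation of this program, whereas the new algorithm works with a decision-tree-type LP formulation instead.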
Multi-Agent Path Finding (MAPF) is a fundamental motion coordination problem arising in multi-agent systems with a wide range of applications. The problem's intractability has led to extensive research on improving the scalability of solvers for it. Since optimal solvers can struggle to scale, a major challenge that arises is understanding what makes MAPF hard. We tackle this challenge through a fine-grained complexity analysis of time-optimal MAPF on 2D grids, thereby closing two gaps and identifying a new tractability frontier. First, we show that 2-colored MAPF, i.e., where the agents are divided into two teams, each with its own set of targets, remains NP-hard. Second, for the flowtime objective (also called sum-of-costs), we show that it remains NP-hard to find a solution in which agents have an individually optimal cost, which we call an individually optimal solution. The tightest previously known results for these MAPF variants are for (non-grid) planar graphs. We use a single hardness construction that replaces, strengthens, and unifies previous proofs. We believe that it is also simpler than previous proofs for the planar case as it employs minimal gadgets that enable its full visualization in one figure. Finally, for the flowtime objective, we establish a tractability frontier based on the number of directions agents can move in. Namely, we complement our hardness result, which holds for three directions, with an efficient algorithm for finding an individually optimal solution if only two directions are allowed. This result sheds new light on the structure of optimal solutions, which may help guide algorithm design for the general problem.
The Functional Machine Calculus (FMC) was recently introduced as a generalization of the lambda-calculus to include higher-order global state, probabilistic and non-deterministic choice, and input and output, while retaining confluence. The calculus can encode both the call-by-name and call-by-value semantics of these effects. This is enabled by two independent generalizations, both natural from the perspective of the FMC's operational semantics, which is given by a simple multi-stack machine. The first generalization decomposes the syntax of the lambda-calculus in a way that allows for sequential composition of terms and the encoding of reduction strategies. Specifically, there exist translations of the call-by-name and call-by-value lambda-calculus which preserve operational semantics. The second parameterizes application and abstraction in terms of 'locations' (corresponding to the multiple stacks of the machine), which gives a unification of the operational semantics, syntax, and reduction rules of the given effects with those of the lambda-calculus. The FMC further comes equipped with a simple type system which restricts and captures the behaviour of effects. This thesis makes two main contributions, showing that two fundamental properties of the lambda-calculus are preserved by the FMC. The first is to show that the categorical semantics of the FMC, modulo an appropriate equational theory, is given by the free Cartesian closed category. The equational theory is validated by a notion of observational equivalence. The second contribution is a proof that typed FMC-terms are strongly normalising. This is an extension (and small simplification) of Gandy's proof for the lambda-calculus, which additionally emphasizes its latent operational intuition.
Reasoning over knowledge graphs (KGs) is a challenging task that requires a deep understanding of the complex relationships between entities and the underlying logic of their relations. Current approaches rely on learning geometries to embed entities in vector space for logical query operations, but they suffer from subpar performance on complex queries and dataset-specific representations. In this paper, we propose a novel decoupled approach, Language-guided Abstract Reasoning over Knowledge graphs (LARK), that formulates complex KG reasoning as a combination of contextual KG search and logical query reasoning, to leverage the strengths of graph extraction algorithms and large language models (LLMs), respectively. Our experiments demonstrate that the proposed approach outperforms state-of-the-art KG reasoning methods on standard benchmark datasets across several logical query constructs, with significant performance gains for queries of higher complexity. Furthermore, we show that the performance of our approach improves proportionally with the size of the underlying LLM, enabling the integration of the latest advancements in LLMs for logical reasoning over KGs. Our work presents a new direction for addressing the challenges of complex KG reasoning and paves the way for future research in this area.
Many practical problems can be understood as the search for a state of affairs that extends a fixed partial state of affairs, the \emph{environment}, while satisfying certain conditions that are formally specified. Such problems are found in, e.g., engineering, law or economics. We study this class of problems in a context where some of the relevant information about the environment is not known by the user at the start of the search. During the search, the user may consider tentative solutions that make implicit hypotheses about these unknowns. To ensure that the solution is appropriate, these hypotheses must be verified by observing the environment. Furthermore, we assume that, in addition to knowledge of what constitutes a solution, knowledge of general laws of the environment is also present. We formally define partial solutions with enough verified facts to guarantee the existence of complete and appropriate solutions. Additionally, we propose an interactive system to assist the user in their search by determining 1) which hypotheses implicit in a tentative solution must be verified in the environment, and 2) which observations can provide useful information for the search. We present an efficient method to over-approximate the set of relevant information, and evaluate our implementation.