Rule-based reasoning is an essential part of human intelligence prominently formalized in artificial intelligence research via logic programs. Describing complex objects as the composition of elementary ones is a common strategy in computer science and science in general. The author has recently introduced the sequential composition of logic programs in the context of logic-based analogical reasoning and learning in logic programming. Motivated by these applications, in this paper we construct a qualitative and algebraic notion of syntactic logic program similarity from sequential decompositions of programs. We then show how similarity can be used to answer queries across different domains via a one-step reduction. In a broader sense, this paper is a further step towards an algebraic theory of logic programming.
The coresets approach, also called subsampling or subset selection, aims to select a subsample as a surrogate for the observed sample. Such an approach has been used pervasively in large-scale data analysis. Existing coresets methods construct the subsample using a subset of rows from the predictor matrix. Such methods can be significantly inefficient when the predictor matrix is sparse or numerically sparse. To overcome the limitation, we develop a novel element-wise subset selection approach, called core-elements, for large-scale least squares estimation in classical linear regression. We provide a deterministic algorithm to construct the core-elements estimator, only requiring an $O(\mbox{nnz}(\mathbf{X})+rp^2)$ computational cost, where $\mathbf{X}$ is an $n\times p$ predictor matrix, $r$ is the number of elements selected from each column of $\mathbf{X}$, and $\mbox{nnz}(\cdot)$ denotes the number of non-zero elements. Theoretically, we show that the proposed estimator is unbiased and approximately minimizes an upper bound of the estimation variance. We also provide an approximation guarantee by deriving a coresets-like finite sample bound for the proposed estimator. To handle potential outliers in the data, we further combine core-elements with the median-of-means procedure, resulting in an efficient and robust estimator with theoretical consistency guarantees. Numerical studies on various synthetic and open-source datasets demonstrate the proposed method's superior performance compared to mainstream competitors.
Most multi-objective optimisation algorithms maintain an archive explicitly or implicitly during their search. Such an archive can be solely used to store high-quality solutions presented to the decision maker, but in many cases may participate in the search process (e.g., as the population in evolutionary computation). Over the last two decades, archiving, the process of comparing new solutions with previous ones and deciding how to update the archive/population, stands as an important issue in evolutionary multi-objective optimisation (EMO). This is evidenced by constant efforts from the community on developing various effective archiving methods, ranging from conventional Pareto-based methods to more recent indicator-based and decomposition-based ones. However, the focus of these efforts is on empirical performance comparison in terms of specific quality indicators; there is lack of systematic study of archiving methods from a general theoretical perspective. In this paper, we attempt to conduct a systematic overview of multi-objective archiving, in the hope of paving the way to understand archiving algorithms from a holistic perspective of theory and practice, and more importantly providing a guidance on how to design theoretically desirable and practically useful archiving algorithms. In doing so, we also present that archiving algorithms based on weakly Pareto compliant indicators (e.g., epsilon-indicator), as long as designed properly, can achieve the same theoretical desirables as archivers based on Pareto compliant indicators (e.g., hypervolume indicator). Such desirables include the property limit-optimal, the limit form of the possible optimal property that a bounded archiving algorithm can have with respect to the most general form of superiority between solution sets.
Transition systems are often used to describe the behaviour of software systems. If viewed as a graph then, at their most basic level, vertices correspond to the states of a program and each edge represents a transition between states via the (atomic) action labelled. In this setting, systems are thought to be consistent so that at each state formulas are evaluated as either True or False. On the other hand, when a structure of this sort - for example a map where states represent locations, some local properties are known and labelled transitions represent information available about different routes - is built resorting to multiple sources of information, it is common to find inconsistent or incomplete information regarding what holds at each state, both at the level of propositional variables and transitions. This paper aims at bringing together Belnap's four values, Dynamic Logic and hybrid machinery such as nominals and the satisfaction operator, so that reasoning is still possible in face of contradicting evidence. Proof-theory for this new logic is explored by means of a terminating, sound and complete tableaux system.
The transistor celebrated its 75${}^\text{th}$ birthday in 2022. The continued scaling of the transistor defined by Moore's Law continues, albeit at a slower pace. Meanwhile, computing demands and energy consumption required by modern artificial intelligence (AI) algorithms have skyrocketed. As an alternative to scaling transistors for general-purpose computing, the integration of transistors with unconventional technologies has emerged as a promising path for domain-specific computing. In this article, we provide a full-stack review of probabilistic computing with p-bits as a representative example of the energy-efficient and domain-specific computing movement. We argue that p-bits could be used to build energy-efficient probabilistic systems, tailored for probabilistic algorithms and applications. From hardware, architecture, and algorithmic perspectives, we outline the main applications of probabilistic computers ranging from probabilistic machine learning and AI to combinatorial optimization and quantum simulation. Combining emerging nanodevices with the existing CMOS ecosystem will lead to probabilistic computers with orders of magnitude improvements in energy efficiency and probabilistic sampling, potentially unlocking previously unexplored regimes for powerful probabilistic algorithms.
This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning. This problem in the special case of matrix sensing has been studied extensively through the notion of Restricted Isometry Property (RIP), leading to a wealth of results on the geometric landscape of the problem and the convergence rate of common algorithms. However, the existing results can handle the problem in the case with a general objective function subject to noisy data only when the RIP constant is close to 0. In this paper, we develop a new mathematical framework to solve the above-mentioned problem with a far less restrictive RIP constant. We prove that as long as the RIP constant of the noiseless objective is less than $1/3$, any spurious local solution of the noisy optimization problem must be close to the ground truth solution. By working through the strict saddle property, we also show that an approximate solution can be found in polynomial time. We characterize the geometry of the spurious local minima of the problem in a local region around the ground truth in the case when the RIP constant is greater than $1/3$. Compared to the existing results in the literature, this paper offers the strongest RIP bound and provides a complete theoretical analysis on the global and local optimization landscapes of general low-rank optimization problems under random corruptions from any finite-variance family.
A core part of the rehabilitation scheduling process consists of planning rehabilitation physiotherapy sessions for patients, by assigning proper operators to them in a certain time slot of a given day, taking into account several legal, medical and ethical requirements and optimizations, e.g., patient's preferences and operator's work balancing. Being able to efficiently solve such problem is of upmost importance, in particular after the COVID-19 pandemic that significantly increased rehabilitation's needs. In this paper, we present a two-phase solution to rehabilitation scheduling based on Answer Set Programming, which proved to be an effective tool for solving practical scheduling problems. We first present a general encoding, and then add domain specific optimizations. Results of experiments performed on both synthetic and real benchmarks, the latter provided by ICS Maugeri, show the effectiveness of our solution as well as the impact of our domain specific optimizations. Under consideration in Theory and Practice of Logic Programming (TPLP).
This paper explores continuous-time control synthesis for target-driven navigation to satisfy complex high-level tasks expressed as linear temporal logic (LTL). We propose a model-free framework using deep reinforcement learning (DRL) where the underlying dynamic system is unknown (an opaque box). Unlike prior work, this paper considers scenarios where the given LTL specification might be infeasible and therefore cannot be accomplished globally. Instead of modifying the given LTL formula, we provide a general DRL-based approach to satisfy it with minimal violation. To do this, we transform a previously multi-objective DRL problem, which requires simultaneous automata satisfaction and minimum violation cost, into a single objective. By guiding the DRL agent with a sampling-based path planning algorithm for the potentially infeasible LTL task, the proposed approach mitigates the myopic tendencies of DRL, which are often an issue when learning general LTL tasks that can have long or infinite horizons. This is achieved by decomposing an infeasible LTL formula into several reach-avoid sub-tasks with shorter horizons, which can be trained in a modular DRL architecture. Furthermore, we overcome the challenge of the exploration process for DRL in complex and cluttered environments by using path planners to design rewards that are dense in the configuration space. The benefits of the presented approach are demonstrated through testing on various complex nonlinear systems and compared with state-of-the-art baselines. The Video demonstration can be found here://youtu.be/jBhx6Nv224E.
Link prediction on knowledge graphs (KGs) is a key research topic. Previous work mainly focused on binary relations, paying less attention to higher-arity relations although they are ubiquitous in real-world KGs. This paper considers link prediction upon n-ary relational facts and proposes a graph-based approach to this task. The key to our approach is to represent the n-ary structure of a fact as a small heterogeneous graph, and model this graph with edge-biased fully-connected attention. The fully-connected attention captures universal inter-vertex interactions, while with edge-aware attentive biases to particularly encode the graph structure and its heterogeneity. In this fashion, our approach fully models global and local dependencies in each n-ary fact, and hence can more effectively capture associations therein. Extensive evaluation verifies the effectiveness and superiority of our approach. It performs substantially and consistently better than current state-of-the-art across a variety of n-ary relational benchmarks. Our code is publicly available.
Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphological based at the word embedding layer through local syntax based in the lower contextual layers to longer range semantics such coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.