Attention is the crucial cognitive ability that limits and selects what information we observe. Previous work by Bolander et al. (2016) proposes a model of attention based on dynamic epistemic logic (DEL) where agents are either fully attentive or not attentive at all. While introducing the realistic feature that inattentive agents believe nothing happens, the model does not represent the most essential aspect of attention: its selectivity. Here, we propose a generalization that allows for paying attention to subsets of atomic formulas. We introduce the corresponding logic for propositional attention, and show its axiomatization to be sound and complete. We then extend the framework to account for inattentive agents that, instead of assuming nothing happens, may default to a specific truth-value of what they failed to attend to (a sort of prior concerning the unattended atoms). This feature allows for a more cognitively plausible representation of the inattentional blindness phenomenon, where agents end up with false beliefs due to their failure to attend to conspicuous but unexpected events. Both versions of the model define attention-based learning through appropriate DEL event models based on a few and clear edge principles. While the size of such event models grow exponentially both with the number of agents and the number of atoms, we introduce a new logical language for describing event models syntactically and show that using this language our event models can be represented linearly in the number of agents and atoms. Furthermore, representing our event models using this language is achieved by a straightforward formalisation of the aforementioned edge principles.
In this paper we study a Separation Logic of Relations (SLR) and compare its expressiveness to (Monadic)Second Order Logic (M)SO. SLR is based on the well-known Symbolic Heap fragment of Separation Logic, whose formulae are composed of points-to assertions, inductively defined predicates, with the separating conjunction as the only logical connective. SLR generalizes the Symbolic Heap fragment by supporting general relational atoms, instead of only points-to assertions. In this paper, we restrict ourselves to finite relational structures, and hence only consider Weak (M)SO, where quantification ranges over finite sets. Our main results are that SLR and MSO are incomparable on structures of unbounded treewidth, while SLR can be embedded in SO in general. Furthermore, MSO becomes a strict subset of SLR, when the treewidth of the models is bounded by a parameter and all vertices attached to some hyperedge belong to the interpretation of a fixed unary relation symbol. We also discuss the problem of identifying a fragment of SLR that is equivalent to MSO over models of bounded treewidth.
Separation logic's compositionality and local reasoning properties have led to significant advances in scalable static analysis. But program analysis has new challenges--many programs display computational effects (e.g. randomization) and, orthogonally, static analysers must handle incorrectness too. We present Outcome Separation Logic (OSL), a program logic that is sound for both correctness and incorrectness reasoning with varying effects. OSL has a frame rule--just like separation logic--but uses different underlying assumptions that lift restrictions imposed by SL, which precluded reasoning about incorrectness and effects. Building on this foundational theory, we also define symbolic execution algorithms that use bi-abduction to derive specifications for programs with effects. This involves a new tri-abduction procedure to analyze programs whose execution branches due to effects such as nondeterministic or probabilistic choice. This work furthers the compositionality promised by separation logic by opening up the possibility for greater reuse of analysis tools across two dimensions: bug-finding vs verification in programs with varying effects.
Various neural network architectures rely on pooling operators to aggregate information coming from different sources. It is often implicitly assumed in such contexts that vectors encode epistemic states, i.e. that vectors capture the evidence that has been obtained about some properties of interest, and that pooling these vectors yields a vector that combines this evidence. We study, for a number of standard pooling operators, under what conditions they are compatible with this idea, which we call the epistemic pooling principle. While we find that all the considered pooling operators can satisfy the epistemic pooling principle, this only holds when embeddings are sufficiently high-dimensional and, for most pooling operators, when the embeddings satisfy particular constraints (e.g. having non-negative coordinates). We furthermore show that these constraints have important implications on how the embeddings can be used in practice. In particular, we find that when the epistemic pooling principle is satisfied, in most cases it is impossible to verify the satisfaction of propositional formulas using linear scoring functions, with two exceptions: (i) max-pooling with embeddings that are upper-bounded and (ii) Hadamard pooling with non-negative embeddings. This finding helps to clarify, among others, why Graph Neural Networks sometimes under-perform in reasoning tasks. Finally, we also study an extension of the epistemic pooling principle to weighted epistemic states, which are important in the context of non-monotonic reasoning, where max-pooling emerges as the most suitable operator.
The volume function V(t) of a compact set S\in R^d is just the Lebesgue measure of the set of points within a distance to S not larger than t. According to some classical results in geometric measure theory, the volume function turns out to be a polynomial, at least in a finite interval, under a quite intuitive, easy to interpret, sufficient condition (called ``positive reach'') which can be seen as an extension of the notion of convexity. However, many other simple sets, not fulfilling the positive reach condition, have also a polynomial volume function. To our knowledge, there is no general, simple geometric description of such sets. Still, the polynomial character of $V(t)$ has some relevant consequences since the polynomial coefficients carry some useful geometric information. In particular, the constant term is the volume of S and the first order coefficient is the boundary measure (in Minkowski's sense). This paper is focused on sets whose volume function is polynomial on some interval starting at zero, whose length (that we call ``polynomial reach'') might be unknown. Our main goal is to approximate such polynomial reach by statistical means, using only a large enough random sample of points inside S. The practical motivation is simple: when the value of the polynomial reach , or rather a lower bound for it, is approximately known, the polynomial coefficients can be estimated from the sample points by using standard methods in polynomial approximation. As a result, we get a quite general method to estimate the volume and boundary measure of the set, relying only on an inner sample of points and not requiring the use any smoothing parameter. This paper explores the theoretical and practical aspects of this idea.
Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the true underlying directed acyclic graph (DAG) structure is often unknown, and determining it from observational or interventional data remains a challenging task. However, in many situations, the end goal is to identify changes (shifts) in causal mechanisms between related SCMs rather than recovering the entire underlying DAG structure. Examples include analyzing gene regulatory network structure changes between healthy and cancerous individuals or understanding variations in biological pathways under different cellular contexts. This paper focuses on identifying $\textit{functional}$ mechanism shifts in two or more related SCMs over the same set of variables -- $\textit{without estimating the entire DAG structure of each SCM}$. Prior work under this setting assumed linear models with Gaussian noises; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key contribution of this work is to show that the Jacobian of the score function for the $\textit{mixture distribution}$ allows for identification of shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach.
Graph Neural Networks (GNNs) have been successfully used in many problems involving graph-structured data, achieving state-of-the-art performance. GNNs typically employ a message-passing scheme, in which every node aggregates information from its neighbors using a permutation-invariant aggregation function. Standard well-examined choices such as the mean or sum aggregation functions have limited capabilities, as they are not able to capture interactions among neighbors. In this work, we formalize these interactions using an information-theoretic framework that notably includes synergistic information. Driven by this definition, we introduce the Graph Ordering Attention (GOAT) layer, a novel GNN component that captures interactions between nodes in a neighborhood. This is achieved by learning local node orderings via an attention mechanism and processing the ordered representations using a recurrent neural network aggregator. This design allows us to make use of a permutation-sensitive aggregator while maintaining the permutation-equivariance of the proposed GOAT layer. The GOAT model demonstrates its increased performance in modeling graph metrics that capture complex information, such as the betweenness centrality and the effective size of a node. In practical use-cases, its superior modeling capability is confirmed through its success in several real-world node classification benchmarks.
Artificial Intelligence (AI) is rapidly becoming integrated into military Command and Control (C2) systems as a strategic priority for many defence forces. The successful implementation of AI is promising to herald a significant leap in C2 agility through automation. However, realistic expectations need to be set on what AI can achieve in the foreseeable future. This paper will argue that AI could lead to a fragility trap, whereby the delegation of C2 functions to an AI could increase the fragility of C2, resulting in catastrophic strategic failures. This calls for a new framework for AI in C2 to avoid this trap. We will argue that antifragility along with agility should form the core design principles for AI-enabled C2 systems. This duality is termed Agile, Antifragile, AI-Enabled Command and Control (A3IC2). An A3IC2 system continuously improves its capacity to perform in the face of shocks and surprises through overcompensation from feedback during the C2 decision-making cycle. An A3IC2 system will not only be able to survive within a complex operational environment, it will also thrive, benefiting from the inevitable shocks and volatility of war.
In humans, Attention is a core property of all perceptual and cognitive operations. Given our limited ability to process competing sources, attention mechanisms select, modulate, and focus on the information most relevant to behavior. For decades, concepts and functions of attention have been studied in philosophy, psychology, neuroscience, and computing. For the last six years, this property has been widely explored in deep neural networks. Currently, the state-of-the-art in Deep Learning is represented by neural attention models in several application domains. This survey provides a comprehensive overview and analysis of developments in neural attention models. We systematically reviewed hundreds of architectures in the area, identifying and discussing those in which attention has shown a significant impact. We also developed and made public an automated methodology to facilitate the development of reviews in the area. By critically analyzing 650 works, we describe the primary uses of attention in convolutional, recurrent networks and generative models, identifying common subgroups of uses and applications. Furthermore, we describe the impact of attention in different application domains and their impact on neural networks' interpretability. Finally, we list possible trends and opportunities for further research, hoping that this review will provide a succinct overview of the main attentional models in the area and guide researchers in developing future approaches that will drive further improvements.
Attention Model has now become an important concept in neural networks that has been researched within diverse application domains. This survey provides a structured and comprehensive overview of the developments in modeling attention. In particular, we propose a taxonomy which groups existing techniques into coherent categories. We review salient neural architectures in which attention has been incorporated, and discuss applications in which modeling attention has shown a significant impact. Finally, we also describe how attention has been used to improve the interpretability of neural networks. We hope this survey will provide a succinct introduction to attention models and guide practitioners while developing approaches for their applications.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.