Estimating the structure of directed acyclic graphs (DAGs) from observational data remains a significant challenge in machine learning. Most research in this area concentrates on learning a single DAG for the entire population. This paper considers an alternative setting where the graph structure varies across individuals based on available "contextual" features. We tackle this contextual DAG problem via a neural network that maps the contextual features to a DAG, represented as a weighted adjacency matrix. The neural network is equipped with a novel projection layer that ensures the output matrices are sparse and satisfy a recently developed characterization of acyclicity. We devise a scalable computational framework for learning contextual DAGs and provide a convergence guarantee and an analytical gradient for backpropagating through the projection layer. Our experiments suggest that the new approach can recover the true context-specific graph where existing approaches fail.
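The abstract leaves the specific acyclicity characterization unnamed. For orientation only, the sketch below implements the widely used trace-exponential condition of NOTEARS (Zheng et al., 2018), under which a weighted adjacency matrix $W \in \mathbb{R}^{d\times d}$ is acyclic iff $h(W) = \operatorname{tr}(e^{W\circ W}) - d = 0$; the function name is ours, and the paper's projection layer may rely on a different characterization.

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W: np.ndarray) -> float:
    """h(W) = tr(exp(W * W)) - d, which is zero iff W encodes a DAG.

    The Hadamard square W * W keeps h smooth, and the gradient
    exp(W * W).T * 2W is available in closed form, which is what makes
    constraints of this kind usable inside neural network training.
    """
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

# A 2-cycle violates acyclicity; an upper-triangular matrix does not.
print(notears_acyclicity(np.array([[0.0, 1.0], [1.0, 0.0]])))  # approx. 1.086
print(notears_acyclicity(np.array([[0.0, 1.0], [0.0, 0.0]])))  # approx. 0.0
```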
We present a parallel algorithm for the fast Fourier transform (FFT) in higher dimensions. This algorithm generalizes the cyclic-to-cyclic one-dimensional parallel algorithm to a cyclic-to-cyclic multidimensional parallel algorithm while retaining the property of needing only a single all-to-all communication step. This holds under the constraint that we use at most $\sqrt{N}$ processors for an FFT on an array with a total of $N$ elements, irrespective of the dimension $d$ or the shape of the array. The only assumption we make is that $N$ is sufficiently composite. Our algorithm starts and ends in the same data distribution. We present our multidimensional implementation FFTU, which utilizes the sequential FFTW program for its local FFTs and can handle any dimension $d$. We obtain experimental results for $d\leq 5$ using MPI on up to 4096 cores of the supercomputer Snellius, comparing FFTU with the parallel FFTW program and with PFFT and heFFTe. These results show that FFTU is competitive with the state of the art and that it allows one to use a larger number of processors, while keeping communication limited to a single all-to-all operation. For arrays of size $1024^3$ and $64^5$, FFTU achieves speedups of factors 149 and 176, respectively, on 4096 processors.
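To illustrate the structure such algorithms exploit (this is the generic transpose decomposition, not the authors' cyclic-to-cyclic scheme): a multidimensional FFT is separable, so local one-dimensional FFTs along locally owned axes, one global redistribution, and a second local pass compute the full transform. A sequential NumPy sketch for $d = 2$:

```python
import numpy as np

# In a transpose-based parallel FFT, each processor owns full rows, FFTs
# them locally, and a single all-to-all redistribution (emulated here by
# switching axes) hands it full columns for the second local FFT pass.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))

step1 = np.fft.fft(x, axis=1)      # local FFTs over owned rows
step2 = np.fft.fft(step1, axis=0)  # after redistribution: FFTs over columns

assert np.allclose(step2, np.fft.fft2(x))  # matches the full 2-D FFT
```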
We propose a new method for estimating causal effects in longitudinal/panel data settings that we call generalized difference-in-differences. Our approach unifies two alternative approaches in these settings: ignorability estimators (e.g., synthetic controls) and difference-in-differences (DiD) estimators. We propose a new identifying assumption -- a stable bias assumption -- which generalizes the conditional parallel trends assumption in DiD, leading to the proposed generalized DiD framework. This change gives generalized DiD estimators the flexibility of ignorability estimators while maintaining the robustness to unobserved confounding of DiD. We also show how ignorability and DiD estimators are special cases of generalized DiD. We then propose influence-function-based estimators of the observed-data functional, allowing the use of double/debiased machine learning for estimation. We also show how generalized DiD easily extends to clustered treatment assignment and staggered adoption settings, and we discuss how the framework can facilitate estimation of treatment effects beyond the average treatment effect on the treated. Finally, we provide simulations showing that generalized DiD outperforms ignorability and DiD estimators when their identifying assumptions are not met, while remaining competitive with these special cases when their assumptions are met.
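For reference, the classical two-period DiD estimator that generalized DiD subsumes contrasts pre-to-post outcome changes between treated and control units. A minimal simulation sketch (variable names and data-generating process are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
treated = rng.integers(0, 2, n).astype(bool)

# Simulate parallel trends: both groups share the same time trend (+1.0);
# the treated group additionally receives a true effect of +2.0 post-treatment.
y_pre = 1.0 + 0.5 * treated + rng.standard_normal(n)
y_post = y_pre + 1.0 + 2.0 * treated + rng.standard_normal(n)

# Difference-in-differences: change among treated minus change among controls.
att_hat = (y_post[treated] - y_pre[treated]).mean() \
        - (y_post[~treated] - y_pre[~treated]).mean()
print(att_hat)  # approximately 2.0
```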
The one-to-one mapping of control inputs to actuator outputs results in elaborate routing architectures that limit how complex fluidic soft robot behaviours can currently become. Embodied intelligence can be used as a tool to counteract this phenomenon: control functionality can be embedded directly into actuators by leveraging the characteristics of fluid flow phenomena. Whilst prior soft robotics work has focused exclusively on actuators operating in a state of transient/no flow (constant pressure) or pulsatile/alternating flow, our work begins to explore the possibilities granted by operating in the closed-loop flow recirculation regime. Here we introduce the concept of FlowBots: soft robots that utilise the characteristics of continuous fluid flow to embody complex control functionality directly in the structure of the robot. FlowBots have robust, integrated, no-moving-part control systems, and these architectures enable monolithic additive manufacturing, rapid prototyping, greater sustainability, and an expansive range of applications. Based on three FlowBot examples (a bidirectional actuator, a gripper, and a quadruped swimmer), we demonstrate how the characteristics of flow recirculation contribute to simplifications in fluidic analogue control architectures. We conclude by outlining our design and rapid prototyping methodology to empower others to explore this emerging design field and create their own FlowBots.
The integration of data and knowledge from several sources is known as data fusion. When data are only available in a distributed fashion or when different sensors are used to infer a quantity of interest, data fusion becomes essential. In Bayesian settings, prior information about the unknown quantities is available and is possibly shared among the distributed estimators. When the local estimates are fused, the prior knowledge used to construct the several local posteriors may be counted multiple times unless the fusion node accounts for this and corrects it. In this paper, we analyze the effects of shared priors in Bayesian data fusion contexts. For several common fusion rules, our analysis explains the performance behavior as a function of the number of collaborating agents and of the type of prior. The analysis is carried out using two divergences that are common in Bayesian inference, and the generality of the results allows the analysis of very general distributions. These theoretical results are corroborated through experiments in a variety of estimation and classification problems, including linear and nonlinear models and federated learning schemes.
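The prior-overuse phenomenon is easiest to see in a conjugate Gaussian example: if $K$ agents each fold the same prior into their local posteriors and the fusion node multiplies those posteriors, the prior enters $K$ times; dividing out $K-1$ copies recovers the centralized posterior. A sketch in information (natural-parameter) form, with our own notation:

```python
import numpy as np

# Gaussian info form: precision lam = 1/var, info vector eta = lam * mean.
prior_lam, prior_eta = 1.0, 0.0   # N(0, 1) prior shared by all agents
noise_lam = 4.0                   # each agent observes theta + N(0, 1/4) noise

rng = np.random.default_rng(2)
theta, K = 1.5, 5
obs = theta + rng.standard_normal(K) / np.sqrt(noise_lam)

# Local posteriors: each agent combines the shared prior with its own datum.
loc_lam = prior_lam + noise_lam
loc_eta = prior_eta + noise_lam * obs

# Naive product fusion counts the shared prior K times ...
naive_lam = K * loc_lam
naive_eta = loc_eta.sum()

# ... so dividing out K-1 copies of it recovers the centralized posterior.
fused_lam = naive_lam - (K - 1) * prior_lam
fused_eta = naive_eta - (K - 1) * prior_eta

central_lam = prior_lam + K * noise_lam
central_eta = prior_eta + noise_lam * obs.sum()
assert np.isclose(fused_lam, central_lam) and np.isclose(fused_eta, central_eta)
print(fused_eta / fused_lam)  # corrected posterior mean, close to theta = 1.5
```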
We present an approach for analyzing message passing graph neural networks (MPNNs) based on an extension of graphon analysis to a so-called graphon-signal analysis. An MPNN is a function that takes a graph and a signal on the graph (a graph-signal) and returns some value. Since the input space of MPNNs is non-Euclidean, i.e., graphs can be of any size and topology, properties such as generalization are less well understood for MPNNs than for Euclidean neural networks. We claim that one important ingredient missing from past work is a meaningful notion of graph-signal similarity that endows the space of inputs to MPNNs with a regular structure. We present such a similarity measure, called the graphon-signal cut distance, which makes the space of all graph-signals a dense subset of a compact metric space -- the graphon-signal space. Informally, two deterministic graph-signals are close in cut distance if they ``look like'' they were sampled from the same random graph-signal model. Hence, our cut distance is a natural notion of graph-signal similarity, which allows comparing any pair of graph-signals of any size and topology. We prove that MPNNs are Lipschitz continuous functions over the graphon-signal metric space. We then give two applications of this result: 1) a generalization bound for MPNNs, and 2) the stability of MPNNs to subsampling of graph-signals. Our results apply to any regular enough MPNN on any distribution of graph-signals, making the analysis rather universal.
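For readers unfamiliar with graphon analysis, the paper's graphon-signal cut distance extends the classical cut norm and cut distance on graphons; for reference, the classical (Lovász-style) definitions, not the paper's extended ones, are:

```latex
% Classical cut norm and cut distance on graphons, which the paper's
% graphon-signal cut distance extends to pairs (W, f) of graphon and signal.
\[
\|W\|_\square \;=\; \sup_{S,T \subseteq [0,1]}
  \Big| \int_{S \times T} W(x,y)\,dx\,dy \Big|,
\qquad
\delta_\square(W, W') \;=\; \inf_{\varphi} \,\| W - W'^{\varphi} \|_\square,
\]
% where the infimum is over measure-preserving bijections \varphi of [0,1]
% and W'^{\varphi}(x,y) = W'(\varphi(x), \varphi(y)).
```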
This paper considers the problem of robust iterative Bayesian smoothing in nonlinear state-space models with additive noise using Gaussian approximations. Iterative methods are known to improve smoothed estimates but are not guaranteed to converge, motivating the development of more robust versions of the algorithms. The aim of this article is to present Levenberg-Marquardt (LM) and line-search extensions of the classical iterated extended Kalman smoother (IEKS) as well as the iterated posterior linearisation smoother (IPLS). The IEKS has previously been shown to be equivalent to the Gauss-Newton (GN) method. We derive a similar GN interpretation for the IPLS. Furthermore, we show that an LM extension for both iterative methods can be achieved with a simple modification of the smoothing iterations, enabling algorithms with efficient implementations. Our numerical experiments show the importance of robust methods, in particular for the IEKS-based smoothers. The computationally expensive IPLS-based smoothers are naturally robust but can still benefit from further regularisation.
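The LM extension referred to here follows the standard recipe: damp the Gauss-Newton normal equations so that each iterate solves a regularized linear system interpolating between a GN step and a short gradient step. A generic sketch of one damped step (not the paper's smoother-specific recursions):

```python
import numpy as np

def lm_step(J: np.ndarray, r: np.ndarray, lam: float) -> np.ndarray:
    """One damped (Levenberg-Marquardt) step for min ||r(x)||^2.

    Solves (J^T J + lam*I) dx = -J^T r. With lam = 0 this reduces to the
    Gauss-Newton step (equivalently, one IEKS-style iteration, per the
    paper's GN equivalence); increasing lam shortens the step toward a
    gradient-descent direction, trading per-iteration progress for
    robustness when the linearization is poor.
    """
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ r)
```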
Entity alignment seeks identical entities in different knowledge graphs and is a long-standing task in database research. Recent work leverages deep learning to embed entities in vector space and align them via nearest neighbor search. Although embedding-based entity alignment has achieved marked success in recent years, it lacks explanations for its alignment decisions. In this paper, we present the first framework that can generate explanations for understanding and repairing embedding-based entity alignment results. Given an entity alignment pair produced by an embedding model, we first compare its neighbor entities and relations to build a matching subgraph as a local explanation. We then construct an alignment dependency graph to understand the pair from an abstract perspective. Finally, we repair the pair by resolving three types of alignment conflicts based on the dependency graphs. Experiments on five datasets demonstrate the effectiveness and generalization of our framework in explaining and repairing embedding-based entity alignment results.
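The alignment step that such a framework must explain is, at its core, nearest neighbor search between two embedding spaces. A toy sketch of that step (embeddings and names are ours):

```python
import numpy as np

def align_by_nearest_neighbor(emb1: np.ndarray, emb2: np.ndarray):
    """Match each entity in KG1 to its nearest entity in KG2 by cosine
    similarity. These candidate pairs are what a post-hoc framework like
    the one described must then explain and, where conflicting, repair."""
    a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    sim = a @ b.T              # pairwise cosine similarities
    return sim.argmax(axis=1)  # best match in KG2 for each KG1 entity

rng = np.random.default_rng(3)
e1 = rng.standard_normal((5, 16))
e2 = e1 + 0.01 * rng.standard_normal((5, 16))  # noisy copy: truth = identity
print(align_by_nearest_neighbor(e1, e2))        # [0 1 2 3 4]
```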
We present a novel clustering algorithm, visClust, that is based on lower-dimensional data representations and visual interpretation. To this end, we design a transformation that represents the data as a binary integer array, enabling the use of image processing methods to select a partition. Qualitative and quantitative analyses, measured by accuracy and the adjusted Rand index, show that the algorithm performs well while requiring low runtime and RAM. We compare the results to six state-of-the-art algorithms with available code, confirming the quality of visClust through superior performance in most experiments. Moreover, the algorithm requires only one obligatory input parameter while allowing optimization via optional parameters. The code is available on GitHub and straightforward to use.
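The abstract's pipeline (transform the data into a binary array, then apply image processing to obtain a partition) can be illustrated with off-the-shelf components; the sketch below substitutes a 2-D histogram for the paper's transformation and connected-component labeling for its partition selection, so it is an analogy rather than visClust itself:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)
# Two well-separated 2-D blobs stand in for already-projected data.
X = np.vstack([rng.normal(0, 0.5, (200, 2)), rng.normal(4, 0.5, (200, 2))])

# "Transform": rasterize the point cloud into a binary occupancy image.
H, xedges, yedges = np.histogram2d(X[:, 0], X[:, 1], bins=16)
binary = H > 0

# "Partition": connected components of the binary image become clusters.
labeled, n_clusters = ndimage.label(binary)
ix = np.clip(np.searchsorted(xedges, X[:, 0]) - 1, 0, 15)
iy = np.clip(np.searchsorted(yedges, X[:, 1]) - 1, 0, 15)
point_labels = labeled[ix, iy]  # cluster label for each data point
print(n_clusters)               # 2 for this toy data
```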
A countable structure is indivisible if for every coloring with finite range there is a monochromatic isomorphic subcopy of the structure. Each indivisible structure $\mathcal{S}$ naturally corresponds to an indivisibility problem $\mathsf{Ind}\ \mathcal{S}$, which outputs such a subcopy given a presentation and coloring. We investigate the Weihrauch complexity of the indivisibility problems for two structures: the rational numbers $\mathbb{Q}$ as a linear order, and the equivalence relation $\mathscr{E}$ with countably many equivalence classes each having countably many members. We separate the Weihrauch degrees of both $\mathsf{Ind}\ \mathbb{Q}$ and $\mathsf{Ind}\ \mathscr{E}$ from several benchmark problems, showing in particular that $\mathsf{C}_\mathbb{N} \vert_\mathrm{W} \mathsf{Ind}\ \mathbb{Q}$ and hence $\mathsf{Ind}\ \mathbb{Q}$ is strictly weaker than the problem of finding an interval in which some color is dense for a given coloring of $\mathbb{Q}$; and that the Weihrauch degree of $\mathsf{Ind}\ \mathscr{E}_k$ is strictly between those of $\mathsf{SRT}^2_k$ and $\mathsf{RT}^2_k$, where $\mathsf{Ind}\ \mathcal{S}_k$ is the restriction of $\mathsf{Ind}\ \mathcal{S}$ to $k$-colorings.
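For readers outside computable analysis, the order underlying these separations is Weihrauch reducibility; the standard definition (not specific to this paper) is

```latex
% Weihrauch reducibility: f <=_W g iff computable pre- and post-processing
% turn any realizer (solution method) G for g into one for f.
\[
f \le_\mathrm{W} g
\iff
\exists\, \text{computable } H, K \;\;\forall G \vdash g:\;
\big(x \mapsto H\langle x, G(K(x))\rangle\big) \vdash f ,
\]
% where G \vdash g means G is a realizer of g. The notation f |_W g used
% above means f and g are incomparable: neither f <=_W g nor g <=_W f.
```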
Hashing has been widely used in approximate nearest neighbor search for large-scale database retrieval because of its computational and storage efficiency. Deep hashing, which devises convolutional neural network architectures to extract the semantic information or features of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated, and I summarize three main directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of existing hashing methods, I propose a Shadow Recurrent Hashing (SRH) method as an attempt. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function that encourages similar images to be projected close to one another. To this end, I propose a concept: the shadow of the CNN output. During the optimization process, the CNN output and its shadow guide each other so as to approach the optimal solution as closely as possible. Several experiments on the CIFAR-10 dataset show the satisfactory performance of SRH.
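The "shadow" mechanism is specific to this paper, but the similarity-preserving loss it builds on is standard in deep supervised hashing: pull the codes of similar pairs together and push dissimilar pairs apart. A generic sketch of such a loss (not SRH itself; names are ours):

```python
import torch

def pairwise_hashing_loss(codes: torch.Tensor, sim: torch.Tensor,
                          margin: float = 2.0) -> torch.Tensor:
    """Contrastive loss over relaxed (real-valued) hash codes.

    codes: (n, bits) network outputs, binarized later via sign().
    sim:   (n, n) matrix with 1 for similar pairs, 0 for dissimilar.
    Similar pairs are pulled together; dissimilar pairs are pushed at
    least `margin` apart, so sign(codes) preserves the similarity.
    """
    d = torch.cdist(codes, codes)                       # pairwise distances
    pull = sim * d.pow(2)                               # attract similar
    push = (1 - sim) * torch.clamp(margin - d, min=0).pow(2)  # repel others
    return (pull + push).mean()

codes = torch.randn(8, 16, requires_grad=True)
groups = torch.arange(8) // 4                           # two toy classes
sim = (groups.unsqueeze(0) == groups.unsqueeze(1)).float()
loss = pairwise_hashing_loss(codes, sim)
loss.backward()                                         # trainable end to end
```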