We present an approach for analyzing message passing graph neural networks (MPNNs) based on an extension of graphon analysis to a so-called graphon-signal analysis. An MPNN is a function that takes a graph and a signal on the graph (a graph-signal) and returns some value. Since the input space of MPNNs is non-Euclidean, i.e., graphs can be of any size and topology, properties such as generalization are less well understood for MPNNs than for Euclidean neural networks. We claim that one important ingredient missing from past work is a meaningful notion of graph-signal similarity, which endows the space of inputs to MPNNs with a regular structure. We present such a similarity measure, called the graphon-signal cut distance, which makes the space of all graph-signals a dense subset of a compact metric space -- the graphon-signal space. Informally, two deterministic graph-signals are close in cut distance if they ``look like'' they were sampled from the same random graph-signal model. Hence, our cut distance is a natural notion of graph-signal similarity, which allows comparing any pair of graph-signals of any size and topology. We prove that MPNNs are Lipschitz continuous functions over the graphon-signal metric space. We then give two applications of this result: 1) a generalization bound for MPNNs, and 2) the stability of MPNNs to subsampling of graph-signals. Our results apply to any sufficiently regular MPNN on any distribution of graph-signals, making the analysis rather universal.
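To make the object of study concrete, the following is a minimal sketch of a single message-passing layer acting on a graph-signal (an adjacency matrix paired with a node signal). The degree-normalized aggregation and the specific nonlinearity are illustrative assumptions; the analysis covers any sufficiently regular MPNN of this general form.

```python
import numpy as np

def mpnn_layer(A, X, W_self, W_msg):
    """One message-passing layer on a graph-signal (A, X).

    A: (n, n) adjacency matrix, X: (n, d) node signal.
    Mean aggregation keeps the layer well behaved as n grows,
    in the spirit of the normalized aggregation used in
    graphon-style analyses.
    """
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # avoid division by zero
    msgs = (A @ X) / deg                                 # mean over neighbors
    return np.tanh(X @ W_self + msgs @ W_msg)            # Lipschitz nonlinearity

rng = np.random.default_rng(0)
n, d = 8, 4
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T                                              # simple undirected graph
X = rng.normal(size=(n, d))
H = mpnn_layer(A, X, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(H.shape)  # (8, 4)
```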
This paper leverages macroscopic models and multi-source spatiotemporal data collected from automatic traffic counters and probe vehicles to accurately estimate traffic flow and travel time in links where these measurements are unavailable. This problem is critical in transportation planning applications where sensor coverage is low and the planned interventions have network-wide impacts. The proposed model, named the Macroscopic Traffic Estimator (MaTE), can perform network-wide estimation of traffic flow and travel time using only the set of observed measurements of these quantities. Because MaTE is grounded in macroscopic flow theory, all parameters and variables are interpretable. The estimated traffic flow satisfies fundamental flow conservation constraints and increases monotonically with the estimated travel time. Using logit-based stochastic traffic assignment as the principle governing route choice behavior makes the model fully differentiable with respect to the model parameters. This property facilitates the application of computational graphs to learn parameters from vast amounts of spatiotemporal data. We also integrate neural networks and polynomial kernel functions to capture link flow interactions and enrich the mapping of traffic flows into travel times. MaTE also incorporates a destination choice model and a trip generation model that uses historical data on the number of trips generated by location. Experiments on synthetic data show that the model can accurately estimate travel time and traffic flow in out-of-sample links. Results obtained using real-world multi-source data from a large-scale transportation network suggest that MaTE outperforms data-driven benchmarks, especially in travel time estimation. The estimated parameters of MaTE are also informative about the hourly change in travel demand and supply characteristics of the transportation network.
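The differentiability claim rests on the softmax structure of logit-based assignment. The sketch below illustrates this for a single origin-destination pair; the path set, demand value, and dispersion parameter theta are hypothetical, and MaTE itself operates network-wide.

```python
import numpy as np

def logit_assignment(path_times, demand, theta):
    """Assign OD demand to paths via a logit route-choice model.

    Lower travel time -> higher utility; the softmax makes flows
    smooth (hence differentiable) in both path_times and theta.
    """
    utilities = -theta * np.asarray(path_times, dtype=float)
    utilities -= utilities.max()                  # numerical stability
    probs = np.exp(utilities) / np.exp(utilities).sum()
    return demand * probs

# Three hypothetical paths for one OD pair carrying 100 trips:
flows = logit_assignment(path_times=[10.0, 12.0, 15.0], demand=100.0, theta=0.5)
print(flows.round(1))  # most flow on the fastest path, none exactly zero
```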
Analysis of geospatial data has traditionally been model-based, with a mean model, customarily specified as a linear regression on the covariates, and a covariance model, encoding the spatial dependence. We relax the strong assumption of linearity and propose embedding neural networks directly within the traditional geostatistical models to accommodate non-linear mean functions while retaining all other advantages, including the use of Gaussian processes to explicitly model the spatial covariance, enabling inference on the covariate effect through the mean and on the spatial dependence through the covariance, and offering predictions at new locations via kriging. We propose NN-GLS, a new neural network estimation algorithm for the non-linear mean in GP models that explicitly accounts for the spatial covariance through generalized least squares (GLS), the same loss used in the linear case. We show that NN-GLS admits a representation as a special type of graph neural network (GNN). This connection facilitates the use of standard neural network computational techniques for irregular geospatial data, enabling novel and scalable mini-batching, backpropagation, and kriging schemes. Theoretically, we show that NN-GLS is consistent for irregularly observed, spatially correlated data processes. To our knowledge, this is the first asymptotic consistency result for any neural network algorithm for spatial data. We demonstrate the methodology through simulated and real datasets.
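For illustration, the GLS loss at the core of NN-GLS can be sketched as follows, with the spatial covariance assumed known and a fixed mean function standing in for the neural network; NN-GLS instead learns the mean by minimizing this loss.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gls_loss(y, f_X, Sigma):
    """Generalized least squares loss (y - f(X))^T Sigma^{-1} (y - f(X)).

    Decorrelating the residuals with Sigma^{-1} accounts for spatial
    dependence; with Sigma = I this reduces to ordinary least squares.
    """
    r = y - f_X
    c, low = cho_factor(Sigma)
    return float(r @ cho_solve((c, low), r))

# Toy example: exponential covariance on 1-D locations.
rng = np.random.default_rng(0)
s = np.sort(rng.random(50))
Sigma = 0.1 * np.exp(-np.abs(s[:, None] - s[None, :]) / 0.2)
y = np.sin(4 * s) + rng.multivariate_normal(np.zeros(50), Sigma)
print(gls_loss(y, np.sin(4 * s), Sigma))
```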
Vecchia approximation has been widely used to accurately scale Gaussian-process (GP) inference to large datasets by expressing the joint density as a product of conditional densities with small conditioning sets. We study fixed-domain asymptotic properties of Vecchia-based GP inference for a large class of covariance functions (including Mat\'ern covariances) with boundary conditioning. In this setting, we establish that consistency and asymptotic normality of maximum exact-likelihood estimators imply those of maximum Vecchia-likelihood estimators, and that exact GP prediction can be approximated accurately by Vecchia GP prediction, provided that the size of the conditioning sets grows polylogarithmically with the data size. Hence, Vecchia-based inference with quasilinear complexity is asymptotically equivalent to exact GP inference with cubic complexity. This also provides a new general result on the screening effect. Our findings are illustrated by numerical experiments, which also show that Vecchia approximation can be more accurate than alternative approaches such as covariance tapering and reduced-rank approximations.
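A minimal sketch of the Vecchia factorization follows, using 1-D locations, a left-to-right ordering, nearest-neighbor conditioning sets, and an exponential covariance as a stand-in for a Mat\'ern kernel; all of these choices are illustrative assumptions.

```python
import numpy as np

def vecchia_loglik(y, locs, cov_fn, m):
    """Vecchia approximation: sum_i log p(y_i | y_{c(i)}),
    where c(i) holds the (up to) m nearest *previous* points.

    Each term is a univariate Gaussian conditional, so the cost is
    O(n m^3) rather than the O(n^3) of the exact GP likelihood.
    """
    n = len(y)
    ll = 0.0
    for i in range(n):
        d = np.abs(locs[:i] - locs[i])
        c = np.argsort(d)[:m]                         # conditioning set c(i)
        K_cc = cov_fn(locs[c][:, None], locs[c][None, :])
        k_ic = cov_fn(locs[i], locs[c])
        w = np.linalg.solve(K_cc, k_ic) if len(c) else np.zeros(0)
        mu = w @ y[c]
        var = cov_fn(locs[i], locs[i]) - w @ k_ic
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

# Exponential covariance standing in for a Matern kernel:
cov = lambda a, b: np.exp(-np.abs(a - b))
rng = np.random.default_rng(0)
locs = np.sort(rng.random(200))
y = rng.multivariate_normal(np.zeros(200), cov(locs[:, None], locs[None, :]))
print(vecchia_loglik(y, locs, cov, m=10))
```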
We propose a novel algorithm for estimating the support of partially known Gaussian graphical models that incorporates prior information about the underlying graph. In contrast to classical approaches that provide a point estimate based on a maximum likelihood or maximum a posteriori criterion using (simple) priors on the precision matrix, we consider a prior on the graph and rely on annealed Langevin diffusion to generate samples from the posterior distribution. Since the Langevin sampler requires access to the score function of the underlying graph prior, we use graph neural networks to effectively estimate the score from a graph dataset (either available beforehand or generated from a known distribution). Numerical experiments demonstrate the benefits of our approach.
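The sampling loop can be sketched schematically as below. The score function here is a closed-form placeholder for a smoothed Gaussian; in the method above it is a graph prior's score estimated by a graph neural network, and the step-size schedule is an assumption.

```python
import numpy as np

def annealed_langevin(score_fn, x0, sigmas, steps=50, eps=1e-3, rng=None):
    """Annealed Langevin dynamics: noisy gradient ascent on the
    log-density at decreasing noise levels sigma, warm-starting each
    level from the previous one.
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for sigma in sigmas:                          # anneal from high to low noise
        step = eps * (sigma / sigmas[-1]) ** 2    # common step-size schedule
        for _ in range(steps):
            z = rng.standard_normal(x.shape)
            x = x + step * score_fn(x, sigma) + np.sqrt(2 * step) * z
    return x

# Placeholder: score of a standard Gaussian smoothed at level sigma;
# the paper instead learns the score of a *graph* prior with a GNN.
score = lambda x, sigma: -x / (1.0 + sigma ** 2)
sample = annealed_langevin(score, np.zeros(5), sigmas=np.geomspace(1.0, 0.01, 10))
print(sample)
```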
We present new Neumann-Neumann algorithms based on a time domain decomposition applied to unconstrained parabolic optimal control problems. After a spatial semi-discretization, the Lagrange multiplier approach provides a coupled forward-backward optimality system, which can be solved using a time domain decomposition. Due to the forward-backward structure of the optimality system, nine variants of the Neumann-Neumann algorithm can be derived. We analyze their convergence behavior and determine the optimal relaxation parameter for each algorithm. Our analysis reveals that the most natural algorithms are actually only good smoothers, and that there are better choices that lead to efficient solvers. We illustrate our analysis with numerical experiments.
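To convey the algorithmic skeleton (Dirichlet subdomain solves, a Neumann residual given by the flux jump at the interface, Neumann correction solves, and a relaxed interface update), here is a toy Neumann-Neumann iteration for a 1-D Laplace model problem with two subdomains. This model problem is purely illustrative: the paper applies the skeleton to the coupled forward-backward optimality system decomposed in time, where the nine variants arise.

```python
import numpy as np

def neumann_neumann_1d(a, b, theta, iface=0.5, tol=1e-12, max_it=20):
    """Neumann-Neumann iteration for the model problem -u'' = 0 on
    (0, 1) with u(0) = a, u(1) = b, split at x = iface.

    Step 1 (Dirichlet): solve on each subdomain with interface value g.
    Step 2: measure the jump r in the normal derivatives (flux jump).
    Step 3 (Neumann): solve correction problems driven by r and relax
    the interface value with parameter theta.  For this homogeneous
    problem all subdomain solves have closed forms; in general they
    are PDE (or forward-backward ODE) solves.
    """
    g, history = 0.0, []
    L1, L2 = iface, 1.0 - iface
    for _ in range(max_it):
        flux1 = (g - a) / L1            # u1'(iface) from the left
        flux2 = (b - g) / L2            # u2'(iface) from the right
        r = flux1 - flux2               # Neumann residual
        # Neumann solves psi_i'' = 0 with flux data r give psi_i(iface) = r * L_i:
        g = g - theta * r * (L1 + L2)
        history.append(abs(g - (a * L2 + b * L1)))  # error vs exact interface value
        if history[-1] < tol:
            break
    return g, history

# theta = 1/4 is the optimal relaxation for this symmetric split:
for theta in (0.1, 0.25):
    _, hist = neumann_neumann_1d(a=1.0, b=3.0, theta=theta)
    print(theta, len(hist), hist[-1])
```

Note how theta = 1/4, the optimal relaxation for this symmetric toy split, converges in one iteration while other values only contract the error; the analysis above derives the analogous optimal parameters for each of the nine variants.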
Most existing neural network-based approaches for solving stochastic optimal control problems via the associated backward dynamic programming principle rely on the ability to simulate the underlying state variables. However, in some problems this simulation is infeasible, leading to a discretization of the state-variable space and the need to train one neural network for each data point. This approach becomes computationally inefficient for large state-variable spaces. In this paper, we consider a class of such stochastic optimal control problems and introduce an effective solution based on multitask neural networks. To train our multitask neural network, we introduce a novel scheme that dynamically balances the learning across tasks. Through numerical experiments on real-world derivatives pricing problems, we show that our method outperforms state-of-the-art approaches.
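One common way to dynamically balance multitask training is to reweight each task's loss by how slowly it has been improving; the sketch below shows such a heuristic and is only a plausible stand-in, not necessarily the paper's scheme.

```python
import numpy as np

def balanced_weights(recent_losses, temperature=1.0):
    """Weight each task's loss so slow-learning tasks get more gradient.

    Normalizing each loss by the mean of the recent (moving-average)
    losses and softmax-ing keeps the total weight fixed while shifting
    emphasis toward tasks whose losses are decaying slowly.
    """
    ratios = np.asarray(recent_losses) / (np.mean(recent_losses) + 1e-12)
    w = np.exp(ratios / temperature)
    return len(ratios) * w / w.sum()     # weights average to 1

# Hypothetical moving-average losses for three pricing tasks:
print(balanced_weights([0.8, 0.2, 0.1]).round(2))  # the lagging task dominates
```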
The computational demands of modern AI have spurred interest in optical neural networks (ONNs), which offer the potential benefits of increased speed and lower power consumption. However, current ONNs face various challenges, most significantly a limited calculation precision (typically around 4 bits) and the requirement for high-resolution signal format converters (digital-to-analogue converters (DACs) and analogue-to-digital converters (ADCs)). These challenges are inherent to their analog computing nature and pose significant obstacles to practical implementation. Here, we propose a digital-analog hybrid optical computing architecture for ONNs, which utilizes digital optical inputs in the form of binary words. By introducing logic levels and decisions based on thresholding, the calculation precision can be significantly enhanced. The DACs for input data can be removed, and the resolution of the ADCs can be greatly reduced. This increases the operating speed at high calculation precision and facilitates compatibility with microelectronics. To validate our approach, we have fabricated a proof-of-concept photonic chip and built a hybrid optical processor (HOP) system for neural network applications. We have demonstrated an unprecedented 16-bit calculation precision for high-definition image processing, with a pixel error rate (PER) as low as $1.8\times10^{-3}$ at a signal-to-noise ratio (SNR) of 18.2 dB. We have also implemented a convolutional neural network for handwritten digit recognition that achieves the same accuracy as a desktop computer. The proposed digital-analog hybrid architecture offers a methodology that could potentially be applied to various ONN implementations and may spur new research into efficient and accurate domain-specific optical computing architectures for neural networks.
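The precision gain can be sketched numerically: inputs are split into binary words, each bit plane passes through a noisy analog dot product, thresholding snaps the result to the nearest logic level, and exact digital recombination restores full precision. The bit width, noise level, and integer weights below are illustrative assumptions, not the chip's parameters.

```python
import numpy as np

def hybrid_dot(x_int, w, n_bits=8, noise_sigma=0.05, rng=None):
    """Dot product with digitally encoded inputs on a noisy analog core.

    x_int is split into binary words; each bit plane sees an analog
    multiply-accumulate (Gaussian noise standing in for optical noise),
    the result is snapped to the nearest logic level by thresholding,
    and exact powers of two recombine the planes digitally.
    """
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for k in range(n_bits):
        plane = (x_int >> k) & 1                     # binary input word: no DAC needed
        analog = plane @ w + rng.normal(0, noise_sigma)
        level = np.round(analog)                     # threshold to logic level
        total += level * (1 << k)                    # exact digital recombination
    return total

x = np.array([3, 7, 1, 12], dtype=np.int64)          # 4 input pixels
w = np.array([1.0, -1.0, 2.0, 1.0])                  # integer-valued weights here
print(hybrid_dot(x, w), "vs exact", float(x @ w))    # noise is absorbed by thresholding
```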
We consider the community detection problem in a sparse $q$-uniform hypergraph $G$, assuming that $G$ is generated according to the Hypergraph Stochastic Block Model (HSBM). We prove that a spectral method based on the non-backtracking operator for hypergraphs works with high probability down to the generalized Kesten-Stigum detection threshold conjectured by Angelini et al. (2015). We characterize the spectrum of the non-backtracking operator for the sparse HSBM and provide an efficient dimension reduction procedure using the Ihara-Bass formula for hypergraphs. As a result, community detection for the sparse HSBM on $n$ vertices can be reduced to an eigenvector problem of a $2n\times 2n$ non-normal matrix constructed from the adjacency matrix and the degree matrix of the hypergraph. To the best of our knowledge, this is the first provable and efficient spectral algorithm that achieves the conjectured threshold for HSBMs with $r$ blocks generated according to a general symmetric probability tensor.
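For intuition, the reduction is easiest to state in the graph (2-uniform) case, where the Ihara-Bass formula maps the non-backtracking spectrum to a $2n\times 2n$ companion matrix built from the adjacency and degree matrices; the paper constructs the analogous matrix for hypergraphs. The sketch below applies the graph-case reduction to a planted two-block SBM; all parameters are illustrative.

```python
import numpy as np

def nonbacktracking_companion(A):
    """2n x 2n Ihara-Bass companion matrix for a graph with adjacency A.

    Its eigenvalues are the roots of det(lambda^2 I - lambda A + D - I),
    i.e. the non-backtracking spectrum minus the trivial +/-1 factors.
    """
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    I = np.eye(n)
    return np.block([[A, I - D], [I, np.zeros((n, n))]])

# Two planted communities in a sparse SBM graph:
rng = np.random.default_rng(1)
n, p_in, p_out = 200, 0.08, 0.01
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T

M = nonbacktracking_companion(A)
vals, vecs = np.linalg.eig(M)
order = np.argsort(-vals.real)
v2 = vecs[:n, order[1]].real                 # second eigenvector, node part
acc = max(np.mean((v2 > 0) == labels), np.mean((v2 > 0) != labels))
print(f"community recovery accuracy: {acc:.2f}")
```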
Typical pipelines for model geometry generation in computational biomedicine stem from images, which are usually considered to be at rest, despite the object being in mechanical equilibrium under several forces. We refer to the stress-free geometry computation as the reference configuration problem, and in this work we extend such a formulation to the theory of fully nonlinear poroelastic media. The main steps are (i) writing the equations in terms of the reference porosity and (ii) defining a time-dependent problem whose steady-state solution is the reference porosity. This problem can be computationally challenging, as it can require several hundred iterations to converge, so we propose the use of Anderson acceleration to speed up the procedure. Our evidence shows that this strategy can reduce the number of iterations by up to 80\%. In addition, we note that a primal formulation of the nonlinear mass conservation equations is not consistent due to the presence of second-order derivatives of the displacement, an issue we alleviate through adequate mixed formulations. All claims are validated through numerical simulations in both idealized and realistic scenarios.
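Anderson acceleration itself is a generic fixed-point accelerator; a minimal sketch follows, applied to a toy contraction rather than the poroelastic reference-configuration iteration, with the memory depth m as an illustrative choice.

```python
import numpy as np

def anderson(g, x0, m=5, tol=1e-10, max_it=200):
    """Anderson acceleration for the fixed-point iteration x = g(x).

    At each step, combine the last m iterates so that the combined
    residual f(x) = g(x) - x is minimized in least squares, then step
    there (Walker-Ni type-II update with mixing beta = 1).
    """
    x = x0.copy()
    X, F = [], []                                     # iterate and residual histories
    for it in range(max_it):
        fx = g(x) - x
        if np.linalg.norm(fx) < tol:
            return x, it
        X.append(x.copy())
        F.append(fx.copy())
        X, F = X[-(m + 1):], F[-(m + 1):]
        if len(F) > 1:
            dF = np.array(F[1:]) - np.array(F[:-1])   # residual differences
            dX = np.array(X[1:]) - np.array(X[:-1])
            gamma = np.linalg.lstsq(dF.T, fx, rcond=None)[0]
            x = x + fx - (dX + dF).T @ gamma          # accelerated update
        else:
            x = x + fx                                # plain Picard step
    return x, max_it

# Toy contraction whose plain Picard iteration converges slowly:
g = lambda x: 0.9 * np.cos(x) + 0.1 * x
x, iters = anderson(g, np.zeros(3))
print(iters, x)
```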
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real-world problems. Existing approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic across learning tasks. In particular, no previous model has been able to predict variable-sized heterogeneous hyperedges. Here we develop a new self-attention-based graph neural network called Hyper-SAGNN that is applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms state-of-the-art methods on traditional tasks while also achieving strong performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
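The core architectural idea can be sketched as follows: each node in a candidate hyperedge receives a static embedding (independent of the tuple) and a dynamic embedding (self-attention over the tuple), and their elementwise squared difference drives a per-node probability that is averaged into a hyperedge score. This is a simplified single-head sketch with hypothetical dimensions, not the full trained model.

```python
import torch
import torch.nn as nn

class HyperSAGNNSketch(nn.Module):
    """Simplified Hyper-SAGNN-style scorer for variable-sized hyperedges."""

    def __init__(self, d_in, d_hid):
        super().__init__()
        self.static = nn.Sequential(nn.Linear(d_in, d_hid), nn.Tanh())
        self.attn = nn.MultiheadAttention(d_hid, num_heads=1, batch_first=True)
        self.embed = nn.Linear(d_in, d_hid)
        self.head = nn.Linear(d_hid, 1)

    def forward(self, x):                    # x: (tuple_size, d_in), one candidate
        h = self.embed(x).unsqueeze(0)       # (1, k, d_hid)
        d, _ = self.attn(h, h, h)            # dynamic embeddings via self-attention
        s = self.static(x).unsqueeze(0)      # static embeddings, tuple-independent
        p = torch.sigmoid(self.head((d - s) ** 2)).mean()
        return p                             # probability the tuple is a hyperedge

model = HyperSAGNNSketch(d_in=16, d_hid=32)
for k in (3, 5):                             # variable hyperedge sizes
    print(k, float(model(torch.randn(k, 16))))
```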