In geostatistical problems with massive sample size, Gaussian processes (GP) can be approximated using sparse directed acyclic graphs to achieve scalable $O(n)$ computational complexity. In these models, data at each location are typically assumed conditionally dependent on a small set of parents which usually include a subset of the nearest neighbors. These methodologies often exhibit excellent empirical performance, but the lack of theoretical validation leads to unclear guidance in specifying the underlying graphical model and may result in sensitivity to graph choice. We address these issues by introducing radial neighbors Gaussian processes and corresponding theoretical guarantees. We propose to approximate GPs using a sparse directed acyclic graph in which a directed edge connects every location to all of its neighbors within a predetermined radius. Using our novel construction, we show that one can accurately approximate a Gaussian process in Wasserstein-2 distance, with an error rate determined by the approximation radius, the spatial covariance function, and the spatial dispersion of samples. Our method is also insensitive to specific graphical model choice. We offer further empirical validation of our approach via applications on simulated and real world data showing state-of-the-art performance in posterior inference of spatial random effects.
Gaussian boson sampling, a computational model that is widely believed to admit quantum supremacy, has already been experimentally demonstrated to surpasses the classical simulation capabilities of even with the most powerful supercomputers today. However, whether the current approach limited by photon loss and noise in such experiments prescribes a scalable path to quantum advantage is an open question. For example, random circuit sampling with constant noise per gate was recently shown not to be a scalable approach to achieve quantum supremacy, although simulating intermediate scale systems is still difficult. To understand the effect of photon loss on the scability of Gaussian boson sampling, we use a tensor network algorithm with $U(1)$ symmetry to examine the asymptotic operator entanglement entropy scaling, which relates to the simulation complexity. We develop a custom-built algorithm that significantly reduces the computational time with state-of-the-art hardware accelerators, enabling simulations of much larger systems. With this capability, we observe, for Gaussian boson sampling, the crucial $N_\text{out}\propto\sqrt{N}$ scaling of the number of surviving photons in the number of input photons that marks the boundary between efficient and inefficient classical simulation. We further theoretically show that this should be general for other input states.
Energy modelling can enable energy-aware software development and assist the developer in meeting an application's energy budget. Although many energy models for embedded processors exist, most do not account for processor-specific configurations, neither are they suitable for static energy consumption estimation. This paper introduces a set of comprehensive energy models for Arm's Cortex-M0 processor, ready to support energy-aware development of edge computing applications using either profiling- or static-analysis-based energy consumption estimation. We use a commercially representative physical platform together with a custom modified Instruction Set Simulator to obtain the physical data and system state markers used to generate the models. The models account for different processor configurations which all have a significant impact on the execution time and energy consumption of edge computing applications. Unlike existing works, which target a very limited set of applications, all developed models are generated and validated using a very wide range of benchmarks from a variety of emerging IoT application areas, including machine learning and have a prediction error of less than 5%.
This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d.\ samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\cX, \mu, c_{\cX})$ and $(\cY, \nu, c_{\cY})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. We study the analytic properties of the RGM distance and derive that under mild conditions, RGM equals the classic Gromov-Wasserstein distance. Curiously, drawing a connection to Brenier's polar factorization, we show that the RGM sampler induces bias towards strong isomorphism with proper choices of $c_{\cX}$ and $c_{\cY}$. Statistical rate of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.
Agent-Based Models are very useful for simulation of physical or social processes, such as the spreading of a pandemic in a city. Such models proceed by specifying the behavior of individuals (agents) and their interactions, and parameterizing the process of infection based on such interactions based on the geography and demography of the city. However, such models are computationally very expensive, and the complexity is often linear in the total number of agents. This seriously limits the usage of such models for simulations, which often have to be run hundreds of times for policy planning and even model parameter estimation. An alternative is to develop an emulator, a surrogate model that can predict the Agent-Based Simulator's output based on its initial conditions and parameters. In this paper, we discuss a Deep Learning model based on Dilated Convolutional Neural Network that can emulate such an agent based model with high accuracy. We show that use of this model instead of the original Agent-Based Model provides us major gains in the speed of simulations, allowing much quicker calibration to observations, and more extensive scenario analysis. The models we consider are spatially explicit, as the locations of the infected individuals are simulated instead of the gross counts. Another aspect of our emulation framework is its divide-and-conquer approach that divides the city into several small overlapping blocks and carries out the emulation in them parallelly, after which these results are merged together. This ensures that the same emulator can work for a city of any size, and also provides significant improvement of time complexity of the emulator, compared to the original simulator.
We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting. In such a setting, an algorithm can sample examples one at a time but, in order to limit overhead costs, is only able to update its state (i.e. further train model weights) once a large enough batch of examples is selected. Our algorithm, IWeS, selects examples by importance sampling where the sampling probability assigned to each example is based on the entropy of models trained on previously selected batches. IWeS admits significant performance improvement compared to other subset selection algorithms for seven publicly available datasets. Additionally, it is competitive in an active learning setting, where the label information is not available at selection time. We also provide an initial theoretical analysis to support our importance weighting approach, proving generalization and sampling rate bounds.
Learning in neural networks is often framed as a problem in which targeted error signals are directly propagated to parameters and used to produce updates that induce more optimal network behaviour. Backpropagation of error (BP) is an example of such an approach and has proven to be a highly successful application of stochastic gradient descent to deep neural networks. We propose constrained parameter inference (COPI) as a new principle for learning. The COPI approach assumes that learning can be set up in a manner where parameters infer their own values based upon observations of their local neuron activities. We find that this estimation of network parameters is possible under the constraints of decorrelated neural inputs and top-down perturbations of neural states for credit assignment. We show that the decorrelation required for COPI allows learning at extremely high learning rates, competitive with that of adaptive optimizers, as used by BP. We further demonstrate that COPI affords a new approach to feature analysis and network compression. Finally, we argue that COPI may shed new light on learning in biological networks given the evidence for decorrelation in the brain.
In the sequential decision making setting, an agent aims to achieve systematic generalization over a large, possibly infinite, set of environments. Such environments are modeled as discrete Markov decision processes with both states and actions represented through a feature vector. The underlying structure of the environments allows the transition dynamics to be factored into two components: one that is environment-specific and another that is shared. Consider a set of environments that share the laws of motion as an example. In this setting, the agent can take a finite amount of reward-free interactions from a subset of these environments. The agent then must be able to approximately solve any planning task defined over any environment in the original set, relying on the above interactions only. Can we design a provably efficient algorithm that achieves this ambitious goal of systematic generalization? In this paper, we give a partially positive answer to this question. First, we provide a tractable formulation of systematic generalization by employing a causal viewpoint. Then, under specific structural assumptions, we provide a simple learning algorithm that guarantees any desired planning error up to an unavoidable sub-optimality term, while showcasing a polynomial sample complexity.
Covariate measurement error in nonparametric regression is a common problem in nutritional epidemiology and geostatistics, and other fields. Over the last two decades, this problem has received substantial attention in the frequentist literature. Bayesian approaches for handling measurement error have only been explored recently and are surprisingly successful, although the lack of a proper theoretical justification regarding the asymptotic performance of the estimators. By specifying a Gaussian process prior on the regression function and a Dirichlet process Gaussian mixture prior on the unknown distribution of the unobserved covariates, we show that the posterior distribution of the regression function and the unknown covariates density attain optimal rates of contraction adaptively over a range of H\"{o}lder classes, up to logarithmic terms. This improves upon the existing classical frequentist results which require knowledge of the smoothness of the underlying function to deliver optimal risk bounds. We also develop a novel surrogate prior for approximating the Gaussian process prior that leads to efficient computation and preserves the covariance structure, thereby facilitating easy prior elicitation. We demonstrate the empirical performance of our approach and compare it with competitors in a wide range of simulation experiments and a real data example.
The geometric optimisation of crystal structures is a procedure widely used in Chemistry that changes the geometrical placement of the particles inside a structure. It is called structural relaxation and constitutes a local minimization problem with a non-convex objective function whose domain complexity increases along with the number of particles involved. In this work we study the performance of the two most popular first order optimisation methods, Gradient Descent and Conjugate Gradient, in structural relaxation. The respective pseudocodes can be found in Section 6. Although frequently employed, there is a lack of their study in this context from an algorithmic point of view. In order to accurately define the problem, we provide a thorough derivation of all necessary formulae related to the crystal structure energy function and the function's differentiation. We run each algorithm in combination with a constant step size, which provides a benchmark for the methods' analysis and direct comparison. We also design dynamic step size rules and study how these improve the two algorithms' performance. Our results show that there is a trade-off between convergence rate and the possibility of an experiment to succeed, hence we construct a function to assign utility to each method based on our respective preference. The function is built according to a recently introduced model of preference indication concerning algorithms with deadline and their run time. Finally, building on all our insights from the experimental results, we provide algorithmic recipes that best correspond to each of the presented preferences and select one recipe as the optimal for equally weighted preferences.
We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes. We achieve this by assigning probability measures to the spatial domain, which allows us to treat collocation grids probabilistically as random variables to be marginalised out. Adapting this spatial statistics view, we solve forward and inverse problems for parametric PDEs in a way that leads to the construction of Gaussian process models of solution fields. The implementation of these random grids poses a unique set of challenges for inverse physics informed deep learning frameworks and we propose a new architecture called Grid Invariant Convolutional Networks (GICNets) to overcome these challenges. We further show how to incorporate noisy data in a principled manner into our physics informed model to improve predictions for problems where data may be available but whose measurement location does not coincide with any fixed mesh or grid. The proposed method is tested on a nonlinear Poisson problem, Burgers equation, and Navier-Stokes equations, and we provide extensive numerical comparisons. We demonstrate significant computational advantages over current physics informed neural learning methods for parametric PDEs while improving the predictive capabilities and flexibility of these models.