Solving high-dimensional partial differential equations is a recurrent challenge in economics, science and engineering. In recent years, a great number of computational approaches have been developed, most of them relying on a combination of Monte Carlo sampling and deep learning based approximation. For elliptic and parabolic problems, existing methods can broadly be classified into those resting on reformulations in terms of $\textit{backward stochastic differential equations}$ (BSDEs) and those aiming to minimize a regression-type $L^2$-error ($\textit{physics-informed neural networks}$, PINNs). In this paper, we review the literature and suggest a methodology based on the novel $\textit{diffusion loss}$ that interpolates between BSDEs and PINNs. Our contribution opens the door towards a unified understanding of numerical approaches for high-dimensional PDEs, as well as for implementations that combine the strengths of BSDEs and PINNs. The diffusion loss furthermore bears close similarities to $\textit{(least squares) temporal difference}$ objectives found in reinforcement learning. We also discuss eigenvalue problems and perform extensive numerical studies, including calculations of the ground state for nonlinear Schr\"odinger operators and committor functions relevant in molecular dynamics.
Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics such as the likelihood and average time of events (predictions). Here we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a data set of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.
The paper focuses on a new error analysis of a class of mixed FEMs for stationary incompressible magnetohydrodynamics with the standard inf-sup stable velocity-pressure space pairs to Navier-Stokes equations and the N\'ed\'elec's edge element for the magnetic field. The methods have been widely used in various numerical simulations in the last several decades, while the existing analysis is not optimal due to the strong coupling of system and the pollution of the lower-order N\'ed\'elec's edge approximation in analysis. In terms of a newly modified Maxwell projection we establish new and optimal error estimates. In particular, we prove that the method based on the commonly-used Taylor-Hood/lowest-order N\'ed\'elec's edge element is efficient and the method provides the second-order accuracy for numerical velocity. Two numerical examples for the problem in both convex and nonconvex polygonal domains are presented. Numerical results confirm our theoretical analysis.
While risk-neutral reinforcement learning has shown experimental success in a number of applications, it is well-known to be non-robust with respect to noise and perturbations in the parameters of the system. For this reason, risk-sensitive reinforcement learning algorithms have been studied to introduce robustness and sample efficiency, and lead to better real-life performance. In this work, we introduce new model-free risk-sensitive reinforcement learning algorithms as variations of widely-used Policy Gradient algorithms with similar implementation properties. In particular, we study the effect of exponential criteria on the risk-sensitivity of the policy of a reinforcement learning agent, and develop variants of the Monte Carlo Policy Gradient algorithm and the online (temporal-difference) Actor-Critic algorithm. Analytical results showcase that the use of exponential criteria generalize commonly used ad-hoc regularization approaches. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
We present an extension of the linear sampling method for solving the sound-soft inverse acoustic scattering problem with randomly distributed point sources. The theoretical justification of our sampling method is based on the Helmholtz--Kirchhoff identity, the cross-correlation between measurements, and the volume and imaginary near-field operators, which we introduce and analyze. Implementations in MATLAB using boundary elements, the SVD, Tikhonov regularization, and Morozov's discrepancy principle are also discussed. We demonstrate the robustness and accuracy of our algorithms with several numerical experiments in two dimensions.
Recent progress was made in characterizing the generalization error of gradient methods for general convex loss by the learning theory community. In this work, we focus on how training longer might affect generalization in smooth stochastic convex optimization (SCO) problems. We first provide tight lower bounds for general non-realizable SCO problems. Furthermore, existing upper bound results suggest that sample complexity can be improved by assuming the loss is realizable, i.e. an optimal solution simultaneously minimizes all the data points. However, this improvement is compromised when training time is long and lower bounds are lacking. Our paper examines this observation by providing excess risk lower bounds for gradient descent (GD) and stochastic gradient descent (SGD) in two realizable settings: 1) realizable with $T = O(n)$, and (2) realizable with $T = \Omega(n)$, where $T$ denotes the number of training iterations and $n$ is the size of the training dataset. These bounds are novel and informative in characterizing the relationship between $T$ and $n$. In the first small training horizon case, our lower bounds almost tightly match and provide the first optimal certificates for the corresponding upper bounds. However, for the realizable case with $T = \Omega(n)$, a gap exists between the lower and upper bounds. We provide a conjecture to address this problem, that the gap can be closed by improving upper bounds, which is supported by our analyses in one-dimensional and linear regression scenarios.
This paper is dedicated to achieving scalable relative state estimation using inter-robot Euclidean distance measurements. We consider equipping robots with distance sensors and focus on the optimization problem underlying relative state estimation in this setup. We reveal the commonality between this problem and the coordinates realization problem of a sensor network. Based on this insight, we propose an effective unconstrained optimization model to infer the relative states among robots. To work on this model in a distributed manner, we propose an efficient and scalable optimization algorithm with the classical block coordinate descent method as its backbone. This algorithm exactly solves each block update subproblem with a closed-form solution while ensuring convergence. Our results pave the way for distance measurements-based relative state estimation in large-scale multi-robot systems.
The ability to compose code in a modular fashion is important to the construction of large programs. In the logic programming setting, it is desirable that such capabilities be realized through logic-based devices. We describe an approach for doing this here. In our scheme a module corresponds to a block of code whose external view is mediated by a signature. Thus, signatures impose a form of hiding that is explained logically via existential quantifications over predicate, function and constant names. Modules interact through the mechanism of accumulation that translates into conjoining the clauses in them while respecting the scopes of existential quantifiers introduced by signatures. We show that this simple device for statically structuring name spaces suffices for realizing features related to code scoping for which the dynamic control of predicate definitions was earlier considered necessary. The module capabilities we present have previously been implemented via the compile-time inlining of accumulated modules. This approach does not support separate compilation. We redress this situation by showing how each distinct module can be compiled separately and inlining can be realized by a later, complementary and equally efficient linking phase.
Blind source separation (BSS) aims to recover an unobserved signal $S$ from its mixture $X=f(S)$ under the condition that the effecting transformation $f$ is invertible but unknown. As this is a basic problem with many practical applications, a fundamental issue is to understand how the solutions to this problem behave when their supporting statistical prior assumptions are violated. In the classical context of linear mixtures, we present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$. Modelling $S$ as a multidimensional stochastic process, we introduce an informative topology on the space of possible causes underlying a mixture $X$, and show that the behaviour of a generic BSS-solution in response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees with respect to this topology. This allows for a flexible and convenient quantification of general model uncertainty scenarios and amounts to the first comprehensive robustness framework for BSS. Our approach is entirely constructive, and we demonstrate its utility with novel theoretical guarantees for a number of statistical applications.
Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automatizes the choice of almost all model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.