We exploit the relationship between the stochastic Koopman operator and the Kolmogorov backward equation to construct importance sampling schemes for stochastic differential equations. Specifically, we propose using eigenfunctions of the stochastic Koopman operator to approximate the Doob transform for an observable of interest (e.g., associated with a rare event) which in turn yields an approximation of the corresponding zero-variance importance sampling estimator. Our approach is broadly applicable and systematic, treating non-normal systems, non-gradient systems, and systems with oscillatory dynamics or rank-deficient noise in a common framework. In nonlinear settings where the stochastic Koopman eigenfunctions cannot be derived analytically, we use dynamic mode decomposition (DMD) methods to compute them numerically, but the framework is agnostic to the particular numerical method employed. Numerical experiments demonstrate that even coarse approximations of a few eigenfunctions, where the latter are built from non-rare trajectories, can produce effective importance sampling schemes for rare events.
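The link between Koopman eigenfunctions and the Doob transform can be written out in a few lines. The following is a sketch under assumptions not stated in the abstract: the SDE is $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$ on $[0,T]$, the observable $g$ is strictly positive, and $g$ is approximated by $m$ eigenfunctions $\varphi_i$ of the generator $\mathcal{L}$ with coefficients $c_i$. All symbols are illustrative choices, not the authors' notation.

```latex
% Assumed notation: generator L, eigenpairs (lambda_i, phi_i), observable g > 0.
\begin{align*}
  \mathcal{L}\varphi
    &= b \cdot \nabla \varphi
       + \tfrac{1}{2} \operatorname{tr}\!\big(\sigma\sigma^{\top} \nabla^{2} \varphi\big),
    \qquad \mathcal{L}\varphi_i = \lambda_i \varphi_i, \\
  u(t,x)
    &= \mathbb{E}\big[\, g(X_T) \mid X_t = x \,\big],
    \qquad \partial_t u + \mathcal{L} u = 0, \quad u(T,\cdot) = g, \\
  u(t,x)
    &\approx \sum_{i=1}^{m} c_i \, e^{\lambda_i (T-t)} \, \varphi_i(x)
    \qquad \text{whenever } g \approx \sum_{i=1}^{m} c_i \varphi_i, \\
  d\tilde{X}_t
    &= \big[\, b + \sigma\sigma^{\top} \nabla_x \log u \,\big](t,\tilde{X}_t)\,dt
       + \sigma(\tilde{X}_t)\,dW_t .
\end{align*}
```

Each term $e^{\lambda_i (T-t)}\varphi_i$ solves the backward equation exactly, so the quality of the tilted drift in the last line depends only on how well the truncated expansion represents $g$.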
We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes parameterized by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. We apply our framework to two problems in control and reinforcement learning. First, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of O(k^(-2/5)), which recovers the best known rate derived specifically for this problem. Second, we study an online actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of O(k^(-2/3)) is achieved. This is the first time such a result is known in the literature. Finally, we support our theoretical analysis with numerical simulations where the convergence rates are visualized.
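The two-time-scale idea can be illustrated with a minimal sketch. It makes simplifying assumptions that are not in the abstract: the samples are i.i.d. rather than Markovian, the objective is a simple quadratic, and the step-size exponents 0.6 and 0.9 are arbitrary illustrative choices. The names `two_time_scale_sgd`, `noisy_quadratic_grad`, and `TARGET` are hypothetical.

```python
import numpy as np

def two_time_scale_sgd(grad_sample, x0, num_iters=5000, seed=0):
    """Sketch: auxiliary variable y tracks the gradient (fast), x descends along y (slow)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    y = np.zeros_like(x)  # auxiliary estimate of the true gradient
    for k in range(num_iters):
        beta = 1.0 / (k + 1) ** 0.6   # larger step size: fast time scale
        alpha = 1.0 / (k + 1) ** 0.9  # smaller step size: slow time scale
        g = grad_sample(x, rng)       # noisy gradient sample at the current x
        y = y + beta * (g - y)        # fast update: running gradient estimate
        x = x - alpha * y             # slow update: step along the estimate
    return x

# Illustrative problem (assumption): f(x) = 0.5 * ||x - TARGET||^2 with Gaussian noise.
TARGET = np.array([1.0, -2.0])

def noisy_quadratic_grad(x, rng):
    return (x - TARGET) + 0.1 * rng.standard_normal(x.shape)

x_final = two_time_scale_sgd(noisy_quadratic_grad, np.zeros(2))
```

Both iterates are updated in the same loop; only the decay rates of `alpha` and `beta` separate the two time scales.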
We study the stochastic $p$-Laplace system in a bounded domain. We propose two new space-time discretizations based on the approximation of time-averaged values. We establish linear convergence in space and convergence of order $1/2$ in time. Additionally, we provide a sampling algorithm to construct the necessary random input in an efficient way. The theoretical error analysis is complemented by numerical experiments.
Stochastic partial differential equations (SPDEs) are the mathematical tool of choice for modelling spatiotemporal PDE-dynamics under the influence of randomness. Based on the notion of mild solution of an SPDE, we introduce a novel neural architecture to learn solution operators of PDEs with (possibly stochastic) forcing from partially observed data. The proposed Neural SPDE model provides an extension to two popular classes of physics-inspired architectures. On the one hand, it extends Neural CDEs and variants -- continuous-time analogues of RNNs -- in that it is capable of processing incoming sequential information arriving irregularly in time and observed at arbitrary spatial resolutions. On the other hand, it extends Neural Operators -- generalizations of neural networks to model mappings between spaces of functions -- in that it can parameterize solution operators of SPDEs depending simultaneously on the initial condition and a realization of the driving noise. By performing operations in the spectral domain, we show how a Neural SPDE can be evaluated in two ways, either by calling an ODE solver (emulating a spectral Galerkin scheme), or by solving a fixed point problem. Experiments on various semilinear SPDEs, including the stochastic Navier-Stokes equations, demonstrate how the Neural SPDE model is capable of learning complex spatiotemporal dynamics in a resolution-invariant way, with better accuracy and lighter training data requirements compared to alternative models, and up to 3 orders of magnitude faster than traditional solvers.
Backward stochastic differential equations (BSDEs) appear in numerous applications. Classical approximation methods suffer from the curse of dimensionality and deep learning-based approximation methods are not known to converge to the BSDE solution. Recently, Hutzenthaler et al. (arXiv:2108.10602) introduced a new approximation method for BSDEs whose forward diffusion is Brownian motion and proved that this method converges with essentially optimal rate without suffering from the curse of dimensionality. The central objective of this article is to extend this result to general forward diffusions. The main challenge is that we need to establish convergence in temporal-spatial H\"older norms since the forward diffusion cannot be sampled exactly in general.
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. Existing CGM variants for this template either suffer from slow convergence rates, or require carefully increasing the batch size over the course of the algorithm's execution, which leads to computing full gradients. In contrast, the proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques. In applications we put special emphasis on problems with a large number of separable constraints. Such problems are prevalent among semidefinite programming (SDP) formulations arising in machine learning and theoretical computer science. We provide numerical experiments on matrix completion, unsupervised clustering, and sparsest-cut SDPs.
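The combination of a conditional gradient step with a stochastic average gradient estimator can be sketched as follows. This is a simplified illustration, not the proposed method: it handles only a smooth least-squares finite sum over the probability simplex, omits the non-smooth term, and uses the classical step size 2/(k+2) as an assumption. The names `sag_frank_wolfe` and `objective` and the synthetic data are hypothetical.

```python
import numpy as np

def objective(A, b, x):
    """Finite-sum least squares: (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2."""
    return 0.5 * np.mean((A @ x - b) ** 2)

def sag_frank_wolfe(A, b, num_iters=20000, seed=0):
    """Sketch: conditional gradient over the simplex, one sample per iteration."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.full(d, 1.0 / d)           # start at the centre of the simplex
    grad_table = np.zeros((n, d))     # last gradient seen for each component
    grad_avg = np.zeros(d)            # running average of the table (SAG estimator)
    for k in range(num_iters):
        i = rng.integers(n)           # a single sample per iteration
        g_i = (A[i] @ x - b[i]) * A[i]
        grad_avg += (g_i - grad_table[i]) / n
        grad_table[i] = g_i
        s = np.zeros(d)
        s[np.argmin(grad_avg)] = 1.0  # linear minimisation oracle for the simplex
        eta = 2.0 / (k + 2)
        x = (1.0 - eta) * x + eta * s # convex combination keeps x feasible
    return x

# Synthetic data (assumption): the minimiser is a vertex of the simplex.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
b = A @ np.array([1.0, 0.0, 0.0, 0.0, 0.0])
x_hat = sag_frank_wolfe(A, b)
```

No projection is needed: every iterate is a convex combination of simplex vertices, which is what makes conditional gradient methods attractive for structured constraint sets.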
Randomized Maximum Likelihood (RML) is an approximate posterior sampling methodology, widely used in Bayesian inverse problems with complex forward models, particularly in petroleum engineering applications. The procedure involves solving a multi-objective optimization problem, which can be challenging in high dimensions and when there are constraints on computational costs. We propose a new methodology for tackling the RML optimization problem based on the high-dimensional Bayesian optimization literature. By sharing data between the different objective functions, we are able to implement RML at a greatly reduced computational cost. We demonstrate the benefits of our methodology in comparison with the solutions obtained by alternative optimization methods on a variety of synthetic and real-world problems, including medical and fluid dynamics applications. Furthermore, we show that the samples produced by our method cover the high-posterior-density regions well in all of the experiments.
The Koopman operator is beneficial for analyzing nonlinear and stochastic dynamics; it is linear but infinite-dimensional, and it governs the evolution of observables. The extended dynamic mode decomposition (EDMD) is one of the best-known methods in the Koopman operator approach. The EDMD employs a data set of snapshot pairs and a specific dictionary to evaluate an approximation for the Koopman operator, i.e., the Koopman matrix. In this study, we focus on stochastic differential equations and propose a method to obtain the Koopman matrix. The proposed method does not need any data set; instead, it employs the original system equations to evaluate some targeted elements of the Koopman matrix. The proposed method comprises combinatorics, an approximation of the resolvent, and extrapolations. Comparisons with the EDMD are performed for a noisy van der Pol system. The proposed method yields reasonable results even in cases wherein the EDMD exhibits a slow convergence behavior.
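For reference, the data-driven EDMD baseline described above can be sketched in a few lines. This illustrates EDMD only, not the data-free method proposed in the abstract. As an assumption, it uses a scalar Ornstein-Uhlenbeck process instead of the noisy van der Pol system, because its exact transition is known, and a monomial dictionary $\{1, x, x^2\}$. All function names are hypothetical.

```python
import numpy as np

def dictionary(x):
    """Monomial dictionary {1, x, x^2} evaluated at each sample (assumed choice)."""
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

def edmd_koopman_matrix(x, y):
    """Least-squares Koopman matrix K with dictionary(y) ~= dictionary(x) @ K."""
    K, *_ = np.linalg.lstsq(dictionary(x), dictionary(y), rcond=None)
    return K

def sample_ou_pairs(n, theta=1.0, sigma=0.5, dt=0.1, seed=0):
    """Snapshot pairs (x, y) from the exact OU transition over a time lag dt."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, n)
    decay = np.exp(-theta * dt)
    std = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    y = decay * x + std * rng.standard_normal(n)
    return x, y

x, y = sample_ou_pairs(20000)
K = edmd_koopman_matrix(x, y)
# Column j of K expands the conditional expectation of the j-th dictionary
# function in the dictionary, so K[1, 1] estimates exp(-theta * dt).
```

With this convention, the accuracy of each column depends on the number of snapshot pairs, which is the slow-convergence issue the abstract refers to.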
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving sampling problems and non-convex optimization problems appearing in several machine learning applications. In particular, its variance-reduced versions have recently gained attention. In this paper, we study two variants of this kind, namely, the Stochastic Variance Reduced Gradient Langevin Dynamics and the Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the objective distribution in terms of KL-divergence under the sole assumptions of smoothness and a log-Sobolev inequality, which are weaker conditions than those used in prior works for these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve an $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which improves on all previous analyses. We also show some essential applications of our result to non-convex optimization.
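A minimal sketch of the variance-reduced Langevin update may help fix ideas. It assumes a scalar state and a quadratic potential $F(x) = \sum_i \tfrac{1}{2} c_i (x - a_i)^2$, chosen so that the target distribution is a known Gaussian. The batch size and inner loop length are both set to $\sqrt{n}$ as in the abstract. The function name `svrg_ld`, the data, and the step size are illustrative assumptions.

```python
import numpy as np

def svrg_ld(a, c, step, num_iters=20000, seed=0):
    """Sketch of SVRG-LD for F(x) = sum_i 0.5 * c_i * (x - a_i)^2 with scalar x."""
    rng = np.random.default_rng(seed)
    n = len(a)
    batch = inner_len = int(np.sqrt(n))
    x = 0.0
    samples = np.empty(num_iters)
    for k in range(num_iters):
        if k % inner_len == 0:
            snapshot = x                              # refresh the reference point
            full_grad = np.sum(c * (snapshot - a))    # exact gradient at the snapshot
        idx = rng.choice(n, size=batch, replace=False)
        # Variance-reduced correction: minibatch gradient difference to the snapshot.
        diff = c[idx] * (x - a[idx]) - c[idx] * (snapshot - a[idx])
        grad = full_grad + (n / batch) * np.sum(diff)
        x = x - step * grad + np.sqrt(2.0 * step) * rng.standard_normal()
        samples[k] = x
    return samples

# Synthetic data (assumption): the target is N(post_mean, 1 / sum(c)).
rng = np.random.default_rng(1)
a = 2.0 + rng.standard_normal(100)
c = rng.uniform(0.5, 1.5, 100)
post_mean = np.sum(c * a) / np.sum(c)
samples = svrg_ld(a, c, step=0.1 / np.sum(c))
```

After a short burn-in, the empirical mean and variance of `samples` can be compared with the Gaussian target, up to the discretisation bias of the Langevin step.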
We introduce a novel methodology for particle filtering in dynamical systems where the evolution of the signal of interest is described by an SDE and observations are collected instantaneously at prescribed time instants. The new approach includes the discretisation of the SDE and the design of efficient particle filters for the resulting discrete-time state-space model. The discretisation scheme converges with weak order 1 and it is devised to create a sequential dependence structure along the coordinates of the discrete-time state vector. We introduce a class of space-sequential particle filters that exploits this structure to improve performance when the system dimension is large. This is numerically illustrated by a set of computer simulations for a stochastic Lorenz 96 system with additive noise. The new space-sequential particle filters attain approximately constant estimation errors as the dimension of the Lorenz 96 system is increased, with a computational cost that increases polynomially, rather than exponentially, with the system dimension. Besides the new numerical scheme and particle filters, we provide in this paper a general framework for discrete-time filtering in continuous-time dynamical systems described by an SDE and instantaneous observations. Provided that the SDE is discretised using a weakly-convergent scheme, we prove that the marginal posterior laws of the resulting discrete-time state-space model converge to the marginal posterior laws of the original continuous-time state-space model under a suitably defined metric. This result is general and not restricted to the numerical scheme or particle filters specifically studied in this manuscript.
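The general framework (discretise the SDE, then filter the resulting discrete-time model) can be illustrated with a standard bootstrap particle filter. This sketch is not the space-sequential filter or the discretisation scheme of the abstract: it assumes a scalar SDE with additive noise, plain Euler-Maruyama sub-steps between observation times, and Gaussian observation noise. All names and parameter values are hypothetical.

```python
import numpy as np

def euler_maruyama(x, drift, sigma, h, n_steps, rng):
    """Propagate an array of states through n_steps Euler-Maruyama steps of size h."""
    for _ in range(n_steps):
        x = x + drift(x) * h + sigma * np.sqrt(h) * rng.standard_normal(x.shape)
    return x

def simulate_observations(drift, sigma, obs_std, h, n_steps, n_obs, seed=1):
    """Simulate a reference path and noisy observations at prescribed time instants."""
    rng = np.random.default_rng(seed)
    x = np.zeros(1)
    truth = np.empty(n_obs)
    for t in range(n_obs):
        x = euler_maruyama(x, drift, sigma, h, n_steps, rng)
        truth[t] = x[0]
    return truth, truth + obs_std * rng.standard_normal(n_obs)

def bootstrap_particle_filter(obs, drift, sigma, obs_std, h, n_steps,
                              n_particles=500, seed=0):
    """Return the filtered posterior-mean estimate at each observation time."""
    rng = np.random.default_rng(seed)
    particles = rng.standard_normal(n_particles)
    means = np.empty(len(obs))
    for t, y in enumerate(obs):
        particles = euler_maruyama(particles, drift, sigma, h, n_steps, rng)
        log_w = -0.5 * ((y - particles) / obs_std) ** 2   # Gaussian log-likelihood
        w = np.exp(log_w - log_w.max())                   # stabilised weights
        w /= w.sum()
        means[t] = w @ particles
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return means

def ou_drift(x):
    return -x  # illustrative linear drift (assumption)

truth, obs = simulate_observations(ou_drift, 0.5, 0.2, 0.01, 10, 200)
means = bootstrap_particle_filter(obs, ou_drift, 0.5, 0.2, 0.01, 10)
rmse = np.sqrt(np.mean((means - truth) ** 2))
```

A filter of this plain form is known to degrade as the state dimension grows, which is the difficulty the space-sequential construction in the abstract is designed to address.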
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
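The statement that residual networks are discretisations can be made concrete with a short sketch: a residual block $x_{k+1} = x_k + h f(x_k)$ is one explicit Euler step for $dx/dt = f(x)$. As an assumption for illustration, the vector field below is the fixed map $f(x) = -x$ rather than a trained network, so that the exact solution is available for comparison. The function names are hypothetical.

```python
import numpy as np

def residual_stack(f, x0, t_final, depth):
    """Apply `depth` residual blocks x <- x + h * f(x) with h = t_final / depth."""
    h = t_final / depth
    x = np.array(x0, dtype=float)
    for _ in range(depth):
        x = x + h * f(x)  # one residual block = one explicit Euler step
    return x

def vector_field(x):
    return -x  # stand-in for a learned layer; exact flow is x0 * exp(-t)

shallow = residual_stack(vector_field, [1.0], 1.0, depth=10)
deep = residual_stack(vector_field, [1.0], 1.0, depth=1000)
exact = np.exp(-1.0)
```

Increasing `depth` while keeping `t_final` fixed shrinks the step size, so the deeper stack lands closer to the continuous-time solution than the shallow one.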