We consider a generic and explicit tamed Euler--Maruyama scheme for multidimensional time-inhomogeneous stochastic differential equations with multiplicative Brownian noise. The diffusion coefficient is uniformly elliptic, H\"older continuous and weakly differentiable in the spatial variables while the drift satisfies the Ladyzhenskaya--Prodi--Serrin condition, as considered by Krylov and R\"ockner (2005). In the discrete scheme, the drift is tamed by replacing it by an approximation. A strong rate of convergence of the scheme is provided in terms of the approximation error of the drift in a suitable and possibly very weak topology. A few examples of approximating drifts are discussed in detail. The parameters of the approximating drifts can vary and be fine-tuned to achieve the standard $1/2$-strong convergence rate with a logarithmic factor.
In this article, we deal with the efficient computation of the Wright function in the cases of interest for the expression of solutions of some fractional differential equations. The proposed algorithm is based on the inversion of the Laplace transform of a particular expression of the Wright function for which we discuss in detail the error analysis. We also present a code package that implements the algorithm proposed here in different programming languages. The analysis and implementation are accompanied by an extensive set of numerical experiments that validate both the theoretical estimates of the error and the applicability of the proposed method for representing the solutions of fractional differential equations.
A general class of KdV-type wave equations regularized with a convolution-type nonlocality in space is considered. The class differs from the class of the nonlinear nonlocal unidirectional wave equations previously studied by the addition of a linear convolution term involving third-order derivative. To solve the Cauchy problem we propose a semi-discrete numerical method based on a uniform spatial discretization, that is an extension of a previously published work of the present authors. We prove uniform convergence of the numerical method as the mesh size goes to zero. We also prove that the localization error resulting from localization to a finite domain is significantly less than a given threshold if the finite domain is large enough. To illustrate the theoretical results, some numerical experiments are carried out for the Rosenau-KdV equation, the Rosenau-BBM-KdV equation and a convolution-type integro-differential equation. The experiments conducted for three particular choices of the kernel function confirm the error estimates that we provide.
Several studies have shown the ability of natural gradient descent to minimize the objective function more efficiently than ordinary gradient descent based methods. However, the bottleneck of this approach for training deep neural networks lies in the prohibitive cost of solving a large dense linear system corresponding to the Fisher Information Matrix (FIM) at each iteration. This has motivated various approximations of either the exact FIM or the empirical one. The most sophisticated of these is KFAC, which involves a Kronecker-factored block diagonal approximation of the FIM. With only a slight additional cost, a few improvements of KFAC from the standpoint of accuracy are proposed. The common feature of the four novel methods is that they rely on a direct minimization problem, the solution of which can be computed via the Kronecker product singular value decomposition technique. Experimental results on the three standard deep auto-encoder benchmarks showed that they provide more accurate approximations to the FIM. Furthermore, they outperform KFAC and state-of-the-art first-order methods in terms of optimization speed.
We present substantially generalized and improved quantum algorithms over prior work for inhomogeneous linear and nonlinear ordinary differential equations (ODE). In Berry et al., (2017), a quantum algorithm for a certain class of linear ODEs is given, where the matrix involved needs to be diagonalizable. The quantum algorithm for linear ODEs presented here extends to many classes of non-diagonalizable matrices. The algorithm here can also be exponentially faster for certain classes of diagonalizable matrices. Our linear ODE algorithm is then applied to nonlinear differential equations using Carleman linearization (an approach taken recently by us in Liu et al., (2021)). The improvement over that result is two-fold. First, we obtain an exponentially better dependence on error. This kind of logarithmic dependence on error has also been achieved by Xue et al., (2021), but only for homogeneous nonlinear equations. Second, the present algorithm can handle any sparse, invertible matrix (that models dissipation) if it has a negative log-norm (including non-diagonalizable matrices), whereas Liu et al., (2021) and Xue et al., (2021) additionally require normality.
Singular source terms in sub-diffusion equations may lead to the unboundedness of solutions, which will bring a severe reduction of convergence order of existing time-stepping schemes. In this work, we propose two efficient time-stepping schemes for solving sub-diffusion equations with a class of source terms mildly singular in time. One discretization is based on the Gr{\"u}nwald-Letnikov and backward Euler methods. First-order error estimate with respect to time is rigorously established for singular source terms and nonsmooth initial data. The other scheme derived from the second-order backward differentiation formula (BDF) is proved to possess second-order accuracy in time. Further, piecewise linear finite element and lumped mass finite element discretizations in space are applied and analyzed rigorously. Numerical investigations confirm our theoretical results.
Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on some numerical examples.
This paper studies decentralized convex-concave minimax optimization problems of the form $\min_x\max_y f(x,y) \triangleq\frac{1}{m}\sum_{i=1}^m f_i(x,y)$, where $m$ is the number of agents and each local function can be written as $f_i(x,y)=\frac{1}{n}\sum_{j=1}^n f_{i,j}(x,y)$. We propose a novel decentralized optimization algorithm, called multi-consensus stochastic variance reduced extragradient, which achieves the best known stochastic first-order oracle (SFO) complexity for this problem. Specifically, each agent requires $\mathcal O((n+\kappa\sqrt{n})\log(1/\varepsilon))$ SFO calls for strongly-convex-strongly-concave problem and $\mathcal O((n+\sqrt{n}L/\varepsilon)\log(1/\varepsilon))$ SFO call for general convex-concave problem to achieve $\varepsilon$-accurate solution in expectation, where $\kappa$ is the condition number and $L$ is the smoothness parameter. The numerical experiments show the proposed method performs better than baselines.
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
In this paper, we focus on the construction of a hybrid scheme for the approximation of non-Maxwellian kinetic models with uncertainties. In the context of multiagent systems, the introduction of a kernel at the kinetic level is useful to avoid unphysical interactions. The methods here proposed, combine a direct simulation Monte Carlo (DSMC) in the phase space together with stochastic Galerkin (sG) methods in the random space. The developed schemes preserve the main physical properties of the solution together with accuracy in the random space. The consistency of the methods is tested with respect to surrogate Fokker-Planck models that can be obtained in the quasi-invariant regime of parameters. Several applications of the schemes to non-Maxwellian models of multiagent systems are reported.
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.