The method of Chernoff approximation is a powerful and flexible tool of functional analysis that, in many cases, allows one to express exp(tL) in terms of the variable coefficients of a linear differential operator L. In this paper we prove a theorem that allows us to apply this method to find the resolvent of the operator L. We demonstrate the approach on a second-order differential operator. As a corollary, we obtain a new representation of the solution of an inhomogeneous second-order linear ordinary differential equation in terms of the functions that are the coefficients of this equation and play the role of parameters of the problem.
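For orientation, the link between the semigroup exp(tL) and the resolvent that such results exploit is the classical Laplace-transform identity of semigroup theory (a textbook fact, stated here for context rather than the theorem proved in the paper):
$$(\lambda I - L)^{-1} f \;=\; \int_0^{\infty} e^{-\lambda t}\, e^{tL} f \, \mathrm{d}t, \qquad \operatorname{Re}\lambda > \omega_0,$$
where $\omega_0$ denotes the growth bound of the semigroup $e^{tL}$.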
Implicit-layer deep learning techniques, such as Neural Differential Equations, have become an important modeling framework due to their ability to adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost of these models is difficult, since it depends on the number of steps the adaptive solver takes. Most prior work either uses higher-order methods that reduce prediction time while greatly increasing training time, or reduces both training and prediction time by relying on specific training algorithms that are hard to use as drop-in replacements because of strict requirements on automatic differentiation. In this manuscript, we use the internal cost heuristics of adaptive differential equation solvers at stochastic time points to guide training toward learning a dynamical system that is easier to integrate. We "close the black box" and allow our method to be used with any adjoint technique for computing gradients of the differential equation solution. Experimental studies comparing our method to global regularization show that we attain similar performance without compromising flexibility of implementation on ordinary differential equations (ODEs) and stochastic differential equations (SDEs). We develop two sampling strategies to trade off between performance and training time. Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.
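As a rough illustration of the idea (a minimal PyTorch sketch with a hand-rolled embedded Euler/Heun pair and a toy fitting task, not the authors' implementation or solver interface), the local error estimate of an adaptive step at a randomly sampled time point can be added to the loss as a regularizer:

```python
import torch

# Toy vector field to be learned (2-D) and a made-up target endpoint.
f = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
y0, y_target = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])

def integrate(y, n=20, h=0.05):
    # Plain explicit Euler rollout used for the data-fitting term.
    for _ in range(n):
        y = y + h * f(y)
    return y

def local_error(y, h):
    # Embedded Euler/Heun pair: the gap between the two estimates is the kind of
    # local error heuristic an adaptive solver exposes, used here as a penalty.
    k1 = f(y)
    y_lo = y + h * k1
    y_hi = y + 0.5 * h * (k1 + f(y_lo))
    return torch.linalg.norm(y_hi - y_lo)

opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for step in range(200):
    y_end = integrate(y0)
    t_idx = torch.randint(0, 20, ())           # stochastic time point along the trajectory
    y_t = integrate(y0, n=int(t_idx))          # state at that time
    loss = (y_end - y_target).pow(2).sum() + 1e-2 * local_error(y_t, 0.05)
    opt.zero_grad(); loss.backward(); opt.step()
```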
The main focus of this paper is the study of efficient multigrid methods for large linear systems with a particular saddle-point structure. When the system matrix is symmetric but indefinite, the variational convergence theory usually used to prove multigrid convergence cannot be applied directly. However, several algebraic approaches analyze properly preconditioned saddle-point problems and prove convergence of the Two-Grid method. This is particularly efficient when the blocks of the coefficient matrix possess a Toeplitz or circulant structure, since it is then possible to derive sufficient conditions for convergence and to provide optimal parameters for the preconditioning of the saddle-point problem in terms of the associated generating symbols. In this paper, we propose a symbol-based convergence analysis for problems that have a hidden block Toeplitz structure. Such problems can then be investigated by focusing on the properties of the associated generating function f, which is a matrix-valued function whose dimension depends on the block size of the problem. As numerical tests, we focus on the matrix sequence stemming from the finite element approximation of the Stokes problem. We show the efficiency of the methods by studying the hidden 9-by-9 block multilevel structure of the obtained matrix sequence. Moreover, we propose an efficient algebraic multigrid method with a convergence rate independent of the matrix size. Finally, we present several numerical tests comparing the results with state-of-the-art strategies.
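To make the recurring notion of a generating symbol concrete, here is a small standalone sketch (illustrative only, not the multigrid or preconditioning machinery of the paper) that builds the Toeplitz matrix associated with the scalar symbol f(theta) = 2 - 2 cos(theta), i.e. the 1-D discrete Laplacian, and checks that its spectrum lies in the range of f:

```python
import numpy as np
from scipy.linalg import toeplitz

n = 64
f = lambda theta: 2 - 2 * np.cos(theta)      # generating symbol of the 1-D Laplacian

# Fourier coefficients of f: hat_f(0) = 2, hat_f(+-1) = -1, all others 0.
col = np.zeros(n); col[0], col[1] = 2.0, -1.0
T = toeplitz(col)                            # T_n(f), symmetric tridiagonal Toeplitz

eigs = np.linalg.eigvalsh(T)
print(eigs.min(), eigs.max(), f(np.pi))      # eigenvalues stay inside [min f, max f] = [0, 4]
```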
This paper is concerned with the convergence of a series associated with a certain version of the convexification method. That version has recently been developed by the research group of the first author for solving coefficient inverse problems. The convexification method aims to construct a globally convex Tikhonov-like functional with a Carleman Weight Function in it. In previous works, the construction of the strictly convex weighted Tikhonov-like functional assumed a truncated Fourier series (i.e., a finite series instead of an infinite one) for a function generated by the total wave field. In this paper we prove a convergence property for this truncated Fourier series approximation. More precisely, we show that the residual of the approximate PDE obtained by using the truncated Fourier series tends to zero in $L^{2}$ as the truncation index tends to infinity. The proof relies on a convergence result in the $H^{1}$-norm for a sequence of $L^{2}$-orthogonal projections onto finite-dimensional subspaces spanned by elements of a special Fourier basis. However, due to the ill-posed nature of coefficient inverse problems, we cannot prove that the solution of the approximate PDE, which results from the minimization of that Tikhonov-like functional, converges to the correct solution.
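The following toy numpy sketch (using the ordinary trigonometric basis rather than the paper's special Fourier basis) shows the kind of truncation behavior being analyzed: the $L^{2}$ residual of the truncated expansion decays as the truncation index N grows.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 2048, endpoint=False)
u = np.exp(np.sin(x))                       # a smooth test function
c = np.fft.fft(u) / x.size                  # its Fourier coefficients
k = np.fft.fftfreq(x.size, d=1.0 / x.size)  # integer mode numbers

for N in (2, 4, 8, 16):
    c_trunc = np.where(np.abs(k) < N, c, 0.0)        # keep modes |k| < N only
    u_N = np.fft.ifft(c_trunc * x.size).real          # truncated reconstruction
    err = np.sqrt(np.mean((u - u_N) ** 2))            # discrete L^2 residual
    print(N, err)                                     # the residual decays rapidly with N
```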
We investigate the possibility of solving continuous non-convex optimization problems using a network of interacting quantum optical oscillators. We propose a native encoding of continuous variables in analog signals associated with the quadrature operators of a set of quantum optical modes. Optical coupling of the modes and noise introduced by vacuum fluctuations from external reservoirs or by weak measurements of the modes are used to optically simulate a diffusion process on a set of continuous random variables. The process is run sufficiently long for it to relax into the steady state of an energy potential defined on a continuous domain. As a first demonstration, we numerically benchmark solving box-constrained quadratic programming (BoxQP) problems in this setting. We consider delay-line and measurement-feedback variants of the experiment. Our benchmarking results demonstrate that in both cases the optical network is capable of solving BoxQP problems over three orders of magnitude faster than a state-of-the-art classical heuristic.
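As a purely classical point of reference (a numpy toy, not a simulation of the optical network or of its delay-line and measurement-feedback variants), the underlying principle can be sketched as an annealed diffusion process relaxing toward low-energy configurations of a box-constrained quadratic potential:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = rng.standard_normal((n, n))
Q = A + A.T                                  # symmetric, generally indefinite -> non-convex BoxQP
c = rng.standard_normal(n)
energy = lambda x: 0.5 * x @ Q @ x + c @ x   # BoxQP objective on the box [0, 1]^n

x = rng.uniform(0, 1, n)
dt, T = 1e-3, 1.0
for step in range(20000):
    T *= 0.9997                              # slowly reduce the noise level (annealing)
    grad = Q @ x + c
    x += -dt * grad + np.sqrt(2 * T * dt) * rng.standard_normal(n)   # overdamped diffusion step
    x = np.clip(x, 0.0, 1.0)                 # enforce the box constraint

print(energy(x))
```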
In-memory computing with resistive crossbar arrays has been suggested as a way to accelerate deep-learning workloads in a highly efficient manner. To unleash the full potential of in-memory computing, it is desirable to accelerate training as well as inference for large deep neural networks (DNNs). In the past, specialized in-memory training algorithms have been proposed that not only accelerate the forward and backward passes but also establish tricks to update the weights in memory and in parallel. However, the state-of-the-art algorithm (Tiki-Taka version 2 (TTv2)) still requires near-perfect offset correction and suffers from potential biases that might occur due to programming and estimation inaccuracies, as well as longer-term instabilities of the device materials. Here we propose and describe two new and improved algorithms for in-memory computing (Chopped-TTv2 (c-TTv2) and Analog Gradient Accumulation with Dynamic reference (AGAD)) that retain the same runtime complexity but correct for any remaining offsets using choppers. These algorithms greatly relax the device requirements and thus expand the scope of materials that could potentially be employed for such fast in-memory DNN training.
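The chopper principle the two algorithms rely on can be illustrated with a tiny numpy toy (this is only the generic idea of chopping, not the c-TTv2 or AGAD update rules): modulating the useful signal with an alternating sign and demodulating after the offset-afflicted stage turns a constant offset into a zero-mean term that averages away.

```python
import numpy as np

rng = np.random.default_rng(1)
true_value, offset = 0.3, 0.5                # quantity to estimate, and a constant device offset
n = 10000

# Without chopping, the offset biases the running average.
plain = true_value + offset + 0.1 * rng.standard_normal(n)

# With chopping: modulate by an alternating sign, pass through the offset stage, demodulate.
sign = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
chopped = sign * (sign * true_value + offset + 0.1 * rng.standard_normal(n))

print(plain.mean())    # biased by the offset (~0.8)
print(chopped.mean())  # offset modulated out (~0.3)
```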
This paper introduces a novel algorithm, the Perturbed Proximal Preconditioned SPIDER algorithm (3P-SPIDER), designed to solve finite-sum non-convex composite optimization problems. It is a stochastic Variable Metric Forward-Backward algorithm that allows an approximate preconditioned forward operator and uses a variable metric proximity operator as the backward operator; it also employs a mini-batch strategy with variance reduction to address the finite-sum setting. We show that 3P-SPIDER extends some stochastic preconditioned gradient descent algorithms and some incremental Expectation Maximization algorithms to composite optimization and to the case in which the forward operator cannot be computed in closed form. We also provide an explicit control of the convergence in expectation of 3P-SPIDER and study the complexity required to satisfy an epsilon-approximate stationarity condition. Our results are the first to combine the composite non-convex optimization setting, a variance-reduction technique that tackles the finite-sum structure through a mini-batch strategy, and deterministic or random approximations of the preconditioned forward operator. Finally, through an application to inference in a logistic regression model with random effects, we numerically compare 3P-SPIDER to other stochastic forward-backward algorithms and discuss the role of some of its design parameters.
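For readers unfamiliar with the building blocks, here is a deliberately simplified sketch of a proximal step driven by a SPIDER variance-reduced estimator over a finite sum (no preconditioning, an exact L1 prox, and made-up least-squares data, so a generic prox-SPIDER, not 3P-SPIDER itself):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 500, 10
A, y = rng.standard_normal((N, d)), rng.standard_normal(N)

grad_i = lambda x, idx: A[idx].T @ (A[idx] @ x - y[idx]) / len(idx)   # minibatch forward gradient
prox_l1 = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)    # backward (proximal) step

x = np.zeros(d)
gamma, q, batch = 1e-2, 50, 32
for k in range(1000):
    if k % q == 0:
        v = grad_i(x, np.arange(N))                    # periodic full-gradient refresh
    else:
        idx = rng.choice(N, batch, replace=False)
        v = grad_i(x, idx) - grad_i(x_prev, idx) + v   # SPIDER recursion (variance reduction)
    x_prev = x
    x = prox_l1(x - gamma * v, gamma * 0.1)            # forward-backward update
```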
We propose a method of sufficient dimension reduction for functional data using distance covariance. We consider the case where the response variable is a scalar and the predictor is a random function. Our method has several advantages. It requires only very mild conditions on the predictor, unlike existing methods, which require the restrictive linear conditional mean and constant covariance assumptions. It also does not involve the inverse of the covariance operator, which is unbounded. The link function between the response and the predictor can be arbitrary, and our method maintains this model-free advantage without estimating the link function. Moreover, our method is naturally applicable to sparse longitudinal data. We use functional principal component analysis with truncation as the regularization mechanism in the development. We provide a justification for the validity of the proposed method and, under some regularity conditions, establish the statistical consistency of our estimator. Simulation studies and real data analysis are also provided to demonstrate the performance of our method.
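A small numpy sketch of the two ingredients the method combines, on synthetic data (this is only an illustration of truncated FPCA scores and the sample distance covariance, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 50                               # n curves observed on m grid points
t = np.linspace(0, 1, m)
X = rng.standard_normal((n, 3)) @ np.vstack([np.sin(np.pi*t), np.cos(np.pi*t), np.sin(2*np.pi*t)])
Y = np.sin(X @ np.sin(np.pi * t) / m) + 0.1 * rng.standard_normal(n)   # scalar response

# Truncated FPCA: leading right singular vectors of the centered data matrix
# play the role of estimated eigenfunctions; keep the first 3 score vectors.
Xc = X - X.mean(axis=0)
_, _, V = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ V[:3].T

def dcov2(a, b):
    # Squared sample distance covariance (V-statistic) via double-centered distance matrices.
    A = np.abs(a[:, None] - a[None, :]) if a.ndim == 1 else np.linalg.norm(a[:, None] - a[None, :], axis=2)
    B = np.abs(b[:, None] - b[None, :]) if b.ndim == 1 else np.linalg.norm(b[:, None] - b[None, :], axis=2)
    A = A - A.mean(0) - A.mean(1)[:, None] + A.mean()
    B = B - B.mean(0) - B.mean(1)[:, None] + B.mean()
    return (A * B).mean()

print(dcov2(scores, Y))                      # dependence between truncated scores and response
```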
In this paper, we propose an efficient quantum algorithm for solving nonlinear stochastic differential equations (SDEs) via the associated Fokker-Planck equation (FPE). We discretize the FPE in space and time using the Chang-Cooper scheme and compute the solution of the resulting system of linear equations using the quantum linear systems algorithm. The Chang-Cooper scheme is second-order accurate and preserves conservativeness and positivity of the solution. We present detailed error and complexity analyses demonstrating that our proposed quantum scheme, which we call the Quantum Linear Systems Chang-Cooper Algorithm (QLSCCA), computes the solution to the FPE within prescribed $\epsilon$ error bounds with polynomial dependence on the state dimension $d$. Classical numerical methods scale exponentially with dimension; thus, our approach provides an \emph{exponential speed-up} over traditional approaches.
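For orientation, the standard correspondence between an SDE and its Fokker-Planck equation (a textbook identity, not a contribution of the paper) reads
$$\mathrm{d}X_t = \mu(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}W_t \quad\Longleftrightarrow\quad \partial_t p(x,t) = -\sum_{i=1}^{d}\partial_{x_i}\big[\mu_i(x)\,p(x,t)\big] + \frac{1}{2}\sum_{i,j=1}^{d}\partial_{x_i}\partial_{x_j}\big[(\sigma\sigma^{\top})_{ij}(x)\,p(x,t)\big],$$
so the probability density $p$ of the nonlinear SDE evolves according to a linear PDE, which is what the Chang-Cooper discretization and the quantum linear systems solver are applied to.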
In this paper, we study the generalization performance of min $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization error that approaches a small limiting value, even when the number of neurons $p$ approaches infinity. This limiting value further decreases with the number of training samples $n$. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.
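To fix ideas, the following numpy toy shows a minimum $\ell_2$-norm overfitting solution for a generic random ReLU feature model without a bias term (a stand-in for the two-layer NTK model studied in the paper, not its exact feature map or error bounds):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 50, 5, 2000                       # samples, input dimension, neurons (p >> n)
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])                          # ground-truth function of the input

W = rng.standard_normal((d, p)) / np.sqrt(d)
Phi = np.maximum(X @ W, 0.0)                 # random ReLU features, no bias term

theta = np.linalg.pinv(Phi) @ y              # min l2-norm interpolating (overfitting) solution
print(np.abs(Phi @ theta - y).max())         # ~0: training data fit exactly

X_test = rng.standard_normal((1000, d))
y_test = np.sin(X_test[:, 0])
Phi_test = np.maximum(X_test @ W, 0.0)
print(np.mean((Phi_test @ theta - y_test) ** 2))   # test error of the overfitted model
```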
Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and achieves better generalization compared to existing methods.
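As a toy illustration of the gradient-informed sampling ingredient (this is generic importance sampling in numpy, not the paper's decoupled GNN estimator), sampling nodes with probability proportional to their per-node gradient norm keeps the estimator unbiased while lowering its variance relative to uniform sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 8
G = rng.standard_normal((N, d)) * rng.exponential(1.0, (N, 1))   # heterogeneous per-node gradients
full = G.mean(axis=0)                                            # full-batch gradient

def mse(p, trials=2000, batch=32):
    # Mean squared error of the importance-weighted minibatch gradient vs. the full gradient.
    errs = []
    for _ in range(trials):
        idx = rng.choice(N, batch, p=p)
        est = (G[idx] / (N * p[idx, None])).mean(axis=0)          # unbiased importance-weighted estimate
        errs.append(((est - full) ** 2).sum())
    return np.mean(errs)

uniform = np.full(N, 1.0 / N)
adaptive = np.linalg.norm(G, axis=1); adaptive /= adaptive.sum()  # p_i proportional to gradient norm
print(mse(uniform), mse(adaptive))                                # adaptive sampling has lower variance
```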