Iterative sketching and sketch-and-precondition are randomized algorithms used for solving overdetermined linear least-squares problems. When implemented in exact arithmetic, these algorithms produce high-accuracy solutions to least-squares problems faster than standard direct methods based on QR factorization. Recently, Meier, Nakatsukasa, Townsend, and Webb demonstrated numerical instabilities in a version of sketch-and-precondition in floating point arithmetic (arXiv:2302.07202). The work of Meier et al. raises the question: Is there a randomized least-squares solver that is both fast and stable? This paper resolves this question in the affirmative by proving that iterative sketching, appropriately implemented, is forward stable. Numerical experiments confirm the theoretical findings, demonstrating that iterative sketching is stable and faster than QR-based solvers for large problem instances.
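For concreteness, the sketch below is a minimal dense-arithmetic illustration of the iterative sketching idea: draw a random embedding once, factor the sketched matrix, and refine the solution using the sketched Hessian in place of the true normal-equations matrix. The Gaussian embedding, the embedding dimension sketch_factor * n, and the fixed iteration count are illustrative assumptions, not the parameters analyzed in the paper; practical implementations use fast (subsampled trigonometric or sparse) sketches and a proper stopping criterion.

import numpy as np
from scipy.linalg import qr, solve_triangular

def iterative_sketching(A, b, sketch_factor=20, n_iter=50, rng=None):
    # Illustrative iterative sketching for min_x ||Ax - b||_2 (parameters are assumptions).
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    d = sketch_factor * n
    # Gaussian sketch for clarity; fast sketches are what make the method competitive.
    S = rng.standard_normal((d, m)) / np.sqrt(d)
    R = qr(S @ A, mode='economic')[1]            # factor the sketched matrix: SA = QR
    x = np.zeros(n)
    for _ in range(n_iter):
        g = A.T @ (b - A @ x)                    # gradient of the least-squares objective
        # refinement step using the sketched Hessian R^T R in place of A^T A
        x += solve_triangular(R, solve_triangular(R.T, g, lower=True))
    return x

# quick sanity check against a QR-based dense solver
rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 50))
b = rng.standard_normal(5000)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(iterative_sketching(A, b, rng=rng) - x_ref) / np.linalg.norm(x_ref))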
We propose a method for computing the Lyapunov exponents of renewal equations (delay equations of Volterra type) and of coupled systems of renewal and delay differential equations. The method consists of reformulating the delay equation as an abstract differential equation, reducing the latter to a system of ordinary differential equations via pseudospectral collocation, and applying the standard discrete QR method. The effectiveness of the method is shown experimentally and a MATLAB implementation is provided.
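As a rough illustration of the final stage of this pipeline, the following sketch applies the standard discrete QR method to a generic linear system x' = A(t) x, which is the kind of system the pseudospectral collocation step produces; the reduction of the renewal/delay equation itself, and the authors' MATLAB implementation, are not reproduced here. The test matrix, step size, and integration horizon are arbitrary choices for the example.

import numpy as np
from scipy.integrate import solve_ivp

def discrete_qr_lyapunov(A_fun, dim, t_final=20.0, dt=0.1):
    # Discrete QR method: propagate an orthonormal frame, re-orthonormalize each step,
    # and average the logarithmic growth factors read off the diagonal of R.
    Q = np.eye(dim)
    sums = np.zeros(dim)
    t = 0.0
    while t < t_final:
        rhs = lambda s, y: (A_fun(s) @ y.reshape(dim, dim)).ravel()
        sol = solve_ivp(rhs, (t, t + dt), Q.ravel(), rtol=1e-9, atol=1e-12)
        Q, R = np.linalg.qr(sol.y[:, -1].reshape(dim, dim))
        sums += np.log(np.abs(np.diag(R)))
        t += dt
    return np.sort(sums / t)[::-1]

# constant-coefficient test: the exponents are the real parts of the eigenvalues
A = np.array([[-0.5, 2.0], [0.0, -1.5]])
print(discrete_qr_lyapunov(lambda t: A, 2))   # approximately [-0.5, -1.5]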
We construct the first rigorously justified probabilistic algorithm for recovering the solution operator of a hyperbolic partial differential equation (PDE) in two variables from input-output training pairs. The primary challenge of recovering the solution operator of hyperbolic PDEs is the presence of characteristics, along which the associated Green's function is discontinuous. Therefore, a central component of our algorithm is a rank detection scheme that identifies the approximate location of the characteristics. By combining the randomized singular value decomposition with an adaptive hierarchical partition of the domain, we construct an approximant to the solution operator using $O(\Psi_\epsilon^{-1}\epsilon^{-7}\log(\Xi_\epsilon^{-1}\epsilon^{-1}))$ input-output pairs with relative error $O(\Xi_\epsilon^{-1}\epsilon)$ in the operator norm as $\epsilon\to0$, with high probability. Here, $\Psi_\epsilon$ represents the existence of degenerate singular values of the solution operator, and $\Xi_\epsilon$ measures the quality of the training data. Our assumptions on the regularity of the coefficients of the hyperbolic PDE are relatively weak given that hyperbolic PDEs do not have the ``instantaneous smoothing effect'' of elliptic and parabolic PDEs, and our recovery rate improves as the regularity of the coefficients increases.
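The randomized SVD component can be illustrated in isolation. The toy sketch below recovers a low-rank matrix purely from input-output products (the analogue of the training pairs), which is the building block the paper combines with its adaptive hierarchical partition and rank detection along characteristics; the oracle functions, rank, and oversampling parameter here are assumptions made for the example.

import numpy as np

def randomized_svd_from_products(apply_A, apply_At, n, rank, oversample=10, rng=None):
    # Basic randomized SVD using only input-output products:
    #   apply_A(X)  returns A @ X   (the "training pair" oracle)
    #   apply_At(Y) returns A.T @ Y (adjoint products)
    rng = np.random.default_rng() if rng is None else rng
    k = rank + oversample
    Omega = rng.standard_normal((n, k))       # random input functions
    Q, _ = np.linalg.qr(apply_A(Omega))       # orthonormal basis for the range
    B = apply_At(Q).T                         # B = Q^T A, obtained via adjoint products
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_small[:, :rank], s[:rank], Vt[:rank]

# toy check on an explicitly rank-20 matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((300, 20)) @ rng.standard_normal((20, 300))
U, s, Vt = randomized_svd_from_products(lambda X: A @ X, lambda Y: A.T @ Y,
                                        300, rank=20, rng=rng)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))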
We present and analyze an algorithm designed for addressing vector-valued regression problems involving possibly infinite-dimensional input and output spaces. The algorithm is a randomized adaptation of reduced rank regression, a technique for optimally learning a low-rank vector-valued function (i.e. an operator) from sampled data via regularized empirical risk minimization with rank constraints. We propose Gaussian sketching techniques both for the primal and dual optimization objectives, yielding Randomized Reduced Rank Regression (R4) estimators that are efficient and accurate. For each of our R4 algorithms we prove that the resulting regularized empirical risk is, in expectation with respect to the randomness of the sketch, arbitrarily close to the optimal value when the hyper-parameters are properly tuned. Numerical experiments illustrate the tightness of our bounds and show advantages in two distinct scenarios: (i) solving a vector-valued regression problem using synthetic and large-scale neuroscience datasets, and (ii) regressing the Koopman operator of a nonlinear stochastic dynamical system.
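To convey the flavour of combining reduced rank regression with Gaussian sketching, here is a deliberately simple baseline: solve a ridge problem, then restrict the fitted values to a low-rank subspace found with a Gaussian randomized range finder. This is not the paper's R4 estimator, which sketches the primal and dual objectives themselves; the regularization, rank, and oversampling values are illustrative assumptions.

import numpy as np

def randomized_rrr(X, Y, rank, reg=1e-3, oversample=10, rng=None):
    # Sketched reduced-rank-regression baseline (illustrative only).
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # ridge-regularized least-squares coefficients
    B_ridge = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)
    F = X @ B_ridge                                    # fitted values
    # Gaussian randomized range finder for the top-`rank` subspace of the fitted values
    Omega = rng.standard_normal((Y.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(F @ Omega)
    _, _, Vt = np.linalg.svd(Q.T @ F, full_matrices=False)
    V = Vt[:rank].T                                    # top right singular vectors
    return B_ridge @ V @ V.T                           # rank-constrained coefficients

# toy usage: recover a rank-3 operator from noisy samples
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40))
B_true = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
Y = X @ B_true + 0.1 * rng.standard_normal((500, 30))
B_hat = randomized_rrr(X, Y, rank=3, rng=rng)
print(np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true))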
Large machine learning models are transformative artificial-intelligence technologies whose bottlenecks include the enormous computational cost, power, and time consumed in both pre-training and fine-tuning. In this work, we show that fault-tolerant quantum computing could provide provably efficient resolutions for generic (stochastic) gradient descent algorithms, scaling as O(T^2 polylog(n)), where n is the size of the models and T is the number of iterations in the training, as long as the models are both sufficiently dissipative and sparse, with small learning rates. Based on earlier efficient quantum algorithms for dissipative differential equations, we find and prove that similar algorithms work for (stochastic) gradient descent, the primary algorithm for machine learning. In practice, we benchmark instances of large machine learning models with 7 million to 103 million parameters. We find that, in the context of sparse training, a quantum enhancement is possible at the early stage of learning after model pruning, motivating a sparse parameter download and re-upload scheme. Our work provides strong evidence that fault-tolerant quantum algorithms could contribute to most state-of-the-art, large-scale machine-learning problems.
We prove the convergence of a meshfree method for solving the elliptic Monge-Ampère equation with Dirichlet boundary conditions on a bounded domain. An $L^2$ error estimate is obtained for kernel-based trial spaces generated by compactly supported radial basis functions. We obtain the convergence result when the test discretization is finer than the trial discretization. The convergence rate depends on the regularity of the solution, the smoothness of the computational domain, and the approximation properties of the scaled kernel-based spaces. The presented convergence theory covers a wide range of kernel-based trial spaces, including both stationary and non-stationary approximation. An extension to non-Dirichlet boundary conditions will appear in a forthcoming paper.
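A minimal sketch of the kind of kernel-based trial space used here: interpolation with a scaled, compactly supported Wendland kernel on a set of centers. The kernel choice, support radius, and node set are assumptions for the example; varying the scale with the fill distance versus keeping it fixed corresponds to the stationary and non-stationary regimes mentioned above, and the collocation treatment of the Monge-Ampère equation itself is not shown.

import numpy as np

def wendland_c2(r):
    # Wendland C^2 compactly supported RBF (valid in dimensions up to 3): (1-r)_+^4 (4r+1)
    return np.where(r < 1.0, (1.0 - r)**4 * (4.0 * r + 1.0), 0.0)

def rbf_interpolant(centers, values, scale):
    # Kernel-based trial space sketch: interpolation with a scaled Wendland kernel.
    r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1) / scale
    coeffs = np.linalg.solve(wendland_c2(r), values)
    def evaluate(points):
        re = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1) / scale
        return wendland_c2(re) @ coeffs
    return evaluate

# interpolate a smooth function on a 20 x 20 grid of centers in the unit square
g = np.linspace(0.0, 1.0, 20)
gx, gy = np.meshgrid(g, g)
centers = np.column_stack([gx.ravel(), gy.ravel()])
f = lambda p: np.sin(np.pi * p[:, 0]) * np.cos(np.pi * p[:, 1])
interp = rbf_interpolant(centers, f(centers), scale=0.3)
rng = np.random.default_rng(0)
test = rng.random((1000, 2))
print(np.max(np.abs(interp(test) - f(test))))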
We study operator (or noncommutative) variants of constraint satisfaction problems (CSPs). These higher-dimensional variants are a core topic of investigation in quantum information, where they arise as nonlocal games and entangled multiprover interactive proof systems (MIP*). The idea of higher-dimensional relaxations of CSPs is also important in the classical literature: since the celebrated work of Goemans and Williamson on Max-Cut, higher-dimensional vector relaxations have been central to the design of approximation algorithms for classical CSPs. We introduce a framework for designing approximation algorithms for noncommutative CSPs. Prior to this work, Max-$2$-Lin$(k)$ was the only family of noncommutative CSPs known to be efficiently solvable, and this work is the first to establish approximation ratios for a broader class of noncommutative CSPs. In the study of classical CSPs, $k$-ary decision variables are often represented by $k$-th roots of unity, which generalise to the noncommutative setting as order-$k$ unitary operators. In our framework, using representation theory, we develop a way of constructing unitary solutions from SDP relaxations, extending the pioneering work of Tsirelson on XOR games. We then introduce a novel rounding scheme to transform these solutions into order-$k$ unitaries. Our main technical innovation here is a theorem guaranteeing that, for any set of unitary operators, there exists a set of order-$k$ unitaries that closely mimics it. As an integral part of the rounding scheme, we prove a random matrix theory result that characterises the distribution of the relative angles between eigenvalues of random unitaries using tools from free probability.
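As a small numerical illustration of the objects involved, the sketch below samples a Haar-random unitary and rounds it to an order-$k$ unitary by snapping each eigenphase to the nearest $k$-th root of unity. This naive per-operator rounding is not the paper's scheme, which must treat a whole set of (generally noncommuting) unitaries jointly, but it makes the notion of approximating unitaries by order-$k$ unitaries concrete; the dimension and the values of $k$ are arbitrary.

import numpy as np

def haar_unitary(n, rng):
    # Haar-random n x n unitary via QR of a complex Gaussian matrix with phase correction.
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def round_to_order_k(U, k):
    # Naive per-eigenvalue rounding: snap each eigenphase to the nearest multiple of
    # 2*pi/k so that the result satisfies V^k = I. This is NOT the paper's joint
    # rounding scheme for sets of noncommuting unitaries; it is only illustrative.
    eigvals, vecs = np.linalg.eig(U)
    snapped = np.exp(2j * np.pi * np.round(np.angle(eigvals) * k / (2 * np.pi)) / k)
    return vecs @ np.diag(snapped) @ np.linalg.inv(vecs)

rng = np.random.default_rng(0)
U = haar_unitary(50, rng)
for k in (3, 8, 24):
    V = round_to_order_k(U, k)
    print(k, np.linalg.norm(np.linalg.matrix_power(V, k) - np.eye(50)),   # check V^k = I
          np.linalg.norm(U - V, 2))                                       # distance to U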
This paper presents an asymptotic preserving (AP) implicit-explicit (IMEX) scheme for solving the quantum BGK equation using the Hermite spectral method. The distribution function is expanded in a series of Hermite polynomials, with the Gaussian function serving as the weight function. The main challenge in this numerical scheme lies in efficiently expanding the quantum Maxwellian with the Hermite basis functions. To overcome this, we simplify the problem to the calculation of polylogarithms and propose an efficient algorithm to handle it, utilizing the Gauss-Hermite quadrature. Several numerical simulations, including a spatially 2D lid-driven cavity flow, demonstrate the AP property and remarkable efficiency of this method.
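The core difficulty described here, expanding a quantum-statistics distribution in a Gaussian-weighted Hermite basis, can be illustrated in one velocity dimension. The sketch below expands a Fermi-Dirac-type profile in Hermite functions, computing the coefficients with Gauss-Hermite quadrature; the fugacity-like parameter, truncation order, and quadrature size are assumptions, and the paper's algorithm for the full quantum Maxwellian via polylogarithms is not reproduced.

import numpy as np
from math import factorial

# Expand f(v) = 1 / (z^{-1} exp(v^2) + 1) in Hermite functions H_n(v) exp(-v^2).
z, N = 0.5, 30                                       # assumed parameters
nodes, weights = np.polynomial.hermite.hermgauss(60)

# c_n = (sqrt(pi) 2^n n!)^{-1} * \int f(v) H_n(v) dv.  Rewriting the integrand so the
# Gaussian weight is explicit, \int [f(v) e^{v^2}] H_n(v) e^{-v^2} dv, makes it a textbook
# Gauss-Hermite quadrature; f(v) e^{v^2} = 1/(z^{-1} + e^{-v^2}) is evaluated stably.
g = 1.0 / (1.0 / z + np.exp(-nodes**2))
coeffs = np.zeros(N)
for n in range(N):
    unit = np.zeros(n + 1); unit[n] = 1.0            # coefficient vector selecting H_n
    Hn = np.polynomial.hermite.hermval(nodes, unit)
    coeffs[n] = weights @ (g * Hn) / (np.sqrt(np.pi) * 2.0**n * factorial(n))

# reconstruct the distribution on a grid and check the truncation error
v = np.linspace(-4.0, 4.0, 401)
f_exact = 1.0 / (np.exp(v**2) / z + 1.0)
f_approx = np.polynomial.hermite.hermval(v, coeffs) * np.exp(-v**2)
print("max expansion error:", np.abs(f_exact - f_approx).max())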
We define a model of predicate logic in which every term and predicate, open or closed, has an absolute denotation independently of a valuation of the variables. For each variable a, the domain of the model contains an element [[a]] which is the denotation of the term a (which is also a variable symbol). Similarly, the algebra interpreting predicates in the model directly interprets open predicates. Because of this, models must also incorporate notions of substitution and quantification. These notions are axiomatic and need not be applied only to sets of syntax. We prove soundness and show how every 'ordinary' model (i.e. a model based on sets and valuations) can be translated into one of our nominal models, and thus also prove completeness.
Fractional calculus with respect to a function $\psi$, also known as $\psi$-fractional calculus, generalizes the Hadamard and Riemann-Liouville fractional calculi, which poses challenges for its numerical treatment. In this paper we study spectral-type methods using mapped Jacobi functions (MJFs) as basis functions and obtain efficient algorithms to solve $\psi$-fractional differential equations. In particular, we set up the Petrov-Galerkin spectral method and the spectral collocation method for initial and boundary value problems involving $\psi$-fractional derivatives. We develop basic approximation theory for the MJFs and derive error estimates for the proposed methods. We also establish a recurrence relation to evaluate the collocation differentiation matrix for implementing the spectral collocation algorithm. Numerical examples confirm the theoretical results and demonstrate the effectiveness of the spectral and collocation methods.
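A generic sketch of the collocation ingredient: build a polynomial differentiation matrix on Jacobi-type nodes from barycentric weights and scale it by $1/\psi'$, which by the chain rule is the first-order building block of derivatives with respect to $\psi$. The Jacobi parameters, number of nodes, and the choice $\psi(x)=e^x$ are illustrative assumptions; the paper instead derives a recurrence relation tailored to the mapped Jacobi functions.

import numpy as np
from scipy.special import roots_jacobi

def differentiation_matrix(x):
    # Polynomial collocation differentiation matrix at nodes x (barycentric form):
    # D[i, j] = (w_j / w_i) / (x_i - x_j) for i != j,  D[i, i] = -sum_{j != i} D[i, j],
    # where w_i are the barycentric weights of the node set.
    n = len(x)
    diff = x[:, None] - x[None, :] + np.eye(n)     # avoid dividing by zero on the diagonal
    w = 1.0 / np.prod(diff, axis=1)                # barycentric weights
    D = (w[None, :] / w[:, None]) / diff
    np.fill_diagonal(D, 0.0)
    np.fill_diagonal(D, -D.sum(axis=1))
    return D

# Gauss-Jacobi collocation nodes (the Jacobi parameters are illustrative).
x, _ = roots_jacobi(16, 0.0, 0.0)
D = differentiation_matrix(x)

# First-order building block of the psi-derivative: (1/psi'(x)) d/dx, here with
# the illustrative choice psi(x) = exp(x) on [-1, 1].
psi_prime = np.exp(x)
D_psi = np.diag(1.0 / psi_prime) @ D

# check: (1/psi') d/dx applied to psi(x)^3 should give 3 psi(x)^2
u = np.exp(3 * x)
print(np.max(np.abs(D_psi @ u - 3 * np.exp(2 * x))))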
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and address two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.