We consider the problem of solving linear least squares problems in a framework where only evaluations of the linear map are possible. We derive randomized methods that do not need any other matrix operations than forward evaluations, especially no evaluation of the adjoint map is needed. Our method is motivated by the simple observation that one can get an unbiased estimate of the application of the adjoint. We show convergence of the method and then derive a more efficient method that uses an exact linesearch. This method, called random descent, resembles known methods in other context and has the randomized coordinate descent method as special case. We provide convergence analysis of the random descent method emphasizing the dependence on the underlying distribution of the random vectors. Furthermore we investigate the applicability of the method in the context of ill-posed inverse problems and show that the method can have beneficial properties when the unknown solution is rough. We illustrate the theoretical findings in numerical examples. One particular result is that the random descent method actually outperforms established transposed-free methods (TFQMR and CGS) in examples.
We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.
We consider the problem of finite-time identification of linear dynamical systems from $T$ samples of a single trajectory. Recent results have predominantly focused on the setup where no structural assumption is made on the system matrix $A^* \in \mathbb{R}^{n \times n}$, and have consequently analyzed the ordinary least squares (OLS) estimator in detail. We assume prior structural information on $A^*$ is available, which can be captured in the form of a convex set $\mathcal{K}$ containing $A^*$. For the solution of the ensuing constrained least squares estimator, we derive non-asymptotic error bounds in the Frobenius norm that depend on the local size of $\mathcal{K}$ at $A^*$. To illustrate the usefulness of these results, we instantiate them for three examples, namely when (i) $A^*$ is sparse and $\mathcal{K}$ is a suitably scaled $\ell_1$ ball; (ii) $\mathcal{K}$ is a subspace; (iii) $\mathcal{K}$ consists of matrices each of which is formed by sampling a bivariate convex function on a uniform $n \times n$ grid (convex regression). In all these situations, we show that $A^*$ can be reliably estimated for values of $T$ much smaller than what is needed for the unconstrained setting.
The matrix factor model has drawn growing attention for its advantage in achieving two-directional dimension reduction simultaneously for matrix-structured observations. In this paper, we propose a simple iterative least squares algorithm for matrix factor models, in contrast to the Principal Component Analysis (PCA)-based methods in the literature. In detail, we first propose to estimate the latent factor matrices by projecting the observations with two deterministic weight matrices, which are chosen to diversify away the idiosyncratic components. We show that the inferences on factors are still asymptotically valid even if we overestimate both the row/column factor numbers. We then estimate the row/column loading matrices by minimizing the squared loss function under certain identifiability conditions. The resultant estimators of the loading matrices are treated as the new weight/projection matrices and thus the above update procedure can be iteratively performed until convergence. Theoretically, given the true dimensions of the factor matrices, we derive the convergence rates of the estimators for loading matrices and common components at any $s$-th step iteration. Additionally, we propose an eigenvalue-ratio method to estimate the pair of factor numbers consistently. Thorough numerical simulations are conducted to investigate the finite-sample performance of the proposed methods and two real datasets associated with financial portfolios and multinational macroeconomic indices are used to illustrate our algorithm's practical usefulness.
Finding a solution to the linear system $Ax = b$ with various minimization properties arises from many engineering and computer science applications, including compressed sensing, image processing, and machine learning. In the age of big data, the scalability of stochastic optimization algorithms has made it increasingly important to solve problems of unprecedented sizes. This paper focuses on the problem of minimizing a strongly convex objective function subject to linearly constraints. We consider the dual formulation of this problem and adopt the stochastic coordinate descent to solve it. The proposed algorithmic framework, called fast stochastic dual coordinate descent, utilizes an adaptive variation of Polyak's heavy ball momentum and user-defined distributions for sampling. Our adaptive heavy ball momentum technique can efficiently update the parameters by using iterative information, overcoming the limitation of the heavy ball momentum method where prior knowledge of certain parameters, such as singular values of a matrix, is required. We prove that, under strongly admissible of the objective function, the propose method converges linearly in expectation. By varying the sampling matrix, we recover a comprehensive array of well-known algorithms as special cases, including the randomized sparse Kaczmarz method, the randomized regularized Kaczmarz method, the linearized Bregman iteration, and a variant of the conjugate gradient (CG) method. Numerical experiments are provided to confirm our results.
The paper addresses an optimal ensemble control problem for nonlocal continuity equations on the space of probability measures. We admit the general nonlinear cost functional, and an option to directly control the nonlocal terms of the driving vector field. For this problem, we design a descent method based on Pontryagin's maximum principle (PMP). To this end, we derive a new form of PMP with a decoupled Hamiltonian system. Specifically, we extract the adjoint system of linear nonlocal balance laws on the space of signed measures and prove its well-posedness. As an implementation of the designed descent method, we propose an indirect deterministic numeric algorithm with backtracking. We prove the convergence of the algorithm and illustrate its modus operandi by treating a simple case involving a Kuramoto-type model of a population of interacting oscillators.
Several widely-used first-order saddle-point optimization methods yield an identical continuous-time ordinary differential equation (ODE) that is identical to that of the Gradient Descent Ascent (GDA) method when derived naively. However, the convergence properties of these methods are qualitatively different, even on simple bilinear games. Thus the ODE perspective, which has proved powerful in analyzing single-objective optimization methods, has not played a similar role in saddle-point optimization. We adopt a framework studied in fluid dynamics -- known as High-Resolution Differential Equations (HRDEs) -- to design differential equation models for several saddle-point optimization methods. Critically, these HRDEs are distinct for various saddle-point optimization methods. Moreover, in bilinear games, the convergence properties of the HRDEs match the qualitative features of the corresponding discrete methods. Additionally, we show that the HRDE of Optimistic Gradient Descent Ascent (OGDA) exhibits \emph{last-iterate convergence} for general monotone variational inequalities. Finally, we provide rates of convergence for the \emph{best-iterate convergence} of the OGDA method, relying solely on the first-order smoothness of the monotone operator.
To quantify uncertainties in inverse problems of partial differential equations (PDEs), we formulate them into statistical inference problems using Bayes' formula. Recently, well-justified infinite-dimensional Bayesian analysis methods have been developed to construct dimension-independent algorithms. However, there are three challenges for these infinite-dimensional Bayesian methods: prior measures usually act as regularizers and are not able to incorporate prior information efficiently; complex noises, such as more practical non-i.i.d. distributed noises, are rarely considered; and time-consuming forward PDE solvers are needed to estimate posterior statistical quantities. To address these issues, an infinite-dimensional inference framework has been proposed based on the infinite-dimensional variational inference method and deep generative models. Specifically, by introducing some measure equivalence assumptions, we derive the evidence lower bound in the infinite-dimensional setting and provide possible parametric strategies that yield a general inference framework called the Variational Inverting Network (VINet). This inference framework can encode prior and noise information from learning examples. In addition, relying on the power of deep neural networks, the posterior mean and variance can be efficiently and explicitly generated in the inference stage. In numerical experiments, we design specific network structures that yield a computable VINet from the general inference framework. Numerical examples of linear inverse problems of an elliptic equation and the Helmholtz equation are presented to illustrate the effectiveness of the proposed inference framework.
In this paper, we propose efficient quantum algorithms for solving nonlinear stochastic differential equations (SDE) via the associated Fokker-Planck equation (FPE). We discretize the FPE in space and time using two well-known numerical schemes, namely Chang-Cooper and implicit finite difference. We then compute the solution of the resulting system of linear equations using the quantum linear systems algorithm. We present detailed error and complexity analyses for both these schemes and demonstrate that our proposed algorithms, under certain conditions, provably compute the solution to the FPE within prescribed $\epsilon$ error bounds with polynomial dependence on state dimension $d$. Classical numerical methods scale exponentially with dimension, thus, our approach, under the aforementioned conditions, provides an \emph{exponential speed-up} over traditional approaches.
An NP-complete graph decision problem, the "Multi-stage graph Simple Path" (abbr. MSP) problem, is introduced. The main contribution of this paper is a poly-time algorithm named the ZH algorithm for the problem together with the proof of its correctness, which implies NP=P. (1) A crucial structural property is discovered, whereby all MSP instances are arranged into the sequence $G_{0}$,$G_{1}$,$G_{2}$,... ($G_{k}$ essentially stands for a group of graphs for all $k\geq 0$). For each $G_{j}(j>0)$ in the sequence, there is a graph $G_{i}(0\leq i<j)$ "mathematically homomorphic" to $G_{j}$ which keeps completely accordant with $G_{j}$ on the existence of global solutions. This naturally provides a chance of applying mathematical induction for the proof of an algorithm. In previous attempts, algorithms used for making global decisions were mostly guided by heuristics and intuition. Rather, the ZH algorithm is dedicatedly designed to comply with the proposed proving framework of mathematical induction. (2) Although the ZH algorithm deals with paths, it always regards paths as a collection of edge sets. This is the key to the avoidance of exponential complexity. (3) Any poly-time algorithm that seeks global information can barely avoid the error caused by localized computation. In the ZH algorithm, the proposed reachable-path edge-set $R(e)$ and the computed information for it carry all necessary contextual information, which can be utilized to summarize the "history" and to detect the "future" for searching global solutions. (4) The relation between local strategies and global strategies is discovered and established, wherein preceding decisions can pose constraints to subsequent decisions (and vice versa). This interplay resembles the paradigm of dynamic programming, while much more convoluted. Nevertheless, the computation is always strait forward and decreases monotonically.
When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.