We present the Path Integral Sampler~(PIS), a novel algorithm for drawing samples from unnormalized probability density functions. PIS is built on the Schr\"odinger bridge problem, which aims to recover the most likely evolution of a diffusion process given its initial and terminal distributions. PIS draws samples from the initial distribution and then propagates them through the Schr\"odinger bridge to reach the terminal distribution. Applying the Girsanov theorem with a simple prior diffusion, we formulate PIS as a stochastic optimal control problem whose running cost is the control energy and whose terminal cost is chosen according to the target distribution. By modeling the control as a neural network, we obtain a sampling algorithm that can be trained end-to-end. We provide a theoretical justification of the sampling quality of PIS in terms of the Wasserstein distance when a sub-optimal control is used. Moreover, path integral theory is used to compute importance weights for the samples, compensating for the bias induced by the sub-optimality of the controller and by time discretization. We experimentally demonstrate the advantages of PIS over other state-of-the-art sampling methods on a variety of tasks.
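As a companion illustration, the following is a minimal sketch of the sampling-plus-reweighting step under simplifying assumptions: the control network is replaced by an untrained stand-in `u_theta`, the prior diffusion is a standard Brownian motion started at the origin, and `log_target` is the unnormalized target log-density (here a standard Gaussian). The Girsanov log-weight is accumulated along each Euler--Maruyama path.

```python
import numpy as np

def log_target(x):
    # Unnormalized target log-density (assumption: standard Gaussian).
    return -0.5 * np.sum(x**2, axis=-1)

def u_theta(x, t):
    # Stand-in for the learned control network (untrained linear control).
    return -x * t

def pis_sample(n_samples=1000, dim=2, n_steps=100, T=1.0, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.zeros((n_samples, dim))   # samples start from a point mass at 0
    log_w = np.zeros(n_samples)      # Girsanov log-weights along each path
    for k in range(n_steps):
        u = u_theta(x, k * dt)
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)
        # Running control energy plus the stochastic-integral term
        log_w -= 0.5 * np.sum(u**2, axis=-1) * dt + np.sum(u * dw, axis=-1)
        x = x + u * dt + dw          # Euler-Maruyama step of the controlled SDE
    # Terminal reweighting: target density relative to the prior diffusion at T
    log_prior = -0.5 * np.sum(x**2, axis=-1) / T - 0.5 * dim * np.log(2 * np.pi * T)
    log_w += log_target(x) - log_prior
    return x, log_w
```

In the trained sampler, `u_theta` would be the learned control, and the same log-weights correct for its residual sub-optimality.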
This paper deals with the control of parasitism in variational integrators for degenerate Lagrangian systems by writing them as general linear methods. This enables us to calculate their parasitic growth parameters, which are responsible for the loss of the long-time energy conservation properties of these algorithms. To offset the effects of parasitism, the standard projection technique is then applied to the general linear methods, numerically preserving the invariants of the degenerate Lagrangian systems by projecting the solution onto the desired manifold.
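For concreteness, here is a minimal sketch of the standard projection step under our simplifying assumptions (a single scalar invariant $g$, e.g. the energy, projected along its gradient after each step of the underlying method):

```python
import numpy as np

def project_onto_invariant(y, g, grad_g, newton_iters=3):
    """Pull the numerical solution y back onto the manifold {g(y) = 0}
    along the gradient direction, via simplified Newton iterations."""
    for _ in range(newton_iters):
        G = grad_g(y)
        # One Newton step for lambda in g(y + lambda * G) = 0
        y = y - (g(y) / np.dot(G, G)) * G
    return y
```

In practice, the projection is applied after every step of the general linear method, so the invariant drift caused by parasitism cannot accumulate.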
To accumulate knowledge and improve its behaviour policy, a reinforcement learning agent can learn `off-policy' about policies that differ from the policy used to generate its experience. This is important when learning about counterfactuals, or when the experience was generated outside the agent's control. However, off-policy learning is non-trivial, and standard reinforcement-learning algorithms can be unstable and divergent. In this paper we discuss a novel family of off-policy prediction algorithms that are convergent by construction. The idea is to first learn on-policy about the data-generating behaviour, and then bootstrap an off-policy value estimate on this on-policy estimate, thereby constructing a value estimate that is partially off-policy. This process can be repeated to build a chain of value functions, each time bootstrapping a new estimate on the previous estimate in the chain. Each step in the chain is stable, and hence the complete algorithm is guaranteed to be stable. Under mild conditions the result comes arbitrarily close to the off-policy TD solution as the length of the chain increases; hence the scheme can compute the solution even in cases where off-policy TD diverges. We prove that the proposed scheme is convergent and corresponds to an iterative decomposition of the inverse key matrix. Furthermore, it can be interpreted as estimating a novel objective -- which we call a `k-step expedition' -- of following the target policy for finitely many steps before continuing indefinitely with the behaviour policy. Empirically, we evaluate the idea on challenging MDPs such as Baird's counterexample and observe favourable results.
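The chaining idea admits a short tabular sketch. Everything here is illustrative: `transitions` are (s, a, r, s') tuples generated by the behaviour policy, `rho[s][a]` is the importance ratio $\pi(a|s)/b(a|s)$, and for brevity the chain is initialised at zero rather than at the on-policy estimate of the behaviour value.

```python
import numpy as np

def chain_values(transitions, n_states, rho, n_links=5,
                 gamma=0.9, alpha=0.1, sweeps=200):
    v_prev = np.zeros(n_states)          # link 0 of the chain
    for _ in range(n_links):
        v_k = np.zeros(n_states)
        for _ in range(sweeps):
            for (s, a, r, s2) in transitions:
                # Regress on a target built from the PREVIOUS link, so no
                # value estimate bootstraps on itself: each link is stable.
                target = rho[s][a] * (r + gamma * v_prev[s2])
                v_k[s] += alpha * (target - v_k[s])
        v_prev = v_k
    return v_prev
```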
We introduce and solve the non-commutative version of the Hermite-Pad\'{e} type I approximation problem. Its solution, expressed by quasideterminants, leads in a natural way to a subclass of solutions of the non-commutative Hirota (discrete Kadomtsev--Petviashvili) system and of its linear problem. We also prove integrability of the constrained system, which in the simplest case is the non-commutative discrete-time Toda lattice equation known from the theory of non-commutative Pad\'{e} approximants and matrix orthogonal polynomials.
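For orientation, recall the classical (commutative) type I problem, stated here in our notation: given formal power series $f_0,\dots,f_m$ and degrees $n_0,\dots,n_m$, find polynomials $A_0,\dots,A_m$, not all zero, with $\deg A_j\le n_j$, such that
$$A_0(z)f_0(z)+A_1(z)f_1(z)+\cdots+A_m(z)f_m(z)=O\!\left(z^{|\mathbf{n}|+m}\right),\qquad |\mathbf{n}|=n_0+\cdots+n_m,$$
a condition that always admits a nontrivial solution by linear algebra, since the $|\mathbf{n}|+m+1$ free coefficients face only $|\mathbf{n}|+m$ linear constraints. The non-commutative version replaces scalar coefficients by elements of a non-commutative ring, with quasideterminants taking over the role of determinants.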
We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. Our study focuses on optimal control without centrally precomputed policies; instead, the agents adapt their control policies online, equipped only with a stabilizing controller. We give a reduction from any (standard) regret-minimizing control method to a distributed algorithm. The reduction guarantees that the resulting distributed algorithm has low regret relative to the optimal precomputed joint policy. Our methodology involves generalizing online convex optimization to a multi-agent setting and applying recent tools from nonstochastic control derived for a single agent. We empirically evaluate our method on a model of an overactuated aircraft and show that the distributed method is robust to failures and to adversarial perturbations in the dynamics.
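The online-learning core of the reduction can be sketched in a few lines. This is a schematic under our assumptions: each agent $i$ observes, after acting, the gradient of the joint round-$t$ cost with respect to its own action only (`grad_fns[i]`), and runs online gradient descent with a decaying step size.

```python
import numpy as np

def distributed_ogd(grad_fns, dims, rounds=100, lr=0.1):
    actions = [np.zeros(d) for d in dims]        # one action block per agent
    for t in range(rounds):
        # Each agent queries only its own partial gradient of the joint cost
        grads = [g(actions, t) for g in grad_fns]
        step = lr / np.sqrt(t + 1)
        actions = [a - step * g for a, g in zip(actions, grads)]
    return actions
```

In the control reduction, each agent's action block would parameterize its disturbance-feedback controller rather than a raw control input.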
Many problems in computational science and engineering can be described in terms of approximating a smooth function of $d$ variables, defined over an unknown domain of interest $\Omega\subset \mathbb{R}^d$, from sample data. Here both the curse of dimensionality ($d\gg 1$) and the lack of domain knowledge, with $\Omega$ potentially irregular and/or disconnected, are confounding factors for sampling-based methods. Na\"{i}ve approaches often lead to wasted samples and inefficient approximation schemes; for example, uniform sampling can result in upwards of 20\% wasted samples in some problems. In surrogate model construction in computational uncertainty quantification (UQ), the high cost of computing each sample calls for a more efficient sampling procedure. In recent years, methods for computing such approximations from sample data have been studied in the case of irregular domains, and the advantages of computing sampling measures depending on an approximation space $P$ of $\dim(P)=N$ have been shown. In particular, such methods confer advantages such as stability and well-conditioning, with a sample complexity of $\mathcal{O}(N\log(N))$. The recently-proposed adaptive sampling for general domains (ASGD) strategy is one method to construct these sampling measures. The main contribution of this paper is to improve ASGD by adaptively updating the sampling measures over unknown domains. We achieve this by first introducing a general domain adaptivity strategy (GDAS), which approximates the function and domain of interest from sample points. Second, we propose adaptive sampling for unknown domains (ASUD), which generates sampling measures over a domain that may not be known in advance. We then derive least-squares techniques for polynomial approximation on unknown domains. Numerical results show that the ASUD approach can reduce the computational cost by as much as 50\% when compared with uniform sampling.
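As a toy version of the pipeline, the sketch below fits a least-squares polynomial on a one-dimensional unknown domain. It is a deliberate simplification: the proposal is plain uniform sampling rather than the ASUD measures, and the unknown domain is encoded by `f` returning `np.nan` outside $\Omega$.

```python
import numpy as np

def ls_fit_unknown_domain(f, box, degree=5, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = box
    x = rng.uniform(lo, hi, n_samples)
    y = f(x)
    keep = ~np.isnan(y)                 # discard samples outside the domain
    V = np.vander(x[keep], degree + 1)  # polynomial basis on retained samples
    coeffs, *_ = np.linalg.lstsq(V, y[keep], rcond=None)
    return coeffs, keep.mean()          # coefficients, fraction of useful samples
```

The fraction of retained samples returned here is precisely the quantity that ASUD improves by adapting the sampling measure to the discovered domain.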
Much recent interest has focused on the design of optimization algorithms from the discretization of an associated optimization flow, i.e., a system of ordinary differential equations (ODEs) whose trajectories solve an associated optimization problem. Such a design approach poses an important problem: how to find a principled methodology to design and discretize appropriate ODEs. This paper aims to provide a solution to this problem through the use of contraction theory. We first introduce general mathematical results that explain how contraction theory guarantees the stability of the implicit and explicit Euler integration methods. Then, we propose a novel system of ODEs, namely the Accelerated-Contracting-Nesterov flow, and use contraction theory to establish that it is an optimization flow with exponential convergence rate, from which the linear convergence rate of its associated optimization algorithm follows immediately. Remarkably, a simple explicit Euler discretization of this flow corresponds to the Nesterov acceleration method. Finally, we show how our approach leads to performance guarantees in the design of optimization algorithms for time-varying optimization problems.
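The design-then-discretize pipeline is easy to exhibit in its simplest instance: explicit Euler applied to the gradient flow $\dot{x} = -\nabla f(x)$ is exactly gradient descent. (This is an illustrative stand-in; the Accelerated-Contracting-Nesterov flow is a second-order ODE whose explicit Euler discretization yields Nesterov's method.)

```python
import numpy as np

def euler_optimizer(grad_f, x0, step=0.1, n_steps=100):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # One explicit Euler step of x' = -grad f(x): a gradient-descent update
        x = x - step * grad_f(x)
    return x

# Example: minimize f(x) = ||x||^2 / 2, whose gradient is x itself
x_star = euler_optimizer(lambda x: x, x0=[3.0, -2.0])
```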
High-dimensional signal recovery in standard linear regression is a key challenge in many engineering fields, such as communications, compressed sensing, and image processing. The approximate message passing (AMP) algorithm proposed by Donoho \textit{et al.} is a computationally efficient method for such problems, which can attain Bayes-optimal performance for independent and identically distributed (IID) sub-Gaussian random measurement matrices. A significant feature of AMP is that its dynamical behavior can be fully predicted by a scalar equation termed state evolution (SE). Although AMP is optimal for IID sub-Gaussian random matrices, it may fail to converge when the measurement matrix is beyond IID sub-Gaussian. To extend the admissible class of random measurement matrices, an expectation propagation (EP)-related algorithm, orthogonal AMP (OAMP), was proposed; it shares the same update rules with EP, expectation consistent (EC) approximation, and vector AMP (VAMP). This paper aims to review these algorithms. We begin with the worst case, i.e., the least absolute shrinkage and selection operator (LASSO) inference problem, and give a detailed derivation of AMP from message passing. In the Bayes-optimal setting, we also present the Bayes-optimal AMP, which differs slightly from AMP for LASSO. In addition, we review several AMP-related algorithms -- OAMP, VAMP, and memory AMP (MAMP) -- which can be applied to more general random matrices.
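For reference, here is a compact sketch of the AMP iteration for LASSO, with the soft-thresholding denoiser and the Onsager correction term. Two simplifications are ours: the columns of $A$ are assumed roughly unit-norm IID sub-Gaussian, and the threshold is set crudely from the residual norm rather than tracked via state evolution.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp_lasso(A, y, lam=0.1, n_iter=30):
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for _ in range(n_iter):
        r = x + A.T @ z                             # pseudo-data
        tau = lam + np.linalg.norm(z) / np.sqrt(m)  # crude threshold choice
        x_new = soft(r, tau)
        # Onsager term: (z / m) * number of active coordinates
        z = y - A @ x_new + (z / m) * np.count_nonzero(x_new)
        x = x_new
    return x
```

Dropping the Onsager term recovers plain iterative soft thresholding, which converges far more slowly and whose dynamics SE does not describe.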
We introduce vector optimization problems with stochastic bandit feedback, which extend the best arm identification problem to vector-valued rewards. We consider $K$ designs with multi-dimensional mean reward vectors, which are partially ordered according to a polyhedral ordering cone $C$. This generalizes the concept of the Pareto set in multi-objective optimization and allows different sets of decision-maker preferences to be encoded by $C$. In contrast to prior work, we define approximations of the Pareto set based on direction-free covering and gap notions. We study the setting where an evaluation of each design yields a noisy observation of the mean reward vector. Under a subgaussian noise assumption, we investigate the sample complexity of the na\"ive elimination algorithm in an ($\epsilon,\delta$)-PAC setting, where the goal is to identify an ($\epsilon,\delta$)-PAC Pareto set with the minimum number of design evaluations. To characterize the difficulty of learning the Pareto set, we introduce the concept of ordering complexity, i.e., geometric conditions on the deviations of empirical reward vectors from their means under which the Pareto front can be approximated accurately. We show how to compute the ordering complexity of any polyhedral ordering cone. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and the sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the success of identification.
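The cone order makes dominance checks mechanical. Below is a sketch, under the assumption that the polyhedral ordering cone is given in halfspace form $C = \{z : Wz \ge 0\}$ and that `means` stacks one (empirical) mean reward vector per design:

```python
import numpy as np

def cone_pareto_set(means, W):
    n = means.shape[0]
    pareto = []
    for i in range(n):
        dominated = False
        for j in range(n):
            d = W @ (means[j] - means[i])
            # j dominates i if means[j] - means[i] lies in C, nontrivially
            if j != i and np.all(d >= 0) and np.any(d > 0):
                dominated = True
                break
        if not dominated:
            pareto.append(i)
    return pareto
```

Taking $W$ to be the identity recovers the usual componentwise Pareto order; other choices of $W$ encode different decision-maker preferences.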
We propose a novel method for training deep neural networks that are capable of interpolation, that is, of driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss, which enables us to compute an automatic adaptive learning rate, thereby providing an accurate solution. In addition, our bundle includes linear approximations computed at the current iterate and other linear estimates of the DNN parameters. The use of these additional approximations makes our method significantly more robust to its hyperparameters. Based on its desirable empirical properties, we term our method Bundle Optimisation for Robust and Accurate Training (BORAT). To operationalise BORAT, we design a novel algorithm for efficiently optimising the bundle approximation at each iteration. We establish the theoretical convergence of BORAT in both convex and non-convex settings. Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single-hyperparameter optimisation algorithms. Our experiments demonstrate that BORAT matches the state-of-the-art generalisation performance of these methods and is the most robust.
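The smallest interesting bundle already conveys the mechanism. With only two elements, the linearization of the loss at the current iterate and the constant zero lower bound, minimizing the bundle maximum within a proximal term reduces to a Polyak-style adaptive step clipped at the maximal learning rate. (This is a sketch of that special case only; the full method carries additional linear approximations.)

```python
import numpy as np

def two_element_bundle_step(w, loss, grad, max_lr=0.1, eps=1e-8):
    # Adaptive step from the bundle {linearization at w, zero lower bound}
    gamma = min(max_lr, loss / (np.dot(grad, grad) + eps))
    return w - gamma * grad
```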
We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks, considering most of the activation functions used in practice, including ReLU. We improve the existing state-of-the-art results in terms of the required hidden layer width, and introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network. First, we establish the global convergence of continuous solutions of the differential inclusion that is a nonsmooth analogue of the gradient flow for the MSE loss. Second, we provide a technical result (which also applies to general approximators) relating solutions of this differential inclusion to the (discrete) stochastic gradient descent sequences, hence establishing linear convergence towards zero loss for the stochastic gradient descent iterations.
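Concretely, the continuous-time object in the first step is the subgradient inclusion (a minimal statement in our notation, with $\partial L$ the Clarke subdifferential of the nonsmooth MSE loss induced by, e.g., ReLU activations):
$$\dot{\theta}(t)\in-\partial L(\theta(t)),\qquad \theta(0)=\theta_0,$$
whose solutions play the role that gradient-flow trajectories play in the smooth setting.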