
From arXiv: 51 pages, 8 figures. Revised version: an improved introduction, a completely new numerical section including experiments in non-convex settings, and a new appendix discussing the dependence of the variance of SGLD on the mini-batch size.

Stochastic Gradient Algorithms (SGAs) are ubiquitous in computational statistics, machine learning and optimisation. Recent years have brought an influx of interest in SGAs, and the non-asymptotic analysis of their bias is by now well-developed. However, relatively little is known about the optimal choice of the random approximation (e.g. mini-batching) of the gradient in SGAs, as this relies on the analysis of the variance and is problem-specific. While there have been numerous attempts to reduce the variance of SGAs, these typically exploit a particular structure of the sampled distribution by requiring a priori knowledge of its density's mode. It is thus unclear how to adapt such algorithms to non-log-concave settings. In this paper, we construct a Multi-index Antithetic Stochastic Gradient Algorithm (MASGA) whose implementation is independent of the structure of the target measure and which achieves performance on par with Monte Carlo estimators that have access to unbiased samples from the distribution of interest. In other words, MASGA is an optimal estimator from the mean square error-computational cost perspective within the class of Monte Carlo estimators. We prove this fact rigorously for log-concave settings and verify it numerically for some examples where the log-concavity assumption is not satisfied.
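To make the role of mini-batching concrete, here is a minimal sketch of stochastic gradient Langevin dynamics (SGLD, mentioned in the revision note above) on a toy Gaussian-mean posterior. It is not MASGA. The conjugate model, the prior variance, the step size, and the names `sgld_gaussian_mean` and `exact_posterior_mean` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sgld_gaussian_mean(data, batch_size, step, n_iter, prior_var=100.0, seed=0):
    """Toy SGLD chain for the mean of unit-variance Gaussian data.

    Assumed model (illustrative only): x_i ~ N(theta, 1), theta ~ N(0, prior_var).
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    chain = []
    for _ in range(n_iter):
        # Mini-batch drawn without replacement; rescaling by n / batch_size
        # makes the likelihood term an unbiased estimate of the full-data sum.
        batch = rng.sample(data, batch_size)
        grad_log_post = -theta / prior_var + (n / batch_size) * sum(
            x - theta for x in batch
        )
        # Euler step of the Langevin diffusion with injected Gaussian noise.
        theta += 0.5 * step * grad_log_post + math.sqrt(step) * rng.gauss(0.0, 1.0)
        chain.append(theta)
    return chain


def exact_posterior_mean(data, prior_var=100.0):
    """Closed-form posterior mean for the assumed conjugate model."""
    return sum(data) / (len(data) + 1.0 / prior_var)
```

On this toy model the chain average can be compared against `exact_posterior_mean`. A smaller `batch_size` lowers the cost per step but raises the variance of the gradient estimate, which is the trade-off the abstract refers to.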

Related content

Estimation/Estimator


Linear · Constrained optimization · PAC learning theory · SimPLe · CASE

November 23, 2021

Adaptive Multi-Goal Exploration

Jean Tarbouriech,Omar Darwiche Domingues,Pierre Ménard,Matteo Pirotta,Michal Valko,Alessandro Lazaric

We introduce a generic strategy for provably efficient multi-goal exploration. It relies on AdaGoal, a novel goal selection scheme that is based on a simple constrained optimization problem, which adaptively targets goal states that are neither too difficult nor too easy to reach according to the agent's current knowledge. We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy for all the goal states that are reachable within $L$ steps in expectation from a reference state $s_0$ in a reward-free Markov decision process. In the tabular case with $S$ states and $A$ actions, our algorithm requires $\tilde{O}(L^3 S A \epsilon^{-2})$ exploration steps, which is nearly minimax optimal. We also readily instantiate AdaGoal in linear mixture Markov decision processes, which yields the first goal-oriented PAC guarantee with linear function approximation. Beyond its strong theoretical guarantees, AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.

Processing (programming language) · Gaussian process regression · Kernelization · PDE · Performer

November 23, 2021

Stochastic Processes Under Linear Differential Constraints : Application to Gaussian Process Regression for the 3 Dimensional Free Space Wave Equation

Iain Henderson,Pascal Noble,Olivier Roustant

Let $P$ be a linear differential operator over $\mathcal{D} \subset \mathbb{R}^d$ and $U = (U_x)_{x \in \mathcal{D}}$ a second order stochastic process. In the first part of this article, we prove a new simple necessary and sufficient condition for all the trajectories of $U$ to verify the partial differential equation (PDE) $P(U) = 0$. This condition is formulated in terms of the covariance kernel of $U$. The novelty of this result is that the equality $P(U) = 0$ is understood in the sense of distributions, which is a functional analysis framework particularly adapted to the study of PDEs. This theorem provides valuable insights for the second part of this article, which is dedicated to performing "physically informed" machine learning on data that solve the homogeneous three-dimensional free space wave equation. We perform Gaussian Process Regression (GPR), a kernel-based machine learning technique, on this data. To do so, we model the solution of this PDE as a trajectory drawn from a well-chosen Gaussian process (GP). We obtain explicit formulas for the covariance kernel of the corresponding stochastic process; this kernel can then be used for GPR. We explore two particular cases: radial symmetry and the point source. In the case of radial symmetry, we derive "fast to compute" GPR formulas; in the case of the point source, we show a direct link between GPR and the classical triangulation method for point source localization used e.g. in GPS systems. We also show that this use of GPR can be interpreted as a new answer to the ill-posed inverse problem of reconstructing initial conditions for the wave equation with finite dimensional data, and also provides a way of estimating physical parameters from this data as in [Raissi et al., 2017]. We finish by showcasing this physically informed GPR on a number of practical examples.

Sample complexity · Optimizer · Functional · Sample · Oracle

November 23, 2021

Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates

Krishnakumar Balasubramanian,Saeed Ghadimi,Anthony Nguyen

In this paper, we study smooth stochastic multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions. We assume access to noisy evaluations of the functions and their gradients, through a stochastic first-order oracle. For solving this class of problems, we propose two algorithms using moving-average stochastic estimates, and analyze their convergence to an $\epsilon$-stationary point of the problem. We show that the first algorithm, which is a generalization of [GhaRuswan20] to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^6)$ by using mini-batches of samples in each iteration. By modifying this algorithm using linearized stochastic estimates of the function values, we improve the sample complexity to $\mathcal{O}(1/\epsilon^4)$. This modification not only removes the requirement of having a mini-batch of samples in each iteration, but also makes the algorithm parameter-free and easy to implement. To the best of our knowledge, this is the first time that such an online algorithm, designed for the (un)constrained multi-level setting, obtains the same sample complexity as the smooth single-level setting, under standard assumptions (unbiasedness and boundedness of the second moments) on the stochastic first-order oracle.

Regularization term · Tikhonov regularization · Optimizer · Learned · CASE

November 22, 2021

Learning the optimal Tikhonov regularizer for inverse problems

Giovanni S. Alberti,Ernesto De Vito,Matti Lassas,Luca Ratti,Matteo Santacesaria

In this work, we consider the linear inverse problem $y=Ax+\epsilon$, where $A\colon X\to Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$, $x$ is a random variable in $X$ and $\epsilon$ is a zero-mean random process in $Y$. This setting covers several inverse problems in imaging including denoising, deblurring, and X-ray tomography. Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data. Our first result is a characterization of the optimal generalized Tikhonov regularizer, with respect to the mean squared error. We find that it is completely independent of the forward operator $A$ and depends only on the mean and covariance of $x$. Then, we consider the problem of learning the regularizer from a finite training set in two different frameworks: one supervised, based on samples of both $x$ and $y$, and one unsupervised, based only on samples of $x$. In both cases, we prove generalization bounds, under some weak assumptions on the distribution of $x$ and $\epsilon$, including the case of sub-Gaussian variables. Our bounds hold in infinite-dimensional spaces, thereby showing that finer and finer discretizations do not make this learning problem harder. The results are validated through numerical simulations.

Reducible · Variance · Variance reduction · Approximation · Estimation/Estimator

November 21, 2021

Stochastic viscosity approximations of Hamilton-Jacobi equations and variance reduction

Grégoire Ferré

We consider the computation of free energy-like quantities for diffusions in high dimension, when resorting to Monte Carlo simulation is necessary. Such stochastic computations typically suffer from high variance, in particular in a low noise regime, because the expectation is dominated by rare trajectories for which the observable reaches large values. Although importance sampling, or tilting of trajectories, is now a standard technique for reducing the variance of such estimators, quantitative criteria for proving that a given control reduces variance are scarce, and often do not apply to practical situations. The goal of this work is to provide a quantitative criterion for assessing whether a given bias reduces variance, and at which order. For this, we rely on a recently introduced notion of stochastic solution for Hamilton-Jacobi-Bellman (HJB) equations. Based on this tool, we introduce the notion of k-stochastic viscosity approximation of an HJB equation. We next prove that such approximate solutions are associated with estimators having a relative variance of order k-1 at log-scale. Finally, in order to show that our definition is relevant, we provide examples of stochastic viscosity approximations of order one and two, with a numerical illustration confirming our theoretical findings.

Extensibility · Reducible · Approximation · AIM · Cross-validation

November 21, 2021

A stochastic extended Rippa's algorithm for LpOCV

Francesco Marchetti,Leevan Ling

In kernel-based approximation, the tuning of the so-called shape parameter is a fundamental step for achieving an accurate reconstruction. Recently, the popular Rippa's algorithm [14] has been extended to a more general cross validation setting. In this work, we propose a modification of this extension with the aim of further reducing the computational costs. The resulting Stochastic Extended Rippa's Algorithm (SERA) is first detailed and then tested by means of various numerical experiments, which show its efficacy and effectiveness in different approximation settings.
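For context, the classical leave-one-out identity usually associated with Rippa's algorithm can be stated as follows. This is the standard textbook form, not SERA or its LpOCV extension, and the symbols $A$, $c$, $f$ and $N$ are notation assumed here rather than taken from the paper. If $A$ is the $N \times N$ kernel interpolation matrix and $c$ solves $Ac = f$, then the error at node $x_k$ of the interpolant built without that node is

```latex
\[
  e_k = \frac{c_k}{(A^{-1})_{kk}}, \qquad k = 1, \dots, N .
\]
```

All $N$ leave-one-out errors therefore follow from a single factorization of $A$ rather than $N$ separate solves.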

Stochastic gradient descent · SGD · Mini-batch · Generalization theory · Estimation/Estimator

November 19, 2021

Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits

Hao Chen,Lili Zheng,Raed Al Kontar,Garvesh Raskutti

Stochastic gradient descent (SGD) and its variants have established themselves as the go-to algorithms for large-scale machine learning problems with independent samples due to their generalization performance and intrinsic computational advantage. However, the fact that the stochastic gradient is a biased estimator of the full gradient with correlated samples has led to the lack of theoretical understanding of how SGD behaves under correlated settings and hindered its use in such cases. In this paper, we focus on hyperparameter estimation for the Gaussian process (GP) and take a step forward towards breaking the barrier by proving minibatch SGD converges to a critical point of the full log-likelihood loss function, and recovers model hyperparameters with rate $O(\frac{1}{K})$ for $K$ iterations, up to a statistical error term depending on the minibatch size. Our theoretical guarantees hold provided that the kernel functions exhibit exponential or polynomial eigendecay which is satisfied by a wide range of kernels commonly used in GPs. Numerical studies on both simulated and real datasets demonstrate that minibatch SGD has better generalization over state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored, data size regime for GPs.

Functional · Scenario · Greedy layer-wise pretraining · Reducible · Maximum

November 19, 2021

Randomized Algorithms for Monotone Submodular Function Maximization on the Integer Lattice

Alberto Schiabel,Vyacheslav Kungurtsev,Jakub Marecek

Optimization problems with set submodular objective functions have many real-world applications. In discrete scenarios, where the same item can be selected more than once, the domain is generalized from a 2-element set to a bounded integer lattice. In this work, we consider the problem of maximizing a monotone submodular function on the bounded integer lattice subject to a cardinality constraint. In particular, we focus on maximizing DR-submodular functions, i.e., functions defined on the integer lattice that exhibit the diminishing returns property. Given any $\epsilon > 0$, we present a randomized algorithm with probabilistic guarantees of $O(1 - 1/e - \epsilon)$ approximation, using a framework inspired by a Stochastic Greedy algorithm developed for set submodular functions by Mirzasoleiman et al. We then show that, on synthetic DR-submodular functions, applying our proposed algorithm on the integer lattice is faster than the alternatives, including reducing a target problem to the set domain and then applying the fastest known set submodular maximization algorithm.
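As a point of reference, the following is a minimal sketch of a Stochastic Greedy scheme for the set case, in the spirit of the algorithm the abstract attributes to Mirzasoleiman et al. It is not the integer-lattice algorithm proposed in the paper. The coverage objective, the sample-size rule $(n/k)\log(1/\epsilon)$ and the function names are assumptions chosen for illustration.

```python
import math
import random


def coverage(selected, sets):
    """Monotone submodular toy objective: number of distinct covered items."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)


def stochastic_greedy(sets, k, eps, seed=0):
    """Pick up to k indices, scoring only a random candidate subset each round."""
    rng = random.Random(seed)
    n = len(sets)
    # Assumed sample-size rule: (n / k) * log(1 / eps), clipped to [1, n].
    sample_size = min(n, max(1, math.ceil((n / k) * math.log(1.0 / eps))))
    chosen = []
    remaining = list(range(n))
    current = 0
    for _ in range(min(k, n)):
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        # Keep the sampled candidate with the largest marginal gain.
        best = max(candidates, key=lambda i: coverage(chosen + [i], sets) - current)
        chosen.append(best)
        remaining.remove(best)
        current = coverage(chosen, sets)
    return chosen
```

With a very small `eps` the candidate subset grows to the whole ground set and the routine reduces to plain greedy; larger values of `eps` trade approximation quality for fewer objective evaluations per round.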

Filter methods · Markov chain Monte Carlo · Estimation/Estimator · Monte Carlo · State space

November 18, 2021

The Application of Zig-Zag Sampler in Sequential Markov Chain Monte Carlo

Yu Han,Kazuyuki Nakamura

Particle filtering methods are widely applied to sequential state estimation in nonlinear, non-Gaussian state space models. However, traditional particle filtering methods suffer from weight degeneracy in high-dimensional state space models. Many methods have been proposed to improve the performance of particle filtering in high-dimensional state space models. Among these, a more advanced approach is to construct a Sequential Markov chain Monte Carlo (SMCMC) framework by implementing the Composite Metropolis-Hastings (MH) Kernel. In this paper, we propose to discretize the Zig-Zag Sampler and apply it in the refinement stage of the Composite MH Kernel within the SMCMC framework, with the invertible particle flow implemented in the joint draw stage. We evaluate the performance of the proposed method through numerical experiments on challenging, complex high-dimensional filtering examples. The numerical experiments show that, in high-dimensional state estimation examples, the proposed method improves estimation accuracy and increases the acceptance ratio compared with state-of-the-art filtering methods.

Simplex · Performer · Processing (programming language) · Bayesian inference · Discretization

June 19, 2018

Large-Scale Stochastic Sampling from the Probability Simplex

Jack Baker,Paul Fearnhead,Emily B Fox,Christopher Nemeth

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, where many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic CIR process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
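To illustrate why a CIR process can be simulated without time-discretisation error, here is a minimal sketch of its exact transition. It covers only the plain CIR process with a fixed parameter, not the paper's stochastic variant. The parameterisation $d\theta_t = (a - \theta_t)\,dt + \sqrt{2\theta_t}\,dW_t$ (stationary law Gamma($a$, 1)) and all function names are assumptions made here. Under that parameterisation the transition over a step $h$ is a scaled noncentral chi-squared draw, sampled below as a Poisson mixture of Gamma variables.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def cir_exact_step(theta, a, h, rng):
    """Draw theta_{t+h} given theta_t = theta, with no discretisation error.

    Assumed dynamics: d(theta) = (a - theta) dt + sqrt(2 * theta) dW.
    """
    decay = math.exp(-h)
    # Noncentral chi-squared as a Poisson mixture of central chi-squareds,
    # expressed directly through a Gamma draw with a random shape.
    n_extra = sample_poisson(theta * decay / (1.0 - decay), rng)
    return (1.0 - decay) * rng.gammavariate(a + n_extra, 1.0)


def simulate_cir(a, h, n_steps, theta0=1.0, seed=0):
    """Run the exact transition repeatedly and return the visited states."""
    rng = random.Random(seed)
    theta = theta0
    path = []
    for _ in range(n_steps):
        theta = cir_exact_step(theta, a, h, rng)
        path.append(theta)
    return path
```

Because each step is an exact draw, states stay strictly positive for any step size `h`, including near the boundary at zero where Euler-type schemes are least reliable.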