We establish generalization error bounds for stochastic gradient Langevin dynamics (SGLD) with constant learning rate under the assumptions of dissipativity and smoothness, a setting that has received increased attention in the sampling/optimization literature. Unlike existing bounds for SGLD in non-convex settings, ours are time-independent and decay to zero as the sample size increases. Using the framework of uniform stability, we establish time-independent bounds by exploiting the Wasserstein contraction property of the Langevin diffusion, which also allows us to circumvent the need to bound gradients using Lipschitz-like assumptions. Our analysis also supports variants of SGLD that use different discretization methods, incorporate Euclidean projections, or use non-isotropic noise.
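As a point of reference, the constant-step-size SGLD update discussed in the abstract can be sketched on a toy one-dimensional problem. The quadratic loss, the step size `eta`, the inverse temperature `beta`, the batch size, and all function names below are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sgld(data, eta=0.01, beta=10.0, batch_size=8, steps=2000, seed=0):
    """Constant-step-size SGLD on the toy loss f(w) = mean((w - x)^2 / 2).

    Update rule: w <- w - eta * g + sqrt(2 * eta / beta) * N(0, 1),
    where g is a minibatch estimate of the gradient of f.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        grad = sum(w - x for x in batch) / batch_size  # stochastic gradient
        noise = rng.gauss(0.0, 1.0)                    # isotropic Gaussian noise
        w = w - eta * grad + math.sqrt(2.0 * eta / beta) * noise
    return w


# Hypothetical data set: 200 draws centred at 3.0.
data_rng = random.Random(1)
data = [data_rng.gauss(3.0, 1.0) for _ in range(200)]
w_final = sgld(data)
```

The iterate does not converge to a point; it fluctuates around the minimizer of the empirical loss, with a spread controlled by `beta` and `eta`.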

Related content

Generalization theory

Kernelization · Mean · Maximum mean discrepancy · Approximation · Reducible

January 31, 2022

Nyström Kernel Mean Embeddings

Antoine Chatalic, Nicolas Schreuder, Alessandro Rudi, Lorenzo Rosasco

from arxiv, 8 pages

Kernel mean embeddings are a powerful tool to represent probability distributions over arbitrary spaces as single points in a Hilbert space. Yet, the cost of computing and storing such embeddings prohibits their direct use in large-scale settings. We propose an efficient approximation procedure based on the Nystr\"om method, which exploits a small random subset of the dataset. Our main result is an upper bound on the approximation error of this procedure. It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate while reducing computational costs. We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules, and illustrate our theoretical findings with numerical experiments.
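The idea of approximating a mean embedding from a small random subset can be sketched as follows. This is a minimal illustration, assuming a Gaussian kernel, uniform subsampling, and a projection of the empirical embedding onto the span of the landmarks via a truncated pseudo-inverse; the function names, bandwidth, and cutoff are hypothetical choices, not the paper's reference implementation.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def nystrom_mean_embedding(x, m, rng, bandwidth=1.0):
    """Return landmarks and weights alpha such that sum_j alpha_j k(landmark_j, .)
    approximates the empirical mean embedding (1/n) sum_i k(x_i, .)."""
    idx = rng.choice(len(x), size=m, replace=False)   # uniform random subsample
    landmarks = x[idx]
    k_mm = gaussian_kernel(landmarks, landmarks, bandwidth)
    k_mn = gaussian_kernel(landmarks, x, bandwidth)
    # Truncated pseudo-inverse (assumed cutoff) keeps the solve numerically stable.
    alpha = np.linalg.pinv(k_mm, rcond=1e-8) @ k_mn.mean(axis=1)
    return landmarks, alpha


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
landmarks, alpha = nystrom_mean_embedding(x, m=50, rng=rng)

# Squared RKHS distance between the full empirical embedding and its approximation.
k_nn = gaussian_kernel(x, x)
k_mn = gaussian_kernel(landmarks, x)
k_mm = gaussian_kernel(landmarks, landmarks)
err_sq = k_nn.mean() - 2.0 * alpha @ k_mn.mean(axis=1) + alpha @ k_mm @ alpha
```

Only the `m` weights and landmarks need to be stored, instead of all `n` points; `err_sq` is computed here from the full kernel matrix purely to check the quality of the approximation.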

Discretization · Mutually independent · Recall · Integration · SimPLe

January 31, 2022

A high-order velocity-based discontinuous Galerkin scheme for the shallow water equations: local conservation, entropy stability, well-balanced property, and positivity preservation

Guosheng Fu

from arxiv, 25 pages, 8 figures

We present a novel class of locally conservative, entropy stable, and well-balanced discontinuous Galerkin (DG) methods for the nonlinear shallow water equations with non-flat bottom topography. The major novelty of our work is the use of the velocity field as an independent solution unknown in the DG scheme. This is closely related to the entropy variable approach to entropy stable schemes for systems of conservation laws proposed by Tadmor [22] in 1986; recall that velocity is part of the entropy variable for the shallow water equations. Because velocity is an independent solution unknown, no specific numerical quadrature rules are needed to achieve entropy stability of our scheme on general unstructured meshes in two dimensions. The proposed DG semi-discretization is then carefully combined with the classical explicit strong stability preserving Runge-Kutta (SSP-RK) time integrators [13] to yield a locally conservative, well-balanced, and positivity preserving fully discrete scheme. Here the positivity preservation property is enforced with the help of a simple scaling limiter. In the fully discrete scheme, we re-introduce discharge as an auxiliary unknown variable. In doing so, standard slope limiting procedures can be applied to the conservative variables (water height and discharge) without violating the local conservation property. Here we apply a characteristic-wise TVB limiter [5] on the conservative variables using the Fu-Shu troubled cell indicator [10] in each inner stage of the Runge-Kutta time stepping to suppress numerical oscillations.

Reducible · Variance reduction · Variance · VR · Smoothing

January 28, 2022

Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

Zijian Liu, Ta Duy Nguyen, Alina Ene, Huy L. Nguyen

In this paper, we study the finite-sum convex optimization problem focusing on the general convex case. Recently, the study of variance reduced (VR) methods and their accelerated variants has made exciting progress. However, the step size used in the existing VR algorithms typically depends on the smoothness parameter, which is often unknown and requires tuning in practice. To address this problem, we propose two novel adaptive VR algorithms: Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG). Our algorithms do not require knowledge of the smoothness parameter. AdaVRAE uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta}{\epsilon}}\right)$ gradient evaluations and AdaVRAG uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta\log\beta}{\epsilon}}\right)$ gradient evaluations to attain an $\mathcal{O}(\epsilon)$-suboptimal solution, where $n$ is the number of functions in the finite sum and $\beta$ is the smoothness parameter. This result matches the best-known convergence rate of non-adaptive VR methods and it improves upon the convergence of the state of the art adaptive VR method, AdaSVRG. We demonstrate the superior performance of our algorithms compared with previous methods in experiments on real-world datasets.
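To illustrate the step-size issue the abstract describes, here is a minimal sketch of plain SVRG, a standard non-adaptive variance reduced method, on a toy one-dimensional least-squares problem. It is not AdaVRAE or AdaVRAG. Note that the step size is set from the smoothness parameter `beta`, which is exactly the dependence the adaptive algorithms aim to remove. The problem data, the constant `0.1 / beta`, and all names are assumptions made for illustration.

```python
import random


def svrg(a, b, eta, epochs=30, seed=0):
    """Plain SVRG for f(w) = (1/n) * sum_i 0.5 * (a_i * w - b_i)^2."""
    rng = random.Random(seed)
    n = len(a)
    w_snapshot = 0.0
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per epoch.
        full_grad = sum(ai * (ai * w_snapshot - bi) for ai, bi in zip(a, b)) / n
        w = w_snapshot
        for _ in range(n):
            i = rng.randrange(n)
            g_w = a[i] * (a[i] * w - b[i])               # component gradient at w
            g_snap = a[i] * (a[i] * w_snapshot - b[i])   # same component at snapshot
            w -= eta * (g_w - g_snap + full_grad)        # variance reduced step
        w_snapshot = w
    return w_snapshot


# Hypothetical data: b_i is roughly 2 * a_i plus small noise.
data_rng = random.Random(1)
a = [data_rng.uniform(0.5, 1.5) for _ in range(100)]
b = [ai * 2.0 + data_rng.gauss(0.0, 0.1) for ai in a]

beta = max(ai * ai for ai in a)      # smoothness parameter of the components
w_hat = svrg(a, b, eta=0.1 / beta)   # step size depends on beta
w_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
```

Here `w_star` is the closed-form least-squares solution, included only to check that the iterate reaches it.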

Generalization theory · Optimizer · Generalization error · Estimation/Estimator · CASE

January 28, 2022

Stochastic Chaining and Strengthened Information-Theoretic Generalization Bounds

Ruida Zhou, Chao Tian, Tie Liu

from arxiv, 18 pages, 1 figure

We propose a new approach that applies the chaining technique in conjunction with information-theoretic measures to bound the generalization error of machine learning algorithms. Unlike the deterministic chaining approach based on hierarchical partitions of a metric space, previously proposed by Asadi et al., we propose a stochastic chaining approach, which replaces the hierarchical partitions with an abstracted Markovian model borrowed from successive refinement source coding. This approach has three benefits over deterministic chaining: 1) the metric space need not be bounded, 2) subsequent analysis is easier and yields a more explicit bound, and 3) the bound can be further optimized by removing the geometric rigidity of the partitions. The proposed approach includes traditional chaining as a special case, and can therefore also utilize any deterministic chaining construction. We illustrate these benefits using the problem of estimating a Gaussian mean and that of phase retrieval. For the former, we derive a bound that provides an order-wise improvement over previous results, and for the latter we provide a stochastic chain that allows optimization over the chaining parameter.

Approximate Bayesian computation · Statistic · Approximation · Loop · Reducible

January 28, 2022

Approximate Bayesian Computation with Domain Expert in the Loop

Ayush Bharti, Louis Filstroff, Samuel Kaski

Approximate Bayesian computation (ABC) is a popular likelihood-free inference method for models with intractable likelihood functions. As ABC methods usually rely on comparing summary statistics of observed and simulated data, the choice of the statistics is crucial. This choice involves a trade-off between loss of information and dimensionality reduction, and is often determined based on domain knowledge. However, handcrafting and selecting suitable statistics is a laborious task involving multiple trial-and-error steps. In this work, we introduce an active learning method for ABC statistics selection which reduces the domain expert's work considerably. By involving the experts, we are able to handle misspecified models, unlike the existing dimension reduction methods. Moreover, empirical results show better posterior estimates than with existing methods, when the simulation budget is limited.
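The role of summary statistics in ABC can be seen in a basic rejection sampler. The sketch below is plain rejection ABC, not the active learning method of the paper; the Gaussian simulator, the uniform prior, the choice of the sample mean as the statistic, and the tolerance `eps` are all assumptions chosen to keep the example small.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator: n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Hand-picked summary statistic (here, the sample mean)."""
    return statistics.fmean(data)


def abc_rejection(observed, n_sims=5000, eps=0.1, seed=0):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)                        # draw from the prior
        s_sim = summary(simulate(theta, len(observed), rng))  # simulate and summarize
        if abs(s_sim - s_obs) <= eps:                         # compare summaries
            accepted.append(theta)
    return accepted


obs_rng = random.Random(42)
observed = simulate(2.0, 50, obs_rng)       # hypothetical observed data
posterior_samples = abc_rejection(observed)
posterior_mean = statistics.fmean(posterior_samples)
```

Swapping `summary` for a less informative statistic widens the accepted set, which is the trade-off between information loss and dimensionality reduction mentioned in the abstract.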

Stochastic gradient descent · Neural Networks · Hidden layer · Networking · Layer

January 28, 2022

Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks

Bartłomiej Polaczyk, Jacek Cyranka

We study the overparametrization bounds required for the global convergence of stochastic gradient descent algorithm for a class of one hidden layer feed-forward neural networks, considering most of the activation functions used in practice, including ReLU. We improve the existing state-of-the-art results in terms of the required hidden layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network. First, we establish the global convergence of continuous solutions of the differential inclusion being a nonsmooth analogue of the gradient flow for the MSE loss. Second, we provide a technical result (working also for general approximators) relating solutions of the aforementioned differential inclusion to the (discrete) stochastic gradient descent sequences, hence establishing linear convergence towards zero loss for the stochastic gradient descent iterations.

Approximation · Functional · Approximation error · Log-likelihood · Linear combination

January 28, 2022

Certified dimension reduction in nonlinear Bayesian inverse problems

Olivier Zahm, Tiangang Cui, Kody Law, Alessio Spantini, Youssef Marzouk

We propose a dimension reduction technique for Bayesian inverse problems with nonlinear forward operators, non-Gaussian priors, and non-Gaussian observation noise. The likelihood function is approximated by a ridge function, i.e., a map which depends non-trivially only on a few linear combinations of the parameters. We build this ridge approximation by minimizing an upper bound on the Kullback--Leibler divergence between the posterior distribution and its approximation. This bound, obtained via logarithmic Sobolev inequalities, allows one to certify the error of the posterior approximation. Computing the bound requires computing the second moment matrix of the gradient of the log-likelihood function. In practice, a sample-based approximation of the upper bound is then required. We provide an analysis that enables control of the posterior approximation error due to this sampling. Numerical and theoretical comparisons with existing methods illustrate the benefits of the proposed methodology.
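A sample-based estimate of the second moment matrix of the log-likelihood gradient, and the directions it singles out, can be sketched as follows. The toy log-likelihood, the direction `a`, the observation `y`, and the use of standard normal samples as a stand-in for the measure over which the expectation is taken are all assumptions for illustration; the paper's actual construction and error certificate are not reproduced here.

```python
import numpy as np


def grad_log_likelihood(theta, a, y):
    """Gradient of the toy log-likelihood -0.5 * (y - tanh(a . theta))^2.

    This log-likelihood depends on theta only through the single linear
    combination a . theta, so it is a ridge function by construction.
    """
    u = np.tanh(a @ theta)
    return (y - u) * (1.0 - u**2) * a


rng = np.random.default_rng(0)
dim = 10
a = rng.normal(size=dim)   # hypothetical informed direction
y = 0.3                    # hypothetical observation

# Sample-based estimate of H = E[grad grad^T].
samples = rng.normal(size=(2000, dim))
grads = np.array([grad_log_likelihood(t, a, y) for t in samples])
H = grads.T @ grads / len(grads)

# The leading eigenvectors of H span the directions the likelihood depends on.
eigvals, eigvecs = np.linalg.eigh(H)   # eigenvalues in ascending order
top_direction = eigvecs[:, -1]
alignment = abs(top_direction @ a) / np.linalg.norm(a)
```

In this toy case `H` has rank one, so the leading eigenvector recovers `a` up to sign and scale, and the remaining eigenvalues are numerically zero.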

Non-convex · Constrained optimization · Functional · Optimizer · Approximation

January 27, 2022

Stochastic First-order Methods for Convex and Nonconvex Functional Constrained Optimization

Digvijay Boob, Qi Deng, Guanghui Lan

from arxiv, 36 pages, final version, accepted at Math Programming

Functional constrained optimization is becoming more and more important in machine learning and operations research. Such problems have potential applications in risk-averse machine learning, semisupervised learning, and robust optimization among others. In this paper, we first present a novel Constraint Extrapolation (ConEx) method for solving convex functional constrained problems, which utilizes linear approximations of the constraint functions to define the extrapolation (or acceleration) step. We show that this method is a unified algorithm that achieves the best-known rate of convergence for solving different functional constrained convex composite problems, including convex or strongly convex, and smooth or nonsmooth problems with a stochastic objective and/or stochastic constraints. Many of these rates of convergence were in fact obtained for the first time in the literature. In addition, ConEx is a single-loop algorithm that does not involve any penalty subproblems. Contrary to existing primal-dual methods, it does not require the projection of Lagrangian multipliers into a (possibly unknown) bounded set. Second, for nonconvex functional constrained problems, we introduce a new proximal point method that transforms the initial nonconvex problem into a sequence of convex problems by adding quadratic terms to both the objective and constraints. Under a certain MFCQ-type assumption, we establish the convergence and rate of convergence of this method to KKT points when the convex subproblems are solved exactly or inexactly. For large-scale and stochastic problems, we present a more practical proximal point method in which the approximate solutions of the subproblems are computed by the aforementioned ConEx method. To the best of our knowledge, most of these convergence and complexity results of the proximal point method for nonconvex problems also seem to be new in the literature.

UniFormer · Manifold · Approximation · Manifold learning · Performance

December 6, 2018

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Leland McInnes, John Healy, James Melville

from arxiv, Reference implementation available at //github.com/lmcinnes/umap

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.

Optimizer · Reinforcement learning · Learned · state-of-the-art · SimPLe

July 25, 2018

Variational Bayesian Reinforcement Learning with Regret Bounds

Brendan O'Donoghue

We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
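The Boltzmann exploration policy mentioned in the abstract is a softmax over per-action values with a temperature parameter. The sketch below shows only that final step, assuming the values for one state are already available; the numbers are hypothetical, and computing actual K-values (the reward bonus and the Bellman equation) is not shown.

```python
import math


def boltzmann_policy(k_values, temperature):
    """Softmax over per-action values: pi(a) is proportional to exp(K(a) / temperature)."""
    scaled = [k / temperature for k in k_values]
    shift = max(scaled)                        # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values for three actions in a single state.
probs = boltzmann_policy([1.0, 2.0, 0.5], temperature=0.5)
```

A lower temperature concentrates probability on the highest-valued action, while a higher temperature spreads it more evenly across actions.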