Nonsmooth optimization problems arising in practice tend to exhibit beneficial smooth substructure: their domains stratify into "active manifolds" of smooth variation, which common proximal algorithms "identify" in finite time. Identification then entails a transition to smooth dynamics, and accommodates second-order acceleration techniques. While identification is clearly useful algorithmically, empirical evidence suggests that even those algorithms that do not identify the active manifold in finite time -- notably the subgradient method -- are nonetheless affected by it. This work seeks to explain this phenomenon, asking: how do active manifolds impact the subgradient method in nonsmooth optimization? We answer this question by introducing two algorithmically useful properties -- aiming and subgradient approximation -- that fully expose the smooth substructure of the problem. We show that these properties imply that the shadow of the (stochastic) subgradient method along the active manifold is precisely an inexact Riemannian gradient method with an implicit retraction. We prove that these properties hold for a wide class of problems, including cone reducible/decomposable functions and generic semialgebraic problems. Moreover, we develop a thorough calculus, proving such properties are preserved under smooth deformations and spectral lifts. This viewpoint then leads to several algorithmic consequences that parallel results in smooth optimization, despite the nonsmoothness of the problem: local rates of convergence, asymptotic normality, and saddle point avoidance. The asymptotic normality results appear to be new even in the most classical setting of stochastic nonlinear programming. The results culminate in the following observation: the perturbed subgradient method on generic, Clarke regular semialgebraic problems converges only to local minimizers.
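To make the contrast between identifying and non-identifying methods concrete, here is a minimal Python sketch on a toy problem that is not taken from the paper: f(x, y) = |x| + (y - 1)^2 / 2, with minimizer (0, 1) and, by assumption, active manifold given by the line x = 0. All function names, step sizes, and starting points are illustrative choices.

```python
import math


def subgradient(x, y):
    """One subgradient of f(x, y) = |x| + 0.5 * (y - 1)^2 (sign(0) is taken as 0)."""
    sign_x = (x > 0) - (x < 0)
    return float(sign_x), y - 1.0


def soft_threshold(v, t):
    """Proximal map of t * |.| evaluated at v."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def subgradient_method(x, y, steps=2000):
    """Subgradient method with diminishing steps a_k = 0.5 / sqrt(k)."""
    for k in range(1, steps + 1):
        a = 0.5 / math.sqrt(k)
        gx, gy = subgradient(x, y)
        x, y = x - a * gx, y - a * gy
    return x, y


def proximal_gradient(x, y, steps=2000, a=0.5):
    """Proximal gradient: gradient step on the smooth part, prox step on |x|."""
    for _ in range(steps):
        y = y - a * (y - 1.0)      # smooth part depends on y only
        x = soft_threshold(x, a)   # nonsmooth part depends on x only
    return x, y


if __name__ == "__main__":
    print("subgradient method:", subgradient_method(0.3, 0.0))
    print("proximal gradient: ", proximal_gradient(0.3, 0.0))
```

In this toy setting the proximal gradient iterate lands on x = 0 exactly after a single step, whereas the subgradient iterate is only guaranteed to satisfy |x| <= 0.5 / sqrt(k) after k steps and typically keeps oscillating around the manifold. Its y coordinate nevertheless follows plain gradient descent on the restriction of f to the line x = 0, which is a very simple instance of the "shadow" behaviour the abstract describes.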

Related content

Saddle point

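In mathematics, a saddle point or minimax point is a point on the surface of the graph of a function where the slopes (derivatives) in orthogonal directions are all zero, but which is not a local extremum of the function. A saddle point is a critical point that is a relative minimum along one axial direction (between the peaks) and a relative maximum along the crossing axis.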
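As a small illustration of this definition, the following Python sketch checks the textbook example f(x, y) = x^2 - y^2 at the origin. The function and helper names are illustrative and do not come from the page.

```python
def f(x, y):
    """Textbook saddle: curves upward along x and downward along y."""
    return x * x - y * y


def grad(x, y):
    """Gradient of f."""
    return (2.0 * x, -2.0 * y)


def is_saddle_at_origin(eps=1e-3):
    """Check that (0, 0) is a critical point but not a local extremum."""
    gx, gy = grad(0.0, 0.0)
    stationary = gx == 0.0 and gy == 0.0
    # Along the x axis the origin is a minimum ...
    min_along_x = f(eps, 0.0) > f(0.0, 0.0) and f(-eps, 0.0) > f(0.0, 0.0)
    # ... and along the y axis it is a maximum.
    max_along_y = f(0.0, eps) < f(0.0, 0.0) and f(0.0, -eps) < f(0.0, 0.0)
    return stationary and min_along_x and max_along_y


if __name__ == "__main__":
    print(is_saddle_at_origin())
```

The Hessian of this function is diag(2, -2) everywhere, so it has one positive and one negative eigenvalue, which is the usual second-order signature of a strict saddle.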

Regularization term · Estimation/estimator · Unbiased · Exact · Stochastic policy

October 19, 2021

Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization

Yuhao Ding,Junzi Zhang,Javad Lavaei

Entropy regularization is an efficient technique for encouraging exploration and preventing a premature convergence of (vanilla) policy gradient methods in reinforcement learning (RL). However, the theoretical understanding of entropy regularized RL algorithms has been limited. In this paper, we revisit the classical entropy regularized policy gradient methods with the soft-max policy parametrization, whose convergence has so far only been established assuming access to exact gradient oracles. To go beyond this scenario, we propose the first set of (nearly) unbiased stochastic policy gradient estimators with trajectory-level entropy regularization, with one being an unbiased visitation measure-based estimator and the other one being a nearly unbiased yet more practical trajectory-based estimator. We prove that although the estimators themselves are unbounded in general due to the additional logarithmic policy rewards introduced by the entropy term, the variances are uniformly bounded. This enables the development of the first set of convergence results for stochastic entropy regularized policy gradient methods to both stationary points and globally optimal policies. We also develop some improved sample complexity results under a good initialization.
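The role of the "logarithmic policy rewards" can be seen in a heavily simplified setting. The sketch below is not one of the paper's estimators: it assumes a one-state problem (a bandit) with a soft-max policy and objective J(theta) = sum_a pi_a * (r_a - tau * log pi_a), and compares the exact gradient with a single-sample score-function estimate. All names and parameter values are hypothetical.

```python
import math


def softmax(theta):
    """Numerically stable soft-max policy."""
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [v / s for v in e]


def exact_gradient(theta, r, tau):
    """Exact gradient of J(theta) = sum_a pi_a * (r_a - tau * log pi_a)."""
    pi = softmax(theta)
    q = [ra - tau * math.log(p) for ra, p in zip(r, pi)]  # entropy-augmented reward
    baseline = sum(p * qa for p, qa in zip(pi, q))
    return [p * (qa - baseline) for p, qa in zip(pi, q)]


def single_sample_estimate(theta, r, tau, a):
    """Score-function estimate from one sampled action a.

    The factor -tau * log pi_a is unbounded as pi_a -> 0.
    """
    pi = softmax(theta)
    q_a = r[a] - tau * math.log(pi[a])
    return [q_a * ((1.0 if b == a else 0.0) - pi[b]) for b in range(len(theta))]


def expected_estimate(theta, r, tau):
    """Average the single-sample estimate over a ~ pi (exact expectation)."""
    pi = softmax(theta)
    total = [0.0] * len(theta)
    for a, p in enumerate(pi):
        g = single_sample_estimate(theta, r, tau, a)
        total = [t + p * gb for t, gb in zip(total, g)]
    return total


def gradient_ascent(r, tau, lr=0.5, steps=5000):
    """Exact-gradient ascent from the uniform policy; returns the final policy."""
    theta = [0.0] * len(r)
    for _ in range(steps):
        g = exact_gradient(theta, r, tau)
        theta = [t + lr * gb for t, gb in zip(theta, g)]
    return softmax(theta)


if __name__ == "__main__":
    r, tau = [1.0, 0.5, 0.0], 0.5
    print(gradient_ascent(r, tau))
    print(softmax([v / tau for v in r]))
```

In this bandit setting the single-sample estimate is unbiased, and exact-gradient ascent approaches the policy softmax(r / tau), which maximizes the regularized objective. The trajectory-level estimators analyzed in the paper are more involved than this sketch.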

PG · Global optimization · Optimizer · Momentum · Sample complexity

October 19, 2021

On the Global Convergence of Momentum-based Policy Gradient

Yuhao Ding,Junzi Zhang,Javad Lavaei

Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by studying the global convergence of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum improves the global optimality sample complexity of vanilla PG methods by $\tilde{\mathcal{O}}(\epsilon^{-1.5})$ and $\tilde{\mathcal{O}}(\epsilon^{-1})$, respectively, where $\epsilon>0$ is the target tolerance. Our work is the first one that obtains global convergence results for the momentum-based PG methods. For the generic Fisher-non-degenerate policy parametrizations, our result is the first single-loop and finite-batch PG algorithm achieving $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity. Finally, as a by-product, our methods also provide a general framework for analyzing the global convergence rates of stochastic PG methods, which can be easily applied and extended to different PG estimators.

Graph · CASE · Regularization term · Sparse · Mathematics

October 19, 2021

Lower bounds on the chromatic number of random graphs

Peter Ayre,Amin Coja-Oghlan,Catherine Greenhill

We prove that a formula predicted on the basis of non-rigorous physics arguments [Zdeborova and Krzakala: Phys. Rev. E (2007)] provides a lower bound on the chromatic number of sparse random graphs. The proof is based on the interpolation method from mathematical physics. In the case of random regular graphs the lower bound can be expressed algebraically, while in the case of the binomial random graph we obtain a variational formula. As an application we calculate improved explicit lower bounds on the chromatic number of random graphs for small (average) degrees. Additionally, we show how asymptotic formulas for large degrees that were previously obtained by lengthy and complicated combinatorial arguments can be re-derived easily from these new results.

Estimation/estimator · Optimizer · Log-likelihood · Local optimum · Objective function

October 18, 2021

A Global Stochastic Optimization Particle Filter Algorithm

Mathieu Gerber,Randal Douc

from arxiv, 61 pages, 4 figures

We introduce a new algorithm for expected log-likelihood maximization in situations where the objective function is multi-modal and/or has saddle points, that we term G-PFSO. The key idea underpinning G-PFSO is to define a sequence of probability distributions which (a) is shown to concentrate on the target parameter value and (b) can be efficiently estimated by means of a standard particle filter algorithm. These distributions depend on a learning rate: the faster the learning rate, the faster they concentrate on the desired parameter value, but the weaker the ability of G-PFSO to escape from a local optimum of the objective function. To reconcile the ability to escape from a local optimum with a fast convergence rate, the proposed estimator exploits the acceleration property of averaging, well-known in the stochastic gradient literature. Based on challenging estimation problems, our numerical experiments suggest that the estimator introduced in this paper converges at the optimal rate, and illustrate the practical usefulness of G-PFSO for parameter inference in large datasets. Although the focus of this work is expected log-likelihood maximization, the proposed approach and its theory apply more generally to optimizing a function defined through an expectation.

Estimation/estimator · Positive definite · Discretization · Principle · Optimizer

October 17, 2021

Error and Stability Estimates of the Least-Squares Variational Kernel-Based Methods for Second Order PDEs

Salar Seyednazari,Mehdi Tatari,Davoud Mirzaei

from arxiv, This paper includes 29 pages, 1 figure and 2 tables

We consider the least-squares variational kernel-based methods for numerical solution of partial differential equations. Specifically, we focus on least-squares principles to develop meshfree methods to find the numerical solution of a general second order ADN elliptic boundary value problem in a domain $\Omega \subset \mathbb{R}^d$ under Dirichlet boundary conditions. Most notably, in these principles it is not assumed that the differential operator is self-adjoint or positive definite as it would have to be in the Rayleigh-Ritz setting. However, the new scheme leads to a symmetric and positive definite algebraic system allowing us to circumvent the compatibility conditions arising in standard and mixed-Galerkin methods. In particular, the resulting method does not require certain subspaces satisfying any boundary condition. The trial space for discretization is provided via standard kernels that reproduce $H^\tau(\Omega)$, $\tau>d/2$, as their native spaces. Therefore, the smoothness of the approximation functions can be arbitrarily increased without any additional task. The solvability of the scheme is proved and the error estimates are derived for functions in appropriate Sobolev spaces. For the weighted discrete least-squares principles, we show that the optimal rate of convergence in $L^2(\Omega)$ is accessible. Furthermore, for $d \le 3$, the proposed method has optimal rate of convergence in $H^k(\Omega)$ whenever $k \le \tau$. The condition number of the final linear system is approximated in terms of discretization quality. Finally, the results of some computational experiments support the theoretical error bounds.

Nonconvex · Optimizer · Saddle point · Stationary point · Functional

October 15, 2021

Escaping Saddle Points in Nonconvex Minimax Optimization via Cubic-Regularized Gradient Descent-Ascent

Ziyi Chen,Qunwei Li,Yi Zhou

from arxiv, 23 pages, no figures. arXiv admin note: text overlap with arXiv:2102.04653

The gradient descent-ascent (GDA) algorithm has been widely applied to solve nonconvex minimax optimization problems. However, the existing GDA-type algorithms can only find first-order stationary points of the envelope function of nonconvex minimax optimization problems, which does not rule out the possibility of getting stuck at suboptimal saddle points. In this paper, we develop Cubic-GDA -- the first GDA-type algorithm for escaping strict saddle points in nonconvex-strongly-concave minimax optimization. Specifically, the algorithm uses gradient ascent to estimate the second-order information of the minimax objective function, and it leverages the cubic regularization technique to efficiently escape the strict saddle points. Under standard smoothness assumptions on the objective function, we show that Cubic-GDA admits an intrinsic potential function whose value monotonically decreases in the minimax optimization process. Such a property leads to a desired global convergence of Cubic-GDA to a second-order stationary point at a sublinear rate. Moreover, we analyze the convergence rate of Cubic-GDA in the full spectrum of a gradient dominant-type nonconvex geometry. Our result shows that Cubic-GDA achieves an orderwise faster convergence rate than the standard GDA for a wide spectrum of gradient dominant geometry. Our study bridges minimax optimization with second-order optimization and may inspire new developments along this direction.

MCMC · Sample · Lipschitz constant · Markov chain Monte Carlo · Oracle

October 9, 2021

A Proximal Algorithm for Sampling from Non-smooth Potentials

Jiaming Liang,Yongxin Chen

from arxiv, 16 pages

Markov chain Monte Carlo (MCMC) is an effective and dominant method to sample from high-dimensional complex distributions. Yet, most existing MCMC methods are only applicable to settings with smooth potentials (log-densities). In this work, we examine sampling problems with non-smooth potentials and propose a novel MCMC algorithm for them. We provide a non-asymptotic analysis of our algorithm and establish a polynomial-time complexity $\tilde {\cal O}(d\varepsilon^{-1})$ to obtain $\varepsilon$ total variation distance to the target density, better than all existing results under the same assumptions. Our method is based on the proximal bundle method and an alternating sampling framework. This framework requires the so-called restricted Gaussian oracle, which can be viewed as a sampling counterpart of the proximal mapping in convex optimization. One key contribution of this work is a fast algorithm that realizes the restricted Gaussian oracle for any convex non-smooth potential with bounded Lipschitz constant.
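For readers unfamiliar with the proximal mapping mentioned above, the following Python sketch evaluates prox_{eta f}(y) = argmin_x f(x) + (x - y)^2 / (2 * eta) for the non-smooth potential f(x) = |x|, once in closed form (soft-thresholding) and once by a generic one-dimensional search. It illustrates only the optimization-side object, not the paper's restricted Gaussian oracle; the helper names and the search interval are assumptions.

```python
def soft_threshold(y, eta):
    """Closed-form proximal map of eta * |.| at y."""
    if y > eta:
        return y - eta
    if y < -eta:
        return y + eta
    return 0.0


def prox_numeric(f, y, eta, lo=-10.0, hi=10.0, iters=200):
    """Minimize f(x) + (x - y)^2 / (2 * eta) by ternary search.

    Assumes f is convex and the minimizer lies in [lo, hi].
    """
    def phi(x):
        return f(x) + (x - y) ** 2 / (2.0 * eta)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            hi = m2   # minimizer cannot lie to the right of m2
        else:
            lo = m1   # minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for y in (-3.0, -0.4, 0.0, 0.2, 2.5):
        print(y, soft_threshold(y, 0.5), prox_numeric(abs, y, 0.5))
```

Roughly speaking, where the proximal mapping returns the minimizer of f(x) + (x - y)^2 / (2 * eta), the restricted Gaussian oracle is asked to draw a sample from the density proportional to the exponential of the negative of that same expression.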

Objective function · Approximation · Optimizer · Functional · Performer

October 8, 2021

Classical symmetries and the Quantum Approximate Optimization Algorithm

Ruslan Shaydulin,Stuart Hadfield,Tad Hogg,Ilya Safro

We study the relationship between the Quantum Approximate Optimization Algorithm (QAOA) and the underlying symmetries of the objective function to be optimized. Our approach formalizes the connection between quantum symmetry properties of the QAOA dynamics and the group of classical symmetries of the objective function. The connection is general and includes but is not limited to problems defined on graphs. We show a series of results exploring the connection and highlight examples of hard problem classes where a nontrivial symmetry subgroup can be obtained efficiently. In particular we show how classical objective function symmetries lead to invariant measurement outcome probabilities across states connected by such symmetries, independent of the choice of algorithm parameters or number of layers. To illustrate the power of the developed connection, we apply machine learning techniques towards predicting QAOA performance based on symmetry considerations. We provide numerical evidence that a small set of graph symmetry properties suffices to predict the minimum QAOA depth required to achieve a target approximation ratio on the MaxCut problem, in a practically important setting where QAOA parameter schedules are constrained to be linear and hence easier to optimize.
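The notion of a classical symmetry of the objective function can be illustrated on MaxCut without any quantum machinery. The sketch below, which is an illustration and not code from the paper, checks by brute force that the cut value on a 4-cycle is unchanged by flipping every bit and by relabeling vertices with a graph automorphism (a rotation of the cycle). The graph and the helper names are assumptions.

```python
from itertools import product


def cut_value(edges, bits):
    """Number of edges whose endpoints receive different labels."""
    return sum(1 for u, v in edges if bits[u] != bits[v])


def flip_all(bits):
    """Global bit flip: swaps the two sides of the cut."""
    return tuple(1 - b for b in bits)


def permute(bits, perm):
    """Move the label of vertex i to vertex perm[i]."""
    out = [0] * len(bits)
    for i, b in enumerate(bits):
        out[perm[i]] = b
    return tuple(out)


def is_symmetry(edges, n, transform):
    """Brute-force check that transform preserves the cut value of every assignment."""
    return all(
        cut_value(edges, bits) == cut_value(edges, transform(bits))
        for bits in product((0, 1), repeat=n)
    )


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    rotation = [1, 2, 3, 0]  # automorphism of the 4-cycle
    print(is_symmetry(cycle, 4, flip_all))
    print(is_symmetry(cycle, 4, lambda b: permute(b, rotation)))
```

Assignments related by such a symmetry have equal objective values. The abstract's claim is that this equality carries over to QAOA measurement outcome probabilities regardless of the algorithm parameters.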

Stochastic gradient descent · SGD · Noise · MoDELS · Divergence

October 4, 2021

Global Convergence and Stability of Stochastic Gradient Descent

Vivak Patel,Bowen Tian,Shushu Zhang

In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the non-convexity of real problems, and almost entirely ignore the complex noise models that exist in practice. In this work, we make substantial progress on this shortcoming. First, we establish that SGD's iterates will either globally converge to a stationary point or diverge under nearly arbitrary nonconvexity and noise models. Under a slightly more restrictive assumption on the joint behavior of the non-convexity and noise model that generalizes current assumptions in the literature, we show that the objective function cannot diverge, even if the iterates diverge. As a consequence of our results, SGD can be applied to a greater range of stochastic optimization problems with confidence about its global convergence behavior and stability.
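As a toy illustration of SGD on a non-convex objective, the sketch below runs noisy gradient steps on f(x) = x^4 / 4 - x^2 / 2, which has stationary points at -1, 0, and 1. The objective, the Gaussian noise model, the step-size schedule, and the seed are all assumptions chosen for illustration and are far simpler than the settings covered by the paper.

```python
import random


def grad(x):
    """Gradient of the double-well objective f(x) = x^4 / 4 - x^2 / 2."""
    return x ** 3 - x


def sgd(x0, steps=20000, noise_std=1.0, seed=0):
    """SGD with additive Gaussian gradient noise and steps 0.1 / k^0.75."""
    rng = random.Random(seed)
    x = x0
    for k in range(1, steps + 1):
        step = 0.1 / k ** 0.75
        noisy_grad = grad(x) + rng.gauss(0.0, noise_std)
        x -= step * noisy_grad
    return x


if __name__ == "__main__":
    x_final = sgd(2.0)
    print(x_final, grad(x_final))
```

With these choices the iterate settles near a point where the true gradient is close to zero, which matches the "converge to a stationary point" branch of the dichotomy stated in the abstract. A single run like this is of course not evidence for the general result.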

Global minimum · Optimizer · Minimum · Nonconvex · Approximation

March 24, 2021

Why Do Local Methods Solve Nonconvex Problems?

Tengyu Ma

from arxiv, This is the Chapter 21 of the book "Beyond the Worst-Case Analysis of Algorithms"

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.