欧美综合一本热第九页,午夜男女爽爽爽免费大片,国产精品一区二区亚洲人成毛片,一区二区在线中国

Lipschitz連續 · Continuity · Lipschitz · Performer · Better ·

2021 年 10 月 12 日

Convergence and stability of the semi-tamed Milstein method for commutative stochastic differential equations with non-globally Lipschitz continuous coefficients

Yulong Liu,Yuanling Niu,Xiujun Cheng

A new explicit stochastic scheme of order 1 is proposed for solving commutative stochastic differential equations (SDEs) with non-globally Lipschitz continuous coefficients. The proposed method is a semi-tamed version of Milstein scheme to solve SDEs with the drift coefficient consisting of non-Lipschitz continuous term and globally Lipschitz continuous term. It is easily implementable and achieves higher strong convergence order. A stability criterion for this method is derived, which shows that the stability condition of the numerical methods and that of the solved equations keep uniform. Compared with some widely used numerical schemes, the proposed method has better performance in inheriting the mean square stability of the exact solution of SDEs. Numerical experiments are given to illustrate the obtained convergence and stability properties.

相關內容

Lipschitz連續

關注 0

離散化 · Integration · 時間步 · 近似 · 原點 ·

2021 年 12 月 3 日

A discontinuous Galerkin time integration scheme for second order differential equations with applications to seismic wave propagation problems

Paola F. Antonietti,Ilario Mazzieri,Francesco Migliorini

In this work, we present a new high order Discontinuous Galerkin time integration scheme for second-order (in time) differential systems that typically arise from the space discretization of the elastodynamics equation. By rewriting the original equation as a system of first order differential equations we introduce the method and show that the resulting discrete formulation is well-posed, stable and retains super-optimal rate of convergence with respect to the discretization parameters, namely the time step and the polynomial approximation degree. A set of two- and three-dimensional numerical experiments confirm the theoretical bounds. Finally, the method is applied to real geophysical applications.

Extensibility · 樣本 · 優化器 · 樣本復雜度 · 駐點 ·

2021 年 12 月 3 日

Policy Optimization with Stochastic Mirror Descent

Long Yang,Yu Zhang,Gang Zheng,Qian Zheng,Pengfei Li,Jun Wen,Gang Pan

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes $\mathtt{VRMPO}$ algorithm: a sample efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best sample complexity for policy optimization. The extensive experimental results demonstrate that $\mathtt{VRMPO}$ outperforms the state-of-the-art policy gradient methods in various settings.

動量 · tuning · 優化器 · INTERACT · 注意力機制 ·

2021 年 12 月 3 日

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

Junhyung Lyle Kim,Panos Toulis,Anastasios Kyrillidis

Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often leading to specific step size and momentum choices in order to guarantee convergence, set aside acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and elasticity against imperfect tuning. Their stochastic accelerated variants though have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that SPPAM allows a faster linear convergence rate compared to stochastic proximal point algorithm (SPPA) with a better contraction factor, under proper hyperparameter tuning. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step size and momentum that lead to convergence.

FAST · 泛函 · 駐點 · Better · 正則化項 ·

2021 年 12 月 3 日

Fast Objective & Duality Gap Convergence for Nonconvex-Strongly-Concave Min-Max Problems

Zhishuai Guo,Yan Yan,Zhuoning Yuan,Tianbao Yang

This paper focuses on stochastic methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their potential applications in deep learning (e.g., deep AUC maximization, distributionally robust optimization). However, most of the existing algorithms are slow in practice, and their analysis revolves around the convergence to a nearly stationary point. We consider leveraging the Polyak-\L ojasiewicz (PL) condition to design faster stochastic algorithms with stronger convergence guarantee. Although PL condition has been utilized for designing many stochastic minimization algorithms, their applications for non-convex min-max optimization remain rare. In this paper, we propose and analyze a generic framework of proximal epoch-based method with many well-known stochastic updates embeddable. Fast convergence is established in terms of both {\bf the primal objective gap and the duality gap}. Compared with existing studies, (i) our analysis is based on a novel Lyapunov function consisting of the primal objective gap and the duality gap of a regularized function, and (ii) the results are more comprehensive with improved rates that have better dependence on the condition number under different assumptions. We also conduct deep and non-deep learning experiments to verify the effectiveness of our methods.

方差 · PG · 可約的 · Performer · 估計/估計量 ·

2021 年 8 月 20 日

Settling the Variance of Multi-Agent Policy Gradients

Jakub Grudzien Kuba,Muning Wen,Yaodong Yang,Linghui Meng,Shangding Gu,Haifeng Zhang,David Henry Mguni,Jun Wang

Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. Considering using deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG methods in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin.

優化器 · 可約的 · 近似 · 控制器 · Principle ·

2020 年 6 月 29 日

Differential Dynamic Programming Neural Optimizer

Guan-Horng Liu,Tianrong Chen,Evangelos A. Theodorou

Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to the Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in the Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with the recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve convergence rate and reduce sensitivity to hyper-parameter over existing methods. We show that the algorithm is competitive against state-ofthe-art first and second order methods. Our work opens up new avenues for principled algorithmic design built upon the optimal control theory.

采樣法 · 方差 · 圖形處理器 · INFORMS · 泛化理論 ·

2020 年 6 月 24 日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Weilin Cong,Rana Forsati,Mahmut Kandemir,Mehrdad Mahdavi

Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

Lipschitz · 判別函數 · GANs · 判別器 · WGAN ·

2019 年 2 月 15 日

Lipschitz Generative Adversarial Nets

Zhiming Zhou,Jiadong Liang,Yuxuan Song,Lantao Yu,Hongwei Wang,Weinan Zhang,Yong Yu,Zhihua Zhang

from arxiv, Under review by the International Conference on Machine Learning (ICML 2019)

In this paper we study the convergence of generative adversarial networks (GANs) from the perspective of the informativeness of the gradient of the optimal discriminative function. We show that GANs without restriction on the discriminative function space commonly suffer from the problem that the gradient produced by the discriminator is uninformative to guide the generator. By contrast, Wasserstein GAN (WGAN), where the discriminative function is restricted to $1$-Lipschitz, does not suffer from such a gradient uninformativeness problem. We further show in the paper that the model with a compact dual form of Wasserstein distance, where the Lipschitz condition is relaxed, also suffers from this issue. This implies the importance of Lipschitz condition and motivates us to study the general formulation of GANs with Lipschitz constraint, which leads to a new family of GANs that we call Lipschitz GANs (LGANs). We show that LGANs guarantee the existence and uniqueness of the optimal discriminative function as well as the existence of a unique Nash equilibrium. We prove that LGANs are generally capable of eliminating the gradient uninformativeness problem. According to our empirical analysis, LGANs are more stable and generate consistently higher quality samples compared with WGAN.

Networking · MoDELS · Neural Networks · 潛變量/隱變量 · Continuity ·

2018 年 10 月 3 日

Neural Ordinary Differential Equations

Ricky T. Q. Chen,Yulia Rubanova,Jesse Bettencourt,David Duvenaud

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.

優化器 · 強化學習 · 學成 · state-of-the-art · SimPLe ·

2018 年 7 月 25 日

Variational Bayesian Reinforcement Learning with Regret Bounds

Brendan O'Donoghue

We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.