
In this paper we study anisotropic consensus-based optimization (CBO), a multi-agent metaheuristic derivative-free optimization method capable of globally minimizing nonconvex and nonsmooth functions in high dimensions. CBO is based on stochastic swarm intelligence, and inspired by consensus dynamics and opinion formation. Compared to other metaheuristic algorithms like particle swarm optimization, CBO is of a simpler nature and therefore more amenable to theoretical analysis. By adapting a recently established proof technique, we show that anisotropic CBO converges globally with a dimension-independent rate for a rich class of objective functions under minimal assumptions on the initialization of the method. Moreover, the proof technique reveals that CBO performs a convexification of the optimization problem as the number of agents goes to infinity, thus providing an insight into the internal CBO mechanisms responsible for the success of the method. To motivate anisotropic CBO from a practical perspective, we further test the method on a complicated high-dimensional benchmark problem, which is well understood in the machine learning literature.
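
The abstract does not spell out the update rule, so the following is only a minimal sketch of one common time-discretized form of anisotropic CBO: each agent drifts toward a Gibbs-weighted consensus point and receives component-wise noise scaled by its distance to that point. The function names, the Euler-Maruyama discretization, and all parameter values (`alpha`, `lam`, `sigma`, `dt`, `steps`) are illustrative assumptions, not the configuration analyzed in the paper.

```python
import math
import random


def consensus_point(particles, values, alpha):
    """Gibbs-weighted average of the particles, with weights exp(-alpha * f)."""
    f_min = min(values)  # shift for numerical stability; cancels in the ratio
    weights = [math.exp(-alpha * (v - f_min)) for v in values]
    total = sum(weights)
    dim = len(particles[0])
    return [sum(w * p[k] for w, p in zip(weights, particles)) / total
            for k in range(dim)]


def anisotropic_cbo(f, particles, alpha=30.0, lam=1.0, sigma=1.0,
                    dt=0.01, steps=500, rng=None):
    """Run a hypothetical Euler-Maruyama discretization of anisotropic CBO.

    Returns the consensus point of the final particle configuration.
    """
    rng = rng or random.Random(0)
    particles = [list(p) for p in particles]  # do not mutate the caller's data
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        v = consensus_point(particles, [f(p) for p in particles], alpha)
        for p in particles:
            for k in range(len(p)):
                diff = p[k] - v[k]
                # Drift toward the consensus point plus anisotropic noise:
                # every coordinate is perturbed in proportion to its own gap.
                p[k] += (-lam * dt * diff
                         + sigma * sqrt_dt * diff * rng.gauss(0.0, 1.0))
    return consensus_point(particles, [f(p) for p in particles], alpha)


def rastrigin(x):
    """Standard nonconvex test function with global minimum at the origin."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


if __name__ == "__main__":
    init_rng = random.Random(42)
    start = [[init_rng.uniform(-3.0, 3.0) for _ in range(2)] for _ in range(200)]
    estimate = anisotropic_cbo(rastrigin, start, steps=300)
    print("consensus point:", estimate, "objective:", rastrigin(estimate))
```

Because the noise acts coordinate by coordinate, its strength does not have to shrink as the dimension grows, which is the usual motivation given for the anisotropic variant.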

Related content


Mean · Learned · Episode · Continuity · Scaling

January 20, 2022

Decentralized Mean Field Games

Sriram Ganapathi Subramanian, Matthew E. Taylor, Mark Crowley, Pascal Poupart

from arxiv, This work is to appear in AAAI-22

Multiagent reinforcement learning algorithms have not been widely adopted in large-scale environments with many agents, as they often scale poorly with the number of agents. Using mean field theory to aggregate agents has been proposed as a solution to this problem. However, almost all previous methods in this area make a strong assumption of a centralized system where all the agents in the environment learn the same policy and are effectively indistinguishable from each other. In this paper, we relax this assumption about indistinguishable agents and propose a new mean field system known as Decentralized Mean Field Games, where each agent can be quite different from others. All agents learn independent policies in a decentralized fashion, based on their local observations. We define a theoretical solution concept for this system and provide a fixed point guarantee for a Q-learning based algorithm in this system. A practical consequence of our approach is that we can address a 'chicken-and-egg' problem in empirical mean field reinforcement learning algorithms. Further, we provide Q-learning and actor-critic algorithms that use the decentralized mean field learning approach and give stronger performance than common baselines in this area. In our setting, agents do not need to be clones of each other and learn in a fully decentralized fashion. Hence, for the first time, we show the application of mean field learning methods in fully competitive environments, large-scale continuous action space environments, and other environments with heterogeneous agents. Importantly, we also apply the mean field method in a ride-sharing problem using a real-world dataset. We propose a decentralized solution to this problem, which is more practical than existing centralized training methods.

Optimization landscape · Generalization theory · Optimizer · SGD · Performer

January 20, 2022

Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape

Devansh Bisla, Jing Wang, Anna Choromanska

In this paper, we study the sharpness of a deep learning (DL) loss landscape around local minima in order to reveal systematic mechanisms underlying the generalization abilities of DL models. Our analysis is performed across varying network and optimizer hyper-parameters, and involves a rich family of different sharpness measures. We compare these measures and show that the low-pass filter-based measure exhibits the highest correlation with the generalization abilities of DL models, has high robustness to both data and label noise, and furthermore can track the double descent behavior for neural networks. We next derive an optimization algorithm, relying on the low-pass filter (LPF), that actively searches for flat regions in the DL optimization landscape using an SGD-like procedure. The update of the proposed algorithm, which we call LPF-SGD, is determined by the gradient of the convolution of the filter kernel with the loss function and can be efficiently computed using MC sampling. We empirically show that our algorithm achieves superior generalization performance compared to the common DL training strategies. On the theoretical front, we prove that LPF-SGD converges to a better optimal point with smaller generalization error than SGD.
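
To illustrate the MC-sampled update described above, here is a rough sketch that assumes an isotropic Gaussian filter kernel. Under that assumption the gradient of the smoothed loss equals the expected gradient at randomly perturbed parameters, so it can be estimated by averaging gradients over a few perturbations. The names `lpf_gradient` and `lpf_sgd`, the kernel width `sigma`, and the other defaults are hypothetical; the paper's actual kernel and schedule may differ.

```python
import random


def lpf_gradient(grad, theta, sigma=0.1, samples=32, rng=None):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed loss.

    grad  : callable returning the gradient (list of floats) at a point
    theta : current parameters (list of floats)
    sigma : standard deviation of the assumed isotropic Gaussian kernel
    """
    rng = rng or random.Random(0)
    total = [0.0] * len(theta)
    for _ in range(samples):
        # Evaluate the plain gradient at a randomly perturbed point.
        perturbed = [t + rng.gauss(0.0, sigma) for t in theta]
        for k, g in enumerate(grad(perturbed)):
            total[k] += g
    return [t / samples for t in total]


def lpf_sgd(grad, theta, lr=0.1, steps=100, sigma=0.1, samples=32, seed=0):
    """Gradient descent driven by the smoothed-gradient estimate."""
    rng = random.Random(seed)  # one generator shared across all steps
    theta = list(theta)
    for _ in range(steps):
        g = lpf_gradient(grad, theta, sigma=sigma, samples=samples, rng=rng)
        theta = [t - lr * gk for t, gk in zip(theta, g)]
    return theta


if __name__ == "__main__":
    # Toy quadratic loss L(theta) = sum(theta_k ** 2), gradient 2 * theta.
    quad_grad = lambda th: [2.0 * t for t in th]
    print(lpf_sgd(quad_grad, [1.0, -2.0]))
```

In a real training loop `grad` would be a mini-batch gradient, so each step costs roughly `samples` backward passes; that trade-off is a design choice of this sketch rather than something stated in the abstract.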

Value function · Quasi-concave · Lipschitz constant · Sample complexity · Lipschitz

January 19, 2022

On the Convergence Rates of Policy Gradient Methods

Lin Xiao

We consider infinite-horizon discounted Markov decision problems with finite state and action spaces. We show that with direct parametrization in the policy space, the weighted value function, although non-convex in general, is both quasi-convex and quasi-concave. While quasi-convexity helps explain the convergence of policy gradient methods to global optima, quasi-concavity hints at their convergence guarantees using arbitrarily large step sizes that are not dictated by the Lipschitz constant characterizing smoothness of the value function. In particular, we show that when using geometrically increasing step sizes, a general class of policy mirror descent methods, including the natural policy gradient method and a projected Q-descent method, all enjoy a linear rate of convergence without relying on entropy or other strongly convex regularization. In addition, we develop a theory of weak gradient-mapping dominance and use it to prove a sharper sublinear convergence rate of the projected policy gradient method. Finally, we also analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model.

Regularization term · Neural Networks · Approximation · Networking · Continuity

January 18, 2022

Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime

Bekzhan Kerimkulov, James-Michael Leahy, David Šiška, Lukasz Szpruch

We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, entropy-regularized Markov decision processes (MDPs). We consider a softmax policy with (one-hidden-layer) neural network approximation in a mean-field regime. Additional entropic regularization of the associated mean-field probability measure is introduced, and the corresponding gradient flow is studied in the 2-Wasserstein metric. We show that the objective function is increasing along the gradient flow. Further, we prove that if the regularization in terms of the mean-field measure is sufficient, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective. Lastly, we study the sensitivity of the value function along the gradient flow with respect to regularization parameters and the initial condition. Our results rely on a careful analysis of a non-linear Fokker-Planck-Kolmogorov equation and extend the pioneering work of Mei et al. 2020 and Agarwal et al. 2020, which quantify the global convergence rate of policy gradient for entropy-regularized MDPs in the tabular setting.

Extensibility · Controller · Robustness · Directed · Continuity

January 18, 2022

Convergence of a robust deep FBSDE method for stochastic control

Kristoffer Andersson, Adam Andersson, Cornelis W. Oosterlee

from arxiv, 25 pages, 4 figures, 3 tables

In this paper we propose a deep learning based numerical scheme for strongly coupled FBSDEs stemming from stochastic control. It is a modification of the deep BSDE method in which the initial value of the backward equation is not a free parameter, and in which the new loss function is the weighted sum of the cost of the control problem and a variance term which coincides with the mean squared error in the terminal condition. We show by a numerical example that a direct extension of the classical deep BSDE method to FBSDEs fails for a simple linear-quadratic control problem, and motivate why the new method works. Under regularity and boundedness assumptions on the exact controls of time-continuous and time-discrete control problems, we provide an error analysis for our method. We show empirically that the method converges for three different problems, one being the one that failed for a direct extension of the deep BSDE method.

Constrained optimization · Optimizer · Networks · IEEE · Computational statistics

January 17, 2022

Convergence Analysis of Fixed Point Chance Constrained Optimal Power Flow Problems

Johannes J. Brust, Mihai Anitescu

For optimal power flow problems with chance constraints, a particularly effective method is based on a fixed point iteration applied to a sequence of deterministic power flow problems. However, a priori, the convergence of such an approach is not necessarily guaranteed. This article analyses the convergence conditions for this fixed point approach, and reports numerical experiments, including on large IEEE networks.

Continuity · Linear · Generalized function · Exponential decay · Homogeneous

January 17, 2022

Mittag-Leffler stability of numerical solutions to time fractional ODEs

Dongling Wang, Jun Zou

from arxiv, All comments are welcome!

The asymptotic stability region and long-time decay rate of solutions to linear homogeneous Caputo time fractional ordinary differential equations (F-ODEs) are known to be completely determined by the eigenvalues of the coefficient matrix. In contrast to the exponential decay of solutions to classical ODEs, solutions of F-ODEs decay only polynomially, leading to the so-called Mittag-Leffler stability, which has already been extended to semi-linear F-ODEs with small perturbations. This work is mainly devoted to the qualitative analysis of the long-time behavior of numerical solutions. By applying the singularity analysis of generating functions developed by Flajolet and Odlyzko (SIAM J. Disc. Math. 3 (1990), 216-240), we are able to prove that both the $\mathcal{L}1$ scheme and strong $A$-stable fractional linear multistep methods (F-LMMs) can preserve the numerical Mittag-Leffler stability for linear homogeneous F-ODEs exactly as in the continuous case. Through an improved estimate of the discrete fractional resolvent operator, we show that strong $A$-stable F-LMMs are also Mittag-Leffler stable for semi-linear F-ODEs under small perturbations. For the numerical schemes based on $\alpha$-difference approximations to the Caputo derivative, we establish the Mittag-Leffler stability for semi-linear problems by making use of properties of the Poisson transformation and the decay rate of the continuous fractional resolvent operator. Numerical experiments are presented for several typical time fractional evolution equations, including time fractional sub-diffusion equations, fractional linear systems and semi-linear F-ODEs. All the numerical results exhibit the typical long-time polynomial decay rate, which is fully consistent with our theoretical predictions.
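
The polynomial decay mentioned above can be made concrete in the simplest setting. The worked equations below assume a scalar test problem with a real coefficient $\lambda < 0$ and order $0 < \alpha < 1$; these symbols and the initial value $y_0$ are chosen here for illustration and are not taken from the paper.

```latex
% Scalar Caputo test equation (illustrative assumption: \lambda < 0, 0 < \alpha < 1)
{}^{C}\!D_t^{\alpha} y(t) = \lambda\, y(t), \qquad y(0) = y_0 .

% Its solution is expressed through the Mittag-Leffler function E_\alpha:
y(t) = y_0\, E_{\alpha}\!\left(\lambda t^{\alpha}\right),
\qquad
E_{\alpha}(z) = \sum_{k=0}^{\infty} \frac{z^{k}}{\Gamma(\alpha k + 1)} .

% Large-argument asymptotics on the negative real axis:
E_{\alpha}(-x) = \frac{1}{\Gamma(1-\alpha)\, x} + O\!\left(x^{-2}\right),
\qquad x \to +\infty ,

% which gives the polynomial (not exponential) long-time decay
y(t) \sim \frac{y_0}{\Gamma(1-\alpha)\, |\lambda|\, t^{\alpha}},
\qquad t \to \infty .
```

For $\alpha = 1$ the series reduces to $E_1(z) = e^{z}$, recovering the exponential decay of the classical ODE.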

Optimizer · Estimation/Estimator · Scenario · Optimization · Noise

January 15, 2022

Imaginary Zeroth-Order Optimization

Wouter Jongeneel

from arxiv, 33 pages, 14 figures (submitted)

Zeroth-order optimization methods are developed to overcome the practical hurdle of needing knowledge of explicit derivatives. Instead, these schemes work with mere access to noisy function evaluations. The predominant approach is to mimic first-order methods by means of some gradient estimator. The theoretical limitations are well understood; yet, as most of these methods rely on finite-differencing for shrinking differences, numerical cancellation can be catastrophic. The numerical community developed an efficient method to overcome this by passing to the complex domain. This approach has recently been adopted by the optimization community, and in this work we analyze the practically relevant setting of dealing with computational noise. To exemplify the possibilities, we focus on the strongly-convex optimization setting and provide a variety of non-asymptotic results, corroborated by numerical experiments, and end with local non-convex optimization.
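
The idea of "passing to the complex domain" is commonly realized through complex-step differentiation; assuming that is the technique meant here, the sketch below contrasts it with a forward difference for a scalar function. The helper names and the step size `h` are illustrative, and the objective must be real-analytic and accept complex input.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-20):
    """Estimate f'(x) as Im f(x + ih) / h.

    No subtraction of nearly equal numbers occurs, so h can be made
    extremely small without cancellation error.
    """
    return f(complex(x, h)).imag / h


def forward_difference(f, x, h):
    """Classical estimate (f(x + h) - f(x)) / h, prone to cancellation."""
    return (f(x + h) - f(x)) / h


if __name__ == "__main__":
    x = 1.0
    print("exact          :", math.exp(x))
    print("complex step   :", complex_step_derivative(cmath.exp, x))
    # With a tiny step, x + h rounds back to x and the difference vanishes.
    print("forward, 1e-20 :", forward_difference(math.exp, x, 1e-20))
    print("forward, 1e-8  :", forward_difference(math.exp, x, 1e-8))
```

The forward difference has to balance truncation error against round-off when choosing `h`, whereas the complex-step estimate has no such trade-off for analytic functions.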

Optimizer · Neural Networks · Loop · Networking · Mean

January 15, 2022

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

Atsushi Nitanda, Denny Wu, Taiji Suzuki

from arxiv, NeurIPS 2021

We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization to optimization over probability distributions with a quantitative runtime guarantee. The algorithm consists of an inner loop and an outer loop: the inner loop utilizes the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the outer loop. The method can thus be interpreted as an extension of the Langevin algorithm that naturally handles nonlinear functionals on the probability space. An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which quantitative convergence rates can be challenging to obtain. By adapting finite-dimensional convex optimization theory to the space of measures, we analyze PDA in regularized empirical / expected risk minimization, and establish quantitative global convergence in learning two-layer mean field neural networks under more general settings. Our theoretical results are supported by numerical simulations on neural networks of reasonable size.

Learned · Mean · Reinforcement learning · entity · INTERACT

June 12, 2018

Mean Field Multi-Agent Reinforcement Learning

Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang

from arxiv, ICML 2018 (Full paper + Long talk)

Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to Nash equilibrium. Experiments on Gaussian squeeze, Ising model, and battle games justify the learning effectiveness of our mean field approaches. In addition, we report the first result on solving the Ising model via model-free reinforcement learning methods.