
Policy gradient methods are widely used in reinforcement learning to search for better policies in a parameterized policy space. They perform gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems, which has recently been extended to non-convex and stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show convergence using the ODE method. We tested this algorithm on a scheduling problem in which an incoming job is scheduled into one of four queues based on the queue lengths. Experimental results show that the algorithm using Nesterov's acceleration performs significantly better than the algorithm that does not use acceleration. To the best of our knowledge, this is the first time Nesterov's acceleration has been used with an actor-critic algorithm.
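As a rough illustration of the acceleration idea only (this is not the actor-critic algorithm described in the abstract), the sketch below compares plain gradient descent with Nesterov's accelerated gradient on a small deterministic convex problem. The quadratic objective, the step size `1/L`, and the momentum value `(sqrt(L/mu) - 1) / (sqrt(L/mu) + 1)` are assumptions chosen for the example; none of them come from the source.

```python
import numpy as np


def gradient_descent(grad, x0, step, iters):
    """Plain gradient descent: x <- x - step * grad(x)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x = x - step * grad(x)
    return x


def nesterov_descent(grad, x0, step, momentum, iters):
    """Nesterov's accelerated gradient with a velocity term.

    The gradient is evaluated at a look-ahead point x + momentum * v
    rather than at the current iterate x.
    """
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(iters):
        lookahead = x + momentum * v
        v = momentum * v - step * grad(lookahead)
        x = x + v
    return x


# Hypothetical test problem: f(x) = 0.5 * x^T A x, minimized at x = 0.
# The eigenvalues of A give smoothness L = 100 and strong convexity mu = 1.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
x0 = np.array([1.0, 1.0])

L, mu = 100.0, 1.0
step = 1.0 / L
momentum = (np.sqrt(L / mu) - 1.0) / (np.sqrt(L / mu) + 1.0)

x_gd = gradient_descent(grad, x0, step, iters=100)
x_nag = nesterov_descent(grad, x0, step, momentum, iters=100)

print("gradient descent distance to optimum:", np.linalg.norm(x_gd))
print("Nesterov distance to optimum:", np.linalg.norm(x_nag))
```

On this ill-conditioned quadratic the accelerated iterate ends up much closer to the optimum after the same number of steps. In the setting of the abstract, the gradient would instead be a noisy policy-gradient estimate supplied by the critic, so this deterministic example should be read only as intuition for the look-ahead update.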

Related content


Neural Networks · Optimizer · Networks · Local Minima · Networking ·

December 19, 2019

Optimization for deep learning: theory and algorithms

Ruoyu Sun

from arxiv, 38 pages of main body; 5 pages of appendix; 12 pages of references

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.

Learning · Deep Reinforcement Learning · Reinforcement Learning · Sample Complexity · Atari ·

January 10, 2019

Accelerated Methods for Deep Reinforcement Learning

Adam Stooke, Pieter Abbeel

from arxiv, v2: -Added game performance statistics summary for algorithm scaling across full Atari game set. -Added full set of learning curves (appendix). -Fixed images to remove phantom borders. -Streamlined some discussion, moved some details to appendix

Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. We investigate how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs. We confirm that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances. We further find it possible to train using batch sizes considerably larger than are standard, without negatively affecting sample complexity or final performance. We leverage these facts to build a unified framework for parallelization that dramatically hastens experiments in both classes of algorithm. All neural network computations use GPUs, accelerating both data collection and training. Our results include using an entire DGX-1 to learn successful strategies in Atari games in mere minutes, using both synchronous and asynchronous algorithms.

Learning · Reinforcement Learning · Central Processing Unit (CPU) · GPU · Training Samples ·

October 24, 2018

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

Jacky Liang, Viktor Makoviychuk, Ankur Handa, Nuttapong Chentanez, Miles Macklin, Dieter Fox

from arxiv, Accepted and to appear at the Conference on Robot Learning (CoRL) 2018

Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups of learning various continuous-control, locomotion tasks. With one GPU and CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10-1000x fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks.

AutoML · Reducible · Transfer Learning · Automator · Learning ·

September 11, 2018

Transfer Learning with Neural AutoML

Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo

We reduce the computational cost of Neural AutoML with transfer learning. AutoML relieves human effort by automating the design of ML algorithms. Neural AutoML has become popular for the design of deep learning architectures; however, this method has a high computation cost. To address this, we propose Transfer Neural AutoML, which uses knowledge from prior tasks to speed up network design. We extend RL-based architecture search methods to support parallel training on multiple tasks and then transfer the search strategy to new tasks. On language and image classification data, Transfer Neural AutoML reduces convergence time over single-task training by over an order of magnitude on many tasks.

Learning · Deep Q-Network · Reinforcement Learning · Q-Network · Deep Reinforcement Learning ·

September 6, 2018

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.

Coordinate Descent · Optimizer · Performer · Learning · Online ·

July 16, 2018

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Akshita Bhandari, Chandramani Singh

from arxiv, 20 pages, 4 figures, 2 tables

We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning have better regret performance than the known randomized online coordinate descent algorithms. Furthermore, the proposed algorithms for stochastic optimization exhibit convergence rates as good as those of the best known randomized coordinate descent algorithms. We also show simulation results to demonstrate the performance of the proposed algorithms.

Deep Reinforcement Learning · Reinforcement Learning · Learning · Episode · Optimizer ·

June 27, 2018

A Multi-Objective Deep Reinforcement Learning Framework

Thanh Thi Nguyen

from arxiv, 17 pages

This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose the use of linear and non-linear methods to develop the MODRL framework, which includes both single-policy and multi-policy strategies. Experimental results on two benchmark problems, the two-objective deep sea treasure environment and the three-objective mountain car problem, indicate that the proposed framework is able to converge to the optimal Pareto solutions effectively. The proposed framework is generic, which allows implementation of different deep reinforcement learning algorithms in different complex environments. It therefore overcomes many difficulties involved with standard multi-objective reinforcement learning (MORL) methods in the current literature. The framework serves as a testbed environment for developing methods to solve various problems associated with current MORL. Details of the framework implementation are available at //www.deakin.edu.au/~thanhthi/drl.htm.

SOFT · Continuity · Better · Performer · state-of-the-art ·

April 25, 2018

Multiagent Soft Q-Learning

Ermo Wei, Drew Wicke, David Freelan, Sean Luke

from arxiv, Accepted in AAAI 18 Spring Symposium

Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint-action space.

Learning · Reinforcement Learning · Deep Reinforcement Learning · Approximation · INFORMS ·

April 22, 2018

Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning

Sergio Valcarcel Macua, Aleksi Tukiainen, Daniel García-Ocaña Hernández, David Baldazo, Enrique Munoz de Cote, Santiago Zazo

We propose a fully distributed actor-critic algorithm approximated by deep neural networks, named Diff-DAC, with application to single-task and to average multitask reinforcement learning (MRL). Each agent has access to data from its local task only, but it aims to learn a policy that performs well on average for the whole set of tasks. During the learning process, agents communicate their value-policy parameters to their neighbors, diffusing the information across the network, so that they converge to a common policy, with no need for a central node. The method is scalable, since the computational and communication costs per agent grow with its number of neighbors. We derive Diff-DAC from duality theory and provide novel insights into the standard actor-critic framework, showing that it is actually an instance of the dual ascent method that approximates the solution of a linear program. Experiments suggest that Diff-DAC can outperform the single previous distributed MRL approach (i.e., Dist-MTLPS) and even the centralized architecture.

Graph · Learning · Knowledge Graph · FreeBASIC · Reinforcement Learning ·

January 8, 2018

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Wenhan Xiong, Thien Hoang, William Yang Wang

We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.