We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer. EVA shifts the value predicted by a neural network with an estimate of the value function found by planning over experience tuples from the replay buffer near the current state. EVA combines a number of recent ideas around integrating episodic memory-like structures into reinforcement learning agents: slot-based storage, content-based retrieval, and memory-based planning. We show that EVA is performant on a demonstration task and Atari games.
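
As a rough illustration of the idea in the abstract, the sketch below blends a network's value estimate with a memory-based estimate built from replay-buffer entries near the current state. The mixing weight `lam`, the k-nearest-neighbour retrieval, and the plain averaging of stored values are assumptions made for illustration only; the abstract does not give these details, and the averaging here merely stands in for the planning step it describes.

```python
import math

def nearest_neighbours(keys, query, k):
    """Return indices of the k stored keys closest to `query` (Euclidean distance)."""
    order = sorted(range(len(keys)), key=lambda i: math.dist(keys[i], query))
    return order[:k]

def adjusted_value(q_param, keys, values, query, k=2, lam=0.5):
    """Blend a parametric value estimate with a memory-based one.

    q_param : per-action values predicted by the network
    keys    : state embeddings stored in the buffer (hypothetical layout)
    values  : per-action value estimates stored alongside each key
    lam     : hypothetical mixing weight in [0, 1]
    """
    idx = nearest_neighbours(keys, query, k)
    n_actions = len(q_param)
    # Average the neighbours' stored values; a stand-in for planning over them.
    q_memory = [sum(values[i][a] for i in idx) / len(idx) for a in range(n_actions)]
    return [lam * qp + (1.0 - lam) * qm for qp, qm in zip(q_param, q_memory)]

keys = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
values = [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]]
q_net = [0.0, 0.0]
q = adjusted_value(q_net, keys, values, query=[0.4, 0.0], k=2, lam=0.5)
```

In this toy setup the distant third entry is ignored, so only the two nearby entries shift the network's prediction.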

Related content

Experience replay

Learning · Performer · Reinforcement learning · Optimizer · Question answering

April 3, 2019

Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

Fréderic Godin, Anjishnu Kumar, Arpit Mittal

from arxiv, Accepted at NAACL 2019. Version 1 was presented at NIPS 2018 workshop on Relational Representation Learning

In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate the systems correctly for situations when there is no answer available and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure which also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while only not answering a limited number of previously correctly answered questions. Employing a supervised learning strategy using depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.
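
To make the contrast between the two reward structures concrete, here is a minimal sketch. The reward values (`1.0`, `0.0`, `-1.0`), the use of `None` to mean "no answer", and the helper names are assumptions for illustration; the abstract states only that declining to answer is rewarded relative to answering incorrectly.

```python
def binary_reward(predicted, gold):
    """Binary structure: reward only a correct answer."""
    return 1.0 if predicted is not None and predicted == gold else 0.0

def ternary_reward(predicted, gold, r_correct=1.0, r_abstain=0.0, r_wrong=-1.0):
    """Ternary structure: declining to answer scores above a wrong answer.

    `predicted` is None when the agent declines to answer (hypothetical convention).
    """
    if predicted is None:
        return r_abstain
    return r_correct if predicted == gold else r_wrong

def precision_and_answer_rate(predictions, golds):
    """Precision over answered questions, and the fraction of questions answered."""
    answered = [(p, g) for p, g in zip(predictions, golds) if p is not None]
    correct = sum(1 for p, g in answered if p == g)
    precision = correct / len(answered) if answered else 0.0
    answer_rate = len(answered) / len(golds) if golds else 0.0
    return precision, answer_rate

predictions = ["Paris", None, "Rome", "Oslo"]
golds = ["Paris", "Berlin", "Madrid", "Oslo"]
rewards = [ternary_reward(p, g) for p, g in zip(predictions, golds)]
precision, answer_rate = precision_and_answer_rate(predictions, golds)
```

Under the binary structure the wrong answer and the abstention both score zero, so the agent has no incentive to withhold a guess; the ternary structure separates the two cases.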

Learning · Deep reinforcement learning · Reinforcement learning · Reducible · Graph

March 25, 2019

Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning

Prithviraj Ammanabrolu, Mark O. Riedl

from arxiv, Proceedings of NAACL-HLT 2019

Text-based adventure games provide a platform on which to explore reinforcement learning in the context of a combinatorial action space, such as natural language. We present a deep reinforcement learning architecture that represents the game state as a knowledge graph which is learned during exploration. This graph is used to prune the action space, enabling more efficient exploration. The question of which action to take can be reduced to a question-answering task, a form of transfer learning that pre-trains certain parts of our architecture. In experiments using the TextWorld framework, we show that our proposed technique can learn a control policy faster than baseline alternatives. We have also open-sourced our code at //github.com/rajammanabrolu/KG-DQN.

Robot operating platform · Reinforcement learning · Learning · CASES · Robot

March 14, 2019

gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo

Nestor Gonzalez Lopez, Yue Leire Erro Nuin, Elias Barba Moral, Lander Usategui San Juan, Alejandro Solano Rueda, Víctor Mayoral Vilches, Risto Kojcev

This paper presents an upgraded, real-world application-oriented version of gym-gazebo, the Robot Operating System (ROS) and Gazebo based Reinforcement Learning (RL) toolkit, which complies with OpenAI Gym. The content discusses the new ROS 2 based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, the output of this work presents a benchmarking system for robotics that allows different techniques and algorithms to be compared using the same virtual conditions. We have evaluated environments with different levels of complexity of the Modular Articulated Robotic Arm (MARA), reaching accuracies in the millimeter scale. The converged results show the feasibility and usefulness of the gym-gazebo2 toolkit, its potential and applicability in industrial use cases, using modular robots.

Performer · Optimizer · Reinforcement learning · Learning · Deep reinforcement learning

February 26, 2019

Using Ternary Rewards to Reason over Knowledge Graphs with Deep Reinforcement Learning

Fréderic Godin, Anjishnu Kumar, Arpit Mittal

from arxiv, Presented at NIPS 2018 workshop on Relational Representation Learning. An extended version with title 'Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering' is available on Arxiv and will be presented at NAACL 2019

In this paper, we investigate the practical challenges of using reinforcement learning agents for question-answering over knowledge graphs. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate. More specifically, they do not evaluate the systems correctly for situations when there is no answer available and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure which also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while only not answering a limited number of questions that were previously answered correctly.

Learning · Deep Q-network · Q-network · Value function · Learning to learn

November 26, 2018

Deep Reinforcement Learning: An Overview

Yuxi Li

from arxiv, Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update

We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn. Then we discuss various applications of RL, including games, in particular, AlphaGo, robotics, natural language processing, including dialogue systems, machine translation, and text generation, computer vision, neural architecture design, business management, finance, healthcare, Industry 4.0, smart grid, intelligent transportation systems, and computer systems. We mention topics not reviewed yet, and list a collection of RL resources. After presenting a brief summary, we close with discussions. Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update.

Performer · Deep reinforcement learning · Learning · entity · Reinforcement learning

June 28, 2018

Relational Deep Reinforcement Learning

Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.

Deep reinforcement learning · Reinforcement learning · Learning · Episode · Optimizer

June 27, 2018

A Multi-Objective Deep Reinforcement Learning Framework

Thanh Thi Nguyen

from arxiv, 17 pages

This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose the use of linear and non-linear methods to develop the MODRL framework that includes both single-policy and multi-policy strategies. The experimental results on two benchmark problems, the two-objective deep sea treasure environment and the three-objective mountain car problem, indicate that the proposed framework is able to converge to the optimal Pareto solutions effectively. The proposed framework is generic, which allows implementation of different deep reinforcement learning algorithms in different complex environments. This therefore overcomes many difficulties involved with standard multi-objective reinforcement learning (MORL) methods existing in the current literature. The framework creates a platform as a testbed environment to develop methods for solving various problems associated with the current MORL. Details of the framework implementation are available at //www.deakin.edu.au/~thanhthi/drl.htm.
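
The linear method mentioned in the abstract can be pictured as a weighted sum over per-objective Q-values. The sketch below is an assumption-laden illustration of that general technique (linear scalarization), not the framework's actual code: the function names, the preference weights, and the toy Q-values are all hypothetical.

```python
def scalarize(q_vectors, weights):
    """Collapse each action's per-objective Q-values into one score via a weighted sum."""
    return [sum(w * q for w, q in zip(weights, qv)) for qv in q_vectors]

def greedy_action(q_vectors, weights):
    """Pick the action with the highest scalarized score."""
    scores = scalarize(q_vectors, weights)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy values: three actions, two objectives (treasure value, time penalty).
q_vectors = [[1.0, -1.0], [5.0, -4.0], [10.0, -12.0]]

balanced = greedy_action(q_vectors, weights=[0.5, 0.5])
treasure_only = greedy_action(q_vectors, weights=[1.0, 0.0])
```

Changing the weights changes which action looks best, which is how a single set of per-objective estimates can serve different preferences. A weighted sum can only reach solutions on the convex part of a Pareto front, which is one common motivation for non-linear alternatives.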

Learning · Mean · Reinforcement learning · entity · INTERACT

June 12, 2018

Mean Field Multi-Agent Reinforcement Learning

Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang

from arxiv, ICML 2018 (Full paper + Long talk)

Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to Nash equilibrium. Experiments on Gaussian squeeze, Ising model, and battle games justify the learning effectiveness of our mean field approaches. In addition, we report the first result to solve the Ising model via model-free reinforcement learning methods.
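
The "average effect" of neighboring agents can be illustrated with a small sketch. It assumes discrete actions encoded as one-hot vectors and averaged over an agent's neighbors; that encoding, the function names, and the example values are assumptions for illustration, since the abstract does not specify them.

```python
def one_hot(action, n_actions):
    """Encode a discrete action index as a one-hot vector."""
    return [1.0 if i == action else 0.0 for i in range(n_actions)]

def mean_action(neighbour_actions, n_actions):
    """Average the one-hot encodings of the neighbours' actions."""
    if not neighbour_actions:
        raise ValueError("at least one neighbour action is required")
    total = [0.0] * n_actions
    for action in neighbour_actions:
        for i, v in enumerate(one_hot(action, n_actions)):
            total[i] += v
    return [t / len(neighbour_actions) for t in total]

# Four neighbours choosing among three actions.
a_bar = mean_action([0, 2, 2, 1], n_actions=3)
```

An agent's value function can then condition on its own action and this fixed-size summary, rather than on the joint action of every other agent, so the input size no longer grows with the population.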

Learning · Functional · Optimizer · Controller · MoDELS

January 29, 2018

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Motoya Ohnishi, Li Wang, Gennaro Notomista, Magnus Egerstedt

from arxiv, 14 pages, 10 figures, submitted to IEEE Transactions on Robotics

This paper presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model will be used in combination with control barrier certificates which constrain feedback controllers only when safety is about to be violated. Under some mild assumptions, solutions to the constrained feedback-controller optimization are guaranteed to be globally optimal, and the monotonic improvement of a feedback controller is thus ensured. In addition, we reformulate the (action-)value function approximation to make any kernel-based nonlinear function estimation method applicable. We then employ a state-of-the-art kernel adaptive filtering technique for the (action-)value function approximation. The resulting framework is verified experimentally on a brushbot, whose dynamics is unknown and highly complex.

Networking · Learning · Controller · Deep reinforcement learning · Extensibility

January 17, 2018

Experience-driven Networking: A Deep Reinforcement Learning based Approach

Zhiyuan Xu, Jian Tang, Jingsong Meng, Weiyi Zhang, Yanzhi Wang, Chi Harold Liu, Dejun Yang

from arxiv, 9 pages, 12 figures, paper is accepted as a conference paper at IEEE Infocom 2018

Modern communication networks have become very complicated and highly dynamic, which makes them hard to model, predict and control. In this paper, we develop a novel experience-driven approach that can learn to control a communication network well from its own experience rather than an accurate mathematical model, just as a human learns a new skill (such as driving, swimming, etc.). Specifically, we, for the first time, propose to leverage emerging Deep Reinforcement Learning (DRL) for enabling model-free control in communication networks; and present a novel and highly effective DRL-based control framework, DRL-TE, for a fundamental networking problem: Traffic Engineering (TE). The proposed framework maximizes a widely-used utility function by jointly learning network environment and its dynamics, and making decisions under the guidance of powerful Deep Neural Networks (DNNs). We propose two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, to optimize the general DRL framework particularly for TE. To validate and evaluate the proposed framework, we implemented it in ns-3, and tested it comprehensively with both representative and randomly generated network topologies. Extensive packet-level simulation results show that 1) compared to several widely-used baseline methods, DRL-TE significantly reduces end-to-end delay and consistently improves the network utility, while offering better or comparable throughput; 2) DRL-TE is robust to network changes; and 3) DRL-TE consistently outperforms a state-of-the-art DRL method (for continuous control), Deep Deterministic Policy Gradient (DDPG), which, however, does not offer satisfying performance.