
Learning · Imperfect information · Agent · Reinforcement learning · Self-Play ·

June 30, 2022

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Julien Perolat,Bart de Vylder,Daniel Hennes,Eugene Tarassov,Florian Strub,Vincent de Boer,Paul Muller,Jerome T. Connor,Neil Burch,Thomas Anthony,Stephen McAleer,Romuald Elie,Sarah H. Cen,Zhe Wang,Audrunas Gruslys,Aleksandra Malysheva,Mina Khan,Sherjil Ozair,Finbarr Timbers,Toby Pohlen,Tom Eccles,Mark Rowland,Marc Lanctot,Jean-Baptiste Lespiau,Bilal Piot,Shayegan Omidshafiei,Edward Lockhart,Laurent Sifre,Nathalie Beauguerlange,Remi Munos,David Silver,Satinder Singh,Demis Hassabis,Karl Tuyls

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
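The contrast between 'cycling' and converging can be illustrated on a toy game. The sketch below is not DeepNash or the authors' R-NaD implementation: it runs plain self-play on matching pennies with a simple logit update, then repeats the run with a penalty of `eta * log(policy / regularisation policy)` subtracted from each action's payoff. The game, the update rule, the uniform regularisation policy, and the names `self_play_distance`, `eta`, and `lr` are all assumptions chosen for illustration.

```python
import math

# Matching pennies: payoff to the row player; the column player receives the negative.
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]


def log_softmax(logits):
    """Numerically stable log-probabilities for a list of logits."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_total for z in logits]


def self_play_distance(eta, steps=5000, lr=0.05):
    """Run simultaneous self-play and return the larger of the two players'
    final distances from the uniform (Nash) policy."""
    row_logits, col_logits = [2.0, 0.0], [0.0, 0.0]  # row player starts biased
    log_reg = math.log(0.5)  # uniform regularisation policy (an assumption)
    for _ in range(steps):
        log_x, log_y = log_softmax(row_logits), log_softmax(col_logits)
        x = [math.exp(v) for v in log_x]
        y = [math.exp(v) for v in log_y]
        # Expected payoff of each action against the opponent's current policy.
        q_row = [sum(PAYOFF[a][b] * y[b] for b in range(2)) for a in range(2)]
        q_col = [-sum(PAYOFF[a][b] * x[a] for a in range(2)) for b in range(2)]
        # Penalise each action by eta * log(policy / regularisation policy).
        row_logits = [row_logits[a] + lr * (q_row[a] - eta * (log_x[a] - log_reg))
                      for a in range(2)]
        col_logits = [col_logits[b] + lr * (q_col[b] - eta * (log_y[b] - log_reg))
                      for b in range(2)]
    x0 = math.exp(log_softmax(row_logits)[0])
    y0 = math.exp(log_softmax(col_logits)[0])
    return max(abs(x0 - 0.5), abs(y0 - 0.5))


plain = self_play_distance(eta=0.0)        # unregularised dynamics
regularised = self_play_distance(eta=0.2)  # regularised dynamics
```

With `eta=0.0` the policies keep orbiting the equilibrium and never settle, while with `eta=0.2` they settle at the uniform policy, which is the Nash equilibrium of this particular game. In general the regularised fixed point depends on the chosen regularisation policy, so this toy result should not be read as a statement about Stratego.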

Related content

Learning

Learning · Reducible · Sample · Experience replay · Loss ·

August 22, 2022

Prioritizing Samples in Reinforcement Learning with Reducible Loss

Shivakanth Sujit,Somjit Nath,Pedro H. M. Braga,Samira Ebrahimi Kahou

Most reinforcement learning algorithms take advantage of an experience replay buffer to repeatedly train on samples the agent has observed in the past. This prevents catastrophic forgetting; however, simply assigning equal importance to each sample is a naive strategy. In this paper, we propose a method to prioritize samples based on how much we can learn from them. We define the learn-ability of a sample as the steady decrease of the training loss associated with this sample over time. We develop an algorithm to prioritize samples with high learn-ability, while assigning lower priority to those that are hard to learn, typically because of noise or stochasticity. We empirically show that our method is more robust than random sampling and also better than just prioritizing with respect to the training loss, i.e. the temporal difference loss, which is used in vanilla prioritized experience replay.
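The idea of prioritizing by loss decrease rather than by raw loss can be sketched as follows. The abstract does not give the paper's actual estimator, so this example substitutes a simple stand-in: the drop between two recorded losses per sample, clipped at zero. The function names, the `floor` parameter, and the toy loss values are hypothetical.

```python
import random


def learnability_priorities(previous_losses, current_losses, floor=1e-3):
    """Priority is the drop in loss for each sample, clipped at zero,
    plus a small floor so that no sample becomes unreachable."""
    return [max(prev - cur, 0.0) + floor
            for prev, cur in zip(previous_losses, current_losses)]


def sample_indices(priorities, batch_size, rng):
    """Draw buffer indices with probability proportional to priority."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


# Sample 0: loss is dropping (learnable).
# Sample 1: loss is high but stuck (e.g. noisy target).
# Sample 2: loss is already low and flat.
previous = [1.0, 1.0, 0.2]
current = [0.4, 1.0, 0.2]

priorities = learnability_priorities(previous, current)
batch = sample_indices(priorities, batch_size=1000, rng=random.Random(0))
```

Under plain loss-based prioritization, sample 1 would be drawn most often because its loss is highest. Here it receives only the floor priority, since its loss is not decreasing.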

Agent · INTERACT · Learning · Performer · Reinforcement learning ·

August 22, 2022

Incorporating Rivalry in Reinforcement Learning for a Competitive Game

Pablo Barros,Ozge Nilay Yalçın,Ana Tanevska,Alessandra Sciutti

from arxiv, Accepted at the Neural Computing and Applications Journal

Recent advances in reinforcement learning with social agents have allowed such models to achieve human-level performance on specific interaction tasks. However, most interactive scenarios do not have performance alone as an end goal; instead, the social impact of these agents when interacting with humans is as important and largely unexplored. In this regard, this work proposes a novel reinforcement learning mechanism based on the social impact of rivalry behavior. Our proposed model aggregates objective and social perception mechanisms to derive a rivalry score that is used to modulate the learning of artificial agents. To investigate our proposed model, we design an interactive game scenario, using the Chef's Hat Card Game, and examine how the rivalry modulation changes the agent's playing style, and how this impacts the experience of human players in the game. Our results show that humans can detect specific social characteristics when playing against rival agents when compared to common agents, which directly affects the performance of the human players in subsequent games. We conclude our work by discussing how the different social and objective features that compose the artificial rivalry score contribute to our results.

Learning · Controller · Agent · Reinforcement learning · Deep reinforcement learning ·

August 22, 2022

Learning Ball-balancing Robot Through Deep Reinforcement Learning

Yifan Zhou,Jianghao Lin,Shuai Wang,Chong Zhang

from arxiv, 7+1 pages

The ball-balancing robot (ballbot) is a good platform to test the effectiveness of a balancing controller. For balancing control, conventional model-based feedback control methods have been widely used. However, contacts and collisions are difficult to model, and often lead to failure in balancing control, especially when the ballbot tilts at a large angle. To explore the maximum initial tilting angle of the ballbot, the balancing control is interpreted as a recovery task using Reinforcement Learning (RL). RL is a powerful technique for systems that are difficult to model, because it allows an agent to learn a policy by interacting with the environment. In this paper, by combining the conventional feedback controller with the RL method, a compound controller is proposed. We show the effectiveness of the compound controller by training an agent to successfully perform a recovery task involving contacts and collisions. Simulation results demonstrate that using the compound controller, the ballbot can keep balance under a larger set of initial tilting angles, compared to the conventional model-based controller.

Lyapunov · Learning · Agent · Functional · Reinforcement learning ·

August 21, 2022

On stabilizing reinforcement learning without Lyapunov functions

Pavel Osinenko,Grigory Yaremenko,Georgiy Malaniya

Reinforcement learning remains one of the major directions of the contemporary development of control engineering and machine learning. Nice intuition, flexible settings, and ease of application are among the many perks of this methodology. From the standpoint of machine learning, the main strength of a reinforcement learning agent is its ability to "capture" (learn) the optimal behavior in the given environment. Typically, the agent is built on neural networks and it is their approximation abilities that give rise to the above belief. From the standpoint of control engineering, however, reinforcement learning has serious deficiencies. The most significant one is the lack of a stability guarantee for the agent-environment closed loop. A great deal of research has been and is being done towards stabilizing reinforcement learning. Speaking of stability, the celebrated Lyapunov theory is the de facto tool. It is thus no wonder that so many techniques of stabilizing reinforcement learning rely on the Lyapunov theory in one way or another. In control theory, there is an intricate connection between a stabilizing controller and a Lyapunov function. Employing such a pair thus seems quite attractive for designing stabilizing reinforcement learning. However, computation of a Lyapunov function is generally a cumbersome process. In this note, we show how to construct a stabilizing reinforcement learning agent that does not employ such a function at all. We only assume that a Lyapunov function exists, which is a natural thing to do if the given system (read: environment) is stabilizable, but we do not need to compute one.

Agent · Generalization theory · Learning · Reinforcement learning · Deep reinforcement learning ·

August 20, 2022

Goal Misgeneralization in Deep Reinforcement Learning

Lauro Langosco,Jack Koch,Lee Sharkey,Jacob Pfau,Laurent Orseau,David Krueger

from arxiv, Published in ICML 2022. 9 Pages

We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We formalize this distinction between capability and goal generalization, provide the first empirical demonstrations of goal misgeneralization, and present a partial characterization of its causes.

Learning · Deep reinforcement learning · Reinforcement learning · motivation · Agent ·

August 18, 2022

A Review of Uncertainty for Deep Reinforcement Learning

Owen Lockwood,Mei Si

from arxiv, Accepted to AIIDE 2022

Uncertainty is ubiquitous in games, both in the agents playing games and often in the games themselves. Working with uncertainty is therefore an important component of successful deep reinforcement learning agents. While there has been substantial effort and progress in understanding and working with uncertainty for supervised learning, the body of literature for uncertainty-aware deep reinforcement learning is less developed. While many of the same problems regarding uncertainty in neural networks for supervised learning remain for reinforcement learning, there are additional sources of uncertainty due to the nature of an interactable environment. In this work, we provide an overview motivating and presenting existing techniques in uncertainty-aware deep reinforcement learning. These works show empirical benefits on a variety of reinforcement learning tasks. This work helps to centralize the disparate results and promote future research in this area.

Sample complexity · ReLU · Learning · Networking · Approximation ·

August 18, 2022

Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks

Thanh Nguyen-Tang,Sunil Gupta,Hung Tran-The,Svetha Venkatesh

from arxiv, A short version published in the ICML Workshop on Reinforcement Learning Theory, 2021

Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in the neural network function approximation setting remain limited. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of $\tilde{\mathcal{O}}\left( \kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha} \right)$ for offline RL with deep ReLU networks, where $\kappa$ is a measure of distributional shift, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is a user-specified error. Notably, our sample complexity holds under two novel considerations, namely the Besov dynamic closure and the correlated structure that arises from value regression for offline RL. While the Besov dynamic closure generalizes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond the traditional reproducing kernel Hilbert spaces and Neural Tangent Kernels.
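To get a feel for how the stated rate scales, the sketch below evaluates only the leading term $\kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha}$. It drops the logarithmic factors and constants hidden by $\tilde{\mathcal{O}}$, so the numbers are relative growth rates, not actual sample counts. The function name and the example parameter values are assumptions for illustration.

```python
def offline_rl_leading_term(kappa, epsilon, d, alpha):
    """Leading term kappa^(1 + d/alpha) * epsilon^(-2 - 2d/alpha).

    Ignores the log factors and constants hidden by the O-tilde notation.
    """
    ratio = d / alpha
    return kappa ** (1 + ratio) * epsilon ** (-2 - 2 * ratio)


# Same distributional shift, dimension and target error; only smoothness changes.
rough = offline_rl_leading_term(kappa=2.0, epsilon=0.1, d=4, alpha=2.0)
smooth = offline_rl_leading_term(kappa=2.0, epsilon=0.1, d=4, alpha=4.0)
```

With these values the exponent ratio $d/\alpha$ falls from 2 to 1 as $\alpha$ doubles, so the leading term shrinks from about $8 \times 10^{6}$ to about $4 \times 10^{4}$. This shows how a smoother MDP softens the dependence on dimension.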

Learned · Reinforcement learning · Performer · Better · state-of-the-art ·

February 10, 2020

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Yaodong Yang,Jianye Hao,Guangyong Chen,Hongyao Tang,Yingfeng Chen,Yujing Hu,Changjie Fan,Zhongyu Wei

Recently, deep multiagent reinforcement learning (MARL) has become a highly active research area as many real-world problems can be inherently viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate their behaviors conditioning on their private observations and commonly shared global reward signals. One natural solution is to resort to the centralized training and decentralized execution paradigm. During centralized training, one key challenge is the multiagent credit assignment: how to allocate the global rewards for individual agent policies for better coordination towards maximizing system-level benefits. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into individual agents' Q-values. Unlike previous works which restrict the representation relation of the individual Q-values and the global one, we leverage the integrated gradient attribution technique into deep MARL to directly decompose global Q-values along trajectory paths to assign credits for agents. We evaluate QPD on the challenging StarCraft II micromanagement tasks and show that QPD achieves state-of-the-art performance in both homogeneous and heterogeneous multiagent scenarios compared with existing cooperative MARL algorithms.
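The integrated gradient attribution technique mentioned in the abstract can be sketched on a toy function. This is generic integrated gradients along a straight line from a baseline, not the QPD algorithm itself, which the abstract describes as working along trajectory paths with learned Q-networks. The quadratic `joint_q`, the zero baseline, and the finite-difference gradient are all assumptions made to keep the example dependency-free.

```python
def integrated_gradients(f, x, baseline, steps=200, h=1e-5):
    """Attribute f(x) - f(baseline) to each input coordinate.

    Uses a midpoint Riemann sum along the straight line from baseline to x
    and central finite differences for the partial derivatives.
    """
    n = len(x)
    attributions = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up, down = list(point), list(point)
            up[i] += h
            down[i] -= h
            grad_i = (f(up) - f(down)) / (2 * h)
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions


def joint_q(features):
    """Toy stand-in for a global Q-value over two agents' features."""
    a, b = features
    return 2.0 * a + a * b + 0.5 * b * b


x = [1.0, 2.0]
baseline = [0.0, 0.0]
credits = integrated_gradients(joint_q, x, baseline)
total = joint_q(x) - joint_q(baseline)
```

The per-coordinate credits sum to the difference between the value at `x` and at the baseline. This completeness property is what makes the technique usable for splitting a global value into individual contributions.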

Deep reinforcement learning · Reinforcement learning · Learned · Episode · Optimizer ·

June 27, 2018

A Multi-Objective Deep Reinforcement Learning Framework

Thanh Thi Nguyen

from arxiv, 17 pages

This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose the use of linear and non-linear methods to develop the MODRL framework that includes both single-policy and multi-policy strategies. The experimental results on two benchmark problems, the two-objective deep sea treasure environment and the three-objective mountain car problem, indicate that the proposed framework is able to converge to the optimal Pareto solutions effectively. The proposed framework is generic, which allows implementation of different deep reinforcement learning algorithms in different complex environments. It therefore overcomes many difficulties involved with standard multi-objective reinforcement learning (MORL) methods in the current literature. The framework provides a testbed environment for developing methods that solve various problems associated with current MORL. Details of the framework implementation can be found at //www.deakin.edu.au/~thanhthi/drl.htm.
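As a rough illustration of the linear method, a vector of per-objective Q-values can be collapsed into a scalar with a weight vector before the greedy action is chosen. The sketch below assumes this weighted-sum form, since the abstract does not spell out the framework's exact scalarisation. The function name, the Q-values, and the weights are made up for the example.

```python
def scalarised_greedy_action(q_vectors, weights):
    """Pick the action whose weighted sum of per-objective Q-values is largest.

    q_vectors[a] holds one Q-value per objective for action a.
    """
    scores = [sum(w * q for w, q in zip(weights, q_vector))
              for q_vector in q_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical Q-values per action: (treasure value, time penalty).
q_vectors = [
    [10.0, -9.0],  # far treasure: large reward, long trip
    [4.0, -2.0],   # mid-range treasure
    [1.0, -0.5],   # nearby treasure: small reward, short trip
]

treasure_only = scalarised_greedy_action(q_vectors, weights=[1.0, 0.0])
balanced = scalarised_greedy_action(q_vectors, weights=[0.5, 0.5])
time_focused = scalarised_greedy_action(q_vectors, weights=[0.1, 0.9])
```

Each weight vector selects a different action here, which is how sweeping the weights can trace out different trade-offs between objectives. A weighted sum can only reach solutions on the convex part of a Pareto front, which is one common motivation for also considering non-linear methods.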

SOFT · Continuity · Better · Performer · state-of-the-art ·

April 25, 2018

Multiagent Soft Q-Learning

Ermo Wei,Drew Wicke,David Freelan,Sean Luke

from arxiv, Accepted in AAAI 18 Spring Symposium

Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
