Arthur Müller,Vishal Rangras,Georg Schnittker,Michael Waldmann,Maxim Friesen,Tobias Ferfers,Lukas Schreckenberg,Florian Hufen,Jürgen Jasperneite,Marco Wiering

From arXiv. Accepted at ICMLA 2021 (20th IEEE International Conference on Machine Learning and Applications). Code available at //github.com/RL-INA/LemgoRL

Sub-optimal control policies in intersection traffic signal controllers (TSC) contribute to congestion and lead to negative effects on human health and the environment. Reinforcement learning (RL) for traffic signal control is a promising approach to design better control policies and has attracted considerable research interest in recent years. However, most work done in this area used simplified simulation environments of traffic scenarios to train RL-based TSC. To deploy RL in real-world traffic systems, the gap between simplified simulation environments and real-world applications has to be closed. Therefore, we propose LemgoRL, a benchmark tool to train RL agents as TSC in a realistic simulation environment of Lemgo, a medium-sized town in Germany. In addition to the realistic simulation model, LemgoRL encompasses a traffic signal logic unit that ensures compliance with all regulatory and safety requirements. LemgoRL offers the same interface as the well-known OpenAI gym toolkit to enable easy deployment in existing research work. To demonstrate the functionality and applicability of LemgoRL, we train a state-of-the-art Deep RL algorithm on a CPU cluster utilizing a framework for distributed and parallel RL and compare its performance with other methods. Our benchmark tool drives the development of RL algorithms towards real-world applications.
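To make the "same interface as OpenAI gym" point concrete, here is a minimal sketch of a Gym-style `reset`/`step` loop. It does not use LemgoRL or its API: `ToyIntersectionEnv`, its parameters, the queue dynamics, and `longest_queue_policy` are all hypothetical stand-ins, and the four-value `step` return follows the classic Gym convention.

```python
import random


class ToyIntersectionEnv:
    """Hypothetical two-approach intersection with a Gym-style interface.

    This is NOT the LemgoRL environment; it only mirrors the reset/step shape.
    """

    def __init__(self, arrival_prob=0.3, discharge=2, horizon=50, seed=0):
        self.arrival_prob = arrival_prob  # chance of one arrival per approach per step
        self.discharge = discharge        # vehicles released per step on green
        self.horizon = horizon            # episode length in steps
        self.rng = random.Random(seed)
        self.queues = [0, 0]
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.queues = [0, 0]
        self.t = 0
        return tuple(self.queues)

    def step(self, action):
        """Give green to approach `action` (0 or 1) for one time step."""
        for i in range(2):
            if self.rng.random() < self.arrival_prob:
                self.queues[i] += 1
        self.queues[action] = max(0, self.queues[action] - self.discharge)
        self.t += 1
        reward = -sum(self.queues)        # fewer queued vehicles is better
        done = self.t >= self.horizon
        return tuple(self.queues), reward, done, {}


def longest_queue_policy(obs):
    """Simple baseline: serve the approach with the longer queue."""
    return 0 if obs[0] >= obs[1] else 1


def run_episode(env, policy):
    """Standard agent-environment loop; returns the summed reward."""
    obs = env.reset()
    total, done = 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


env = ToyIntersectionEnv()
episode_return = run_episode(env, longest_queue_policy)
```

Any agent written against this loop only depends on `reset` and `step`, which is what lets an environment with a Gym-compatible interface be dropped into existing training code.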

Related content

Learning · Performer · Interpretability · Regularization term · Loss

April 20, 2022

Understanding and Preventing Capacity Loss in Reinforcement Learning

Clare Lyle,Mark Rowland,Will Dabney

from arxiv, Presented at ICLR 2022

The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks. We identify a mechanism by which non-stationary prediction targets can prevent learning progress in deep RL agents: capacity loss, whereby networks trained on a sequence of target values lose their ability to quickly update their predictions over time. We demonstrate that capacity loss occurs in a range of RL agents and environments, and is particularly damaging to performance in sparse-reward tasks. We then present a simple regularizer, Initial Feature Regularization (InFeR), that mitigates this phenomenon by regressing a subspace of features towards its value at initialization, leading to significant performance improvements in sparse-reward environments such as Montezuma's Revenge. We conclude that preventing capacity loss is crucial to enable agents to maximally benefit from the learning signals they obtain throughout the entire training trajectory.
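The idea of "regressing a subspace of features towards its value at initialization" can be sketched as an auxiliary penalty. The following is a minimal illustration under stated assumptions, not the authors' implementation: the feature map is a single linear layer, the subspace is picked out by a fixed linear head, and the names `linear`, `infer_penalty`, `feature_w`, `init_feature_w`, and `head_w` are hypothetical. Any scaling of the targets or weighting against the RL loss is left out.

```python
def linear(weights, x):
    """Apply a weight matrix (given as a list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def infer_penalty(feature_w, init_feature_w, head_w, states):
    """Mean squared gap between projected current and initial features.

    feature_w      -- current feature-layer weights (being trained)
    init_feature_w -- frozen copy of the weights taken at initialization
    head_w         -- fixed linear head selecting the regularized subspace
    states         -- batch of input vectors
    """
    total, count = 0.0, 0
    for s in states:
        current = linear(head_w, linear(feature_w, s))
        target = linear(head_w, linear(init_feature_w, s))
        for c, t in zip(current, target):
            total += (c - t) ** 2
            count += 1
    return total / count


init_w = [[1.0, 0.0], [0.0, 1.0]]      # weights at initialization
drifted_w = [[2.0, 0.0], [0.0, 1.0]]   # weights after some training
head_w = [[1.0, 1.0]]                  # one auxiliary output
states = [[1.0, 2.0]]

penalty_at_init = infer_penalty(init_w, init_w, head_w, states)
penalty_after_drift = infer_penalty(drifted_w, init_w, head_w, states)
```

In a training loop, a term like this would be added to the usual RL loss so that the projected features stay close to where they started while the rest of the network remains free to change.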

Episode · Dataset · Performer · Controller · Robustness

April 20, 2022

A Reinforcement Learning-based Volt-VAR Control Dataset and Testing Environment

Yuanqi Gao,Nanpeng Yu

To facilitate the development of reinforcement learning (RL) based power distribution system Volt-VAR control (VVC), this paper introduces a suite of open-source datasets for RL-based VVC algorithm research that is sample efficient, safe, and robust. The dataset consists of two components: (1) a Gym-like VVC testing environment for the IEEE-13, 123, and 8500-bus test feeders and (2) a historical operational dataset for each of the feeders. Potential users of the dataset and testing environment could first train a sample-efficient offline (batch) RL algorithm on the historical dataset and then evaluate the performance of the trained RL agent on the testing environments. This dataset serves as a useful testbed to conduct RL-based VVC research mimicking the real-world operational challenges faced by electric utilities. Meanwhile, it allows researchers to conduct fair performance comparisons between different algorithms.

Q-network · Deep Q-network · Controller · TSC · Learning

April 20, 2022

DynLight: Realize dynamic phase duration with multi-level traffic signal control

Liang Zhang,Shubin Xie,Jianming Deng

from arxiv, 9 pages, 9 figures

Reinforcement learning (RL) is increasingly popular for traffic signal control (TSC) and has become a promising solution. However, several challenges still need to be overcome. Firstly, most RL methods use a fixed action duration and select the green phase for the next state, which makes the phase duration less dynamic and flexible. Secondly, the phase sequence of RL methods can be arbitrary, which hinders real-world deployments that may require a cyclical phase structure. Lastly, the average travel time and throughput are not fair metrics to evaluate TSC performance. To address these challenges, we propose a multi-level traffic signal control framework, DynLight, which uses an optimization method, Max-QueueLength (M-QL), to determine the phase and uses a deep Q-network to determine the duration of the corresponding phase. Based on DynLight, we further propose DynLight-C, which adopts a well-trained deep Q-network of DynLight and replaces M-QL with a cyclical control policy that actuates a set of phases in a fixed cyclical order to realize a cyclical phase structure. Comprehensive experiments on multiple real-world datasets demonstrate that DynLight achieves a new state-of-the-art. Furthermore, the deep Q-network of DynLight learns to determine the phase duration well, and DynLight-C demonstrates high performance for deployment.
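The two phase-selection rules described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the paper's code: it assumes M-QL picks the phase whose lanes hold the largest total queue, and the function names, phase names, and lane names are hypothetical. The deep Q-network that chooses the phase duration is not modeled here.

```python
def max_queue_length_phase(phases, queue_lengths):
    """Pick the phase whose served lanes hold the most queued vehicles.

    phases        -- dict mapping phase name to the lanes it gives green to
    queue_lengths -- dict mapping lane name to its current queue length
    Ties go to the phase listed first.
    """
    return max(phases, key=lambda p: sum(queue_lengths[lane] for lane in phases[p]))


def next_cyclical_phase(phase_order, current):
    """Advance to the next phase in a fixed cyclical order (wraps around)."""
    i = phase_order.index(current)
    return phase_order[(i + 1) % len(phase_order)]


# Hypothetical four-approach intersection with two through phases.
phases = {
    "NS_through": ["north_in", "south_in"],
    "EW_through": ["east_in", "west_in"],
}
queues = {"north_in": 3, "south_in": 4, "east_in": 5, "west_in": 1}

greedy_choice = max_queue_length_phase(phases, queues)          # NS total 7 vs EW total 6
cyclical_choice = next_cyclical_phase(list(phases), "EW_through")
```

The greedy rule can jump between phases in any order, while the cyclical rule always follows the same sequence, which is the trade-off the abstract draws between DynLight and DynLight-C.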

Latent variable · Generative model · MoDELS · Performer · Learning

April 18, 2022

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Ali Ghadirzadeh,Petra Poklukar,Karol Arndt,Chelsea Finn,Ville Kyrki,Danica Kragic,Mårten Björkman

from arxiv, arXiv admin note: substantial text overlap with arXiv:2007.13134

We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluation of generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that GenRL is the only method which can safely and efficiently solve the robotics tasks compared to two state-of-the-art RL methods.

Episode · Knowledge · Scenario · Identifiable · Dataset

April 18, 2022

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments

Federico Landi,Roberto Bigazzi,Marcella Cornia,Silvia Cascianelli,Lorenzo Baraldi,Rita Cucchiara

from arxiv, Accepted at the 26th International Conference on Pattern Recognition (ICPR 2022)

Embodied AI is a recent research area that aims at creating intelligent agents that can move and operate inside an environment. Existing approaches in this field require agents to act in completely new and unexplored scenes. However, this setting is far from realistic use cases that instead require executing multiple tasks in the same environment. Even if the environment changes over time, the agent could still count on its global knowledge about the scene while trying to adapt its internal representation to the current state of the environment. To make a step towards this setting, we propose Spot the Difference: a novel task for Embodied AI where the agent has access to an outdated map of the environment and needs to recover the correct layout in a fixed time budget. To this end, we collect a new dataset of occupancy maps starting from existing datasets of 3D spaces and generating a number of possible layouts for a single environment. This dataset can be employed in the popular Habitat simulator and is fully compliant with existing methods that employ reconstructed occupancy maps during navigation. Furthermore, we propose an exploration policy that can take advantage of previous knowledge of the environment and identify changes in the scene faster and more effectively than existing agents. Experimental results show that the proposed architecture outperforms existing state-of-the-art models for exploration in this new setting.

Robustness · Learning · Reinforcement learning · Performer · State-of-the-art

April 17, 2022

Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning

Jun Guo,Yonghong Chen,Yihang Hao,Zixin Yin,Yin Yu,Simin Li

While deep neural networks (DNNs) have strengthened the performance of cooperative multi-agent reinforcement learning (c-MARL), the agent policy can be easily perturbed by adversarial examples. Considering the safety-critical applications of c-MARL, such as traffic management, power management and unmanned aerial vehicle control, it is crucial to test the robustness of a c-MARL algorithm before it is deployed in reality. Existing adversarial attacks for MARL could be used for testing, but each is limited to a single robustness aspect (e.g., reward, state, action), while a c-MARL model could be attacked from any aspect. To overcome this challenge, we propose MARLSafe, the first robustness testing framework for c-MARL algorithms. First, motivated by the Markov Decision Process (MDP), MARLSafe considers the robustness of c-MARL algorithms comprehensively from three aspects, namely state robustness, action robustness and reward robustness. Any c-MARL algorithm must simultaneously satisfy these robustness aspects to be considered secure. Second, due to the scarcity of c-MARL attacks, we propose c-MARL attacks as robustness testing algorithms from multiple aspects. Experiments on the SMAC environment reveal that many state-of-the-art c-MARL algorithms have low robustness in all aspects, pointing out the urgent need to test and enhance the robustness of c-MARL algorithms.

Graph · Learning · MoDELS · Extensibility · Deep learning

February 24, 2022

Bayesian Deep Learning for Graphs

Federico Errica

from arxiv, PhD Thesis

The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automates the choice of almost all of the model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.

AIM · INFORMS · Imperfect information · Controller · AI

October 21, 2021

On games and simulators as a platform for development of artificial intelligence for command and control

Vinicius G. Goecks,Nicholas Waytowich,Derrik E. Asher,Song Jun Park,Mark Mittrick,John Richardson,Manuel Vindiola,Anne Logie,Mark Dennison,Theron Trout,Priya Narayanan,Alexander Kott

from arxiv, Preprint submitted to the Journal of Defense Modeling and Simulation (JDMS) for peer review

Games and simulators can be a valuable platform to execute complex multi-agent, multiplayer, imperfect information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting development of algorithms with complex benchmarks and the capability to rapidly iterate over new ideas. The success of artificial intelligence algorithms in real-time strategy games such as StarCraft II has also attracted the attention of the military research community aiming to explore similar techniques in military counterpart scenarios. Aiming to bridge the connection between games and military applications, this work discusses past and current efforts on how games and simulators, together with the artificial intelligence algorithms, have been adapted to simulate certain aspects of military missions and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities in human interfaces with gaming platforms and their military parallels.

Reinforcement learning · Learning · Tuning · Episode · Directed

January 19, 2020

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

Amit Kumar Mondal,Nadeem Jamali

Reinforcement learning is one of the core components in designing an artificially intelligent system emphasizing real-time response. Reinforcement learning influences the system to take actions within an arbitrary environment, with or without previous knowledge of the environment model. In this paper, we present a comprehensive study on Reinforcement Learning focusing on various dimensions including challenges, the recent development of different state-of-the-art techniques, and future directions. The fundamental objective of this paper is to provide a framework for the presentation of available methods of reinforcement learning that is informative enough and simple to follow for new researchers and academics in this domain, considering the latest concerns. First, we illustrate the core techniques of reinforcement learning in an easily understandable and comparable way. Finally, we analyze and depict the recent developments in reinforcement learning approaches. Our analysis points out that most of the models focus on tuning policy values rather than tuning other things in a particular state of reasoning.

January 5, 2018

Deep Reinforcement Learning for List-wise Recommendations

Xiangyu Zhao,Liang Zhang,Zhuoye Ding,Dawei Yin,Yihong Zhao,Jiliang Tang

Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items through trial and error and receiving reinforcement for these items from users' feedback. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wise recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.