
The pursuit-evasion game in smart cities has a profound impact on the Multi-Vehicle Pursuit (MVP) problem, in which police cars cooperatively pursue suspected vehicles. Existing studies on the MVP problem tend to assume that evading vehicles move randomly or along a fixed, prescribed route. Opponent modeling has shown considerable promise in tackling the non-stationarity caused by adversarial agents. However, most such methods focus on two-player competitive games and simple scenarios without environmental interference. This paper considers a Team-to-Team Multi-vehicle Pursuit (T2TMVP) problem in a complicated urban traffic scene where the evading vehicles adopt pre-trained dynamic strategies to make decisions intelligently. To solve this problem, we propose an opponent-aware reinforcement learning via maximizing mutual information indicator (OARLM2I2) method to improve pursuit efficiency in complicated environments. First, a sequential encoding-based opponents joint strategy modeling (SEOJSM) mechanism is proposed to generate the evading vehicles' joint strategy model, which assists the multi-agent decision-making process based on deep Q-network (DQN). Then, we design a mutual information-united loss, which simultaneously considers the reward fed back from the environment and the effectiveness of the opponents' joint strategy model, to update the pursuing vehicles' decision-making process. Extensive experiments based on SUMO demonstrate that our method outperforms other baselines by 21.48% on average in reducing pursuit time. The code is available at \url{//github.com/ANT-ITS/OARLM2I2}.
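
As a rough illustration of how a mutual information-united loss of this kind could be wired into a DQN update, the sketch below (PyTorch) couples the temporal-difference error with a cross-entropy term scoring how well a sequential opponent encoder predicts the evaders' observed actions, a standard variational surrogate for mutual information. This is not the released OARLM2I2 code; the network shapes, module names, and the weight `mi_coef` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OpponentAwareDQN(nn.Module):
    def __init__(self, obs_dim, n_actions, n_opp_actions, hidden=128):
        super().__init__()
        # Sequential encoder over opponent trajectories (a stand-in for SEOJSM).
        self.opp_encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.opp_head = nn.Linear(hidden, n_opp_actions)  # predicts evader action
        self.q_net = nn.Sequential(
            nn.Linear(obs_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs, opp_traj):
        _, h = self.opp_encoder(opp_traj)   # encode the opponents' history
        z = h.squeeze(0)                    # joint-strategy embedding
        return self.q_net(torch.cat([obs, z], dim=-1)), self.opp_head(z)

def mi_united_loss(model, batch, gamma=0.99, mi_coef=0.1):
    q, opp_logits = model(batch["obs"], batch["opp_traj"])
    q_taken = q.gather(1, batch["action"]).squeeze(1)   # action: (B, 1) long
    with torch.no_grad():  # a separate target network is omitted for brevity
        q_next, _ = model(batch["next_obs"], batch["next_opp_traj"])
        target = batch["reward"] + gamma * (1 - batch["done"]) * q_next.max(1).values
    td_loss = F.mse_loss(q_taken, target)
    # Cross-entropy on the evaders' observed actions: maximizing this likelihood
    # lower-bounds the mutual information between embedding and evader behavior.
    mi_loss = F.cross_entropy(opp_logits, batch["opp_action"])
    return td_loss + mi_coef * mi_loss
```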

Related Content

The INFORMS Journal on Computing publishes high-quality papers that broaden the scope of operations research and computing, seeking original research papers on theory, methods, experiments, systems, and applications, as well as novel survey and tutorial papers and papers describing new and useful software tools.
December 7, 2022

Various types of Multi-Agent Reinforcement Learning (MARL) methods have been developed, assuming that agents' policies are based on true states. Recent works have improved the robustness of MARL under uncertainties in the reward, the transition probability, or other partners' policies. However, in real-world multi-agent systems, state estimations may be perturbed by sensor measurement noise or even adversaries. Agents' policies trained with only true state information will deviate from optimal solutions when facing adversarial state perturbations during execution. MARL under adversarial state perturbations has received limited study. Hence, in this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to study the fundamental properties of MARL under state uncertainties. We prove that the optimal agent policy and the robust Nash equilibrium do not always exist for an SAMG. Instead, we define a new solution concept, the robust agent policy, for the proposed SAMG under adversarial state perturbations, where agents aim to maximize the worst-case expected state value. We then design a gradient descent ascent-based robust MARL algorithm to learn the robust policies for the MARL agents. Our experiments show that adversarial state perturbations decrease agents' rewards for several baselines from the existing literature, while our algorithm outperforms baselines with state perturbations and significantly improves the robustness of the MARL policies under state uncertainties.
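
To make the gradient descent ascent idea concrete, here is a minimal single-agent sketch (not the paper's SAMG algorithm): an inner loop ascends on a bounded state perturbation that drives the value estimate down, and the outer step performs an ordinary TD update at that worst-case state. The network, step sizes, and perturbation budget `eps` are illustrative assumptions.

```python
import torch
import torch.nn as nn

value = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(value.parameters(), lr=1e-3)

def worst_case_state(state, eps=0.1, steps=5, lr=0.05):
    """Inner maximization: find a perturbation within an L-inf budget that
    minimizes the value estimate, i.e. the adversary's objective."""
    delta = torch.zeros_like(state, requires_grad=True)
    for _ in range(steps):
        v = value(state + delta).sum()
        grad, = torch.autograd.grad(v, delta)
        with torch.no_grad():
            delta -= lr * grad        # descend the value = ascend the adversary's objective
            delta.clamp_(-eps, eps)   # stay inside the perturbation budget
    return (state + delta).detach()

def robust_td_step(state, reward, next_state, gamma=0.99):
    """Outer minimization: ordinary TD regression, but at the worst-case state."""
    s_adv = worst_case_state(state)
    target = reward + gamma * value(next_state).detach()
    loss = (value(s_adv) - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```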

In recent years, unmanned aerial vehicle (UAV) technology has expanded knowledge in the area, bringing to light new problems and challenges that require solutions. Furthermore, because the technology allows processes usually carried out by people to be automated, it is in great demand in industrial sectors. The automation of these vehicles has been addressed in the literature, applying different machine learning strategies. Reinforcement learning (RL) is an automation framework frequently used to train autonomous agents. RL is a machine learning paradigm wherein an agent interacts with an environment to solve a given task. However, learning autonomously can be time-consuming and computationally expensive, and may not be practical in highly complex scenarios. Interactive reinforcement learning allows an external trainer to provide advice to an agent while it is learning a task. In this study, we set out to teach an RL agent to control a drone using reward-shaping and policy-shaping techniques simultaneously. Two simulated scenarios were proposed for training: one without obstacles and one with obstacles. We also studied the influence of each technique. The results show that an agent trained with both techniques simultaneously obtains a lower reward than an agent trained using only a policy-based approach. Nevertheless, the agent achieves lower execution times and less dispersion during training.
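
A toy tabular sketch of how the two kinds of guidance can act on the same Q-learning agent: policy shaping biases action selection toward a trainer's advice, while potential-based reward shaping adds a bonus to the update target. The potential function, the advice mechanism, and all constants below are assumptions for illustration, not the paper's setup.

```python
import random
from collections import defaultdict

Q = defaultdict(float)
alpha, gamma, eps, feedback_prob = 0.1, 0.95, 0.1, 0.3

def potential(state):
    """Illustrative potential: negative Manhattan distance to an assumed goal at (9, 9)."""
    return -(abs(state[0] - 9) + abs(state[1] - 9))

def choose_action(state, actions, advice):
    # Policy shaping: with some probability, defer to the trainer's advice.
    if advice is not None and random.random() < feedback_prob:
        return advice
    if random.random() < eps:                       # otherwise epsilon-greedy
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # Reward shaping: a potential-based term, which preserves the optimal policy.
    shaped = reward + gamma * potential(next_state) - potential(state)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (shaped + gamma * best_next - Q[(state, action)])
```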

We consider task allocation for multi-object transport using a multi-robot system, in which each robot selects one object among multiple objects with different and unknown weights. The existing centralized methods assume the numbers of robots and tasks to be fixed, which is inapplicable to scenarios that differ from the learning environment. Meanwhile, the existing distributed methods limit the minimum numbers of robots and tasks to a constant value, making them applicable to various numbers of robots and tasks. However, they cannot transport an object whose weight exceeds the load capacity of the robots observing it. To make task allocation applicable to various numbers of robots and objects with different and unknown weights, we propose a framework using multi-agent reinforcement learning. First, we introduce a structured policy model consisting of 1) predesigned dynamic task priorities with global communication and 2) a neural network-based distributed policy model that determines the timing of coordination. The distributed policy builds consensus on the high-priority object under local observations and selects cooperative or independent actions. The policy is then optimized by multi-agent reinforcement learning through trial and error. This structured policy of local learning and global communication makes our framework applicable to various numbers of robots and objects with different and unknown weights, as demonstrated by numerical simulations.
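
The split between globally shared priorities and a local cooperate-or-go-alone decision can be pictured with a small sketch (a hand-written stand-in, not the learned policy from the paper); the priority rule, the capacity check, and the naming are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Task:
    tid: int
    est_weight: float     # running estimate; the true weight is unknown
    waiting_time: float

def priorities(tasks):
    # Assumed rule: heavier and longer-waiting objects get higher priority,
    # so enough robots eventually gather to move them.
    return sorted(tasks, key=lambda t: t.est_weight + 0.5 * t.waiting_time,
                  reverse=True)

def select_task(robot_capacity, observed_tasks, committed_robots):
    """Local decision: cooperate on the top-priority object if more robots
    are still needed, otherwise pick a task this robot can carry alone."""
    ranked = priorities(observed_tasks)
    top = ranked[0]
    if top.est_weight > robot_capacity * committed_robots[top.tid]:
        return top.tid                          # cooperative choice
    for t in ranked:
        if t.est_weight <= robot_capacity:
            return t.tid                        # independent choice
    return top.tid                              # default: help on the hardest task
```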

In this paper, we present a solution to the design problem of control strategies for multi-agent cooperative transport. Although existing learning-based methods assume that the number of agents is the same as that in the training environment, the number may differ in reality, considering that robots' batteries may completely discharge or that additional robots may be introduced to reduce the time required to complete a task. Therefore, it is crucial that the learned strategy be applicable to scenarios in which the number of agents differs from that in the training environment. In this paper, we propose a novel multi-agent reinforcement learning framework of event-triggered communication and consensus-based control for distributed cooperative transport. The proposed policy model estimates the resultant force and torque in a consensus manner, using the estimates shared by neighboring agents. Moreover, under local observations and these estimates, it computes the control inputs and the communication inputs that determine when to communicate with neighboring agents. Therefore, the proposed framework can balance control performance against communication savings in scenarios in which the number of agents differs from that in the training environment. We confirm the effectiveness of our approach using a maximum of eight robots in simulations and six robots in experiments.
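
The event-triggered consensus idea itself can be shown in a few lines (a generic pattern, not the paper's learned policy): each agent keeps an estimate of the resultant force, mixes it with the values neighbors last broadcast, and re-broadcasts only when its own estimate has drifted past a threshold. The mixing weight and trigger threshold below are illustrative assumptions.

```python
import numpy as np

class Agent:
    def __init__(self, local_force, threshold=0.2, mix=0.5):
        self.estimate = np.asarray(local_force, dtype=float)
        self.last_sent = np.zeros_like(self.estimate)  # forces a first broadcast
        self.threshold, self.mix = threshold, mix

    def consensus_step(self, broadcasts):
        # Average with the last-broadcast values (including one's own is harmless).
        if broadcasts:
            self.estimate = ((1 - self.mix) * self.estimate
                             + self.mix * np.mean(broadcasts, axis=0))

    def maybe_broadcast(self):
        # Event trigger: communicate only when the estimate has changed enough.
        if np.linalg.norm(self.estimate - self.last_sent) > self.threshold:
            self.last_sent = self.estimate.copy()
            return self.last_sent.copy()
        return None

agents = [Agent([1.0, 0.0]), Agent([0.0, 1.0]), Agent([0.5, 0.5])]
for _ in range(10):
    msgs = [m for m in (a.maybe_broadcast() for a in agents) if m is not None]
    for a in agents:
        a.consensus_step(msgs)   # estimates converge; messages taper off
```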

In reinforcement learning (RL) research, simulations enable benchmarking of algorithms, as well as prototyping and hyper-parameter tuning of agents. To promote RL in both research and real-world applications, frameworks are required that are, on the one hand, efficient at running experiments as fast as possible. On the other hand, they must be flexible enough to allow the integration of newly developed optimization techniques, e.g., new RL algorithms, which are continuously put forward by an active research community. In this paper, we introduce Karolos, an RL framework developed for robotic applications, with a particular focus on transfer scenarios with varying robot-task combinations, reflected in a modular environment architecture. In addition, we provide implementations of state-of-the-art RL algorithms along with common learning-facilitating enhancements, as well as an architecture to parallelize environments across multiple processes to significantly speed up experiments. The code is open source and published on GitHub with the aim of promoting research on RL applications in robotics.
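
Environment parallelization of the kind mentioned here is typically done by running each simulator in its own process and exchanging actions and transitions over pipes. The sketch below is a generic stand-in with stubbed dynamics, not Karolos's actual API.

```python
import multiprocessing as mp
import random

def env_worker(conn, seed):
    rng = random.Random(seed)
    state = 0.0
    while True:
        action = conn.recv()
        if action is None:                           # shutdown signal
            break
        state += action + rng.uniform(-0.1, 0.1)     # stubbed dynamics
        reward, done = -abs(state), abs(state) > 5
        conn.send((state, reward, done))
        if done:
            state = 0.0                              # auto-reset the episode
    conn.close()

if __name__ == "__main__":
    n_envs = 4
    pipes, procs = [], []
    for i in range(n_envs):
        parent, child = mp.Pipe()
        p = mp.Process(target=env_worker, args=(child, i), daemon=True)
        p.start()
        pipes.append(parent); procs.append(p)
    for step in range(100):                          # all envs step concurrently
        for pipe in pipes:
            pipe.send(1.0 if step % 2 else -1.0)
        transitions = [pipe.recv() for pipe in pipes]
    for pipe in pipes:
        pipe.send(None)                              # stop the workers
    for p in procs:
        p.join()
```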

The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.

The advent of artificial intelligence technology has paved the way for a great deal of research in the air combat sector. Academics and other researchers have pursued a prominent research direction: autonomous maneuver decision-making for UAVs. These efforts have produced a range of outcomes, but decision methods that include reinforcement learning (RL) have proven more efficient. Much research and experimentation has been done to make an agent reach its target in an optimal way; Genetic Algorithms (GA), A*, RRT, and various other optimization techniques have been used, but reinforcement learning is the best known for its success. In the DARPA AlphaDogfight Trials, a reinforcement learning model developed by Heron Systems prevailed against a veteran human F-16 pilot. After this accomplishment, reinforcement learning drew tremendous attention. In this research, we aim to move a UAV with Dubins-vehicle dynamics to its target in two-dimensional space along an optimal path, using Twin Delayed Deep Deterministic Policy Gradient (TD3) combined with Hindsight Experience Replay (HER). We ran simulated tests in two different environments.
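
Two of the building blocks named above are easy to sketch (this is not the authors' code): a Dubins-vehicle kinematic step, where the only control is a bounded turn rate at constant forward speed, and HER's "final" relabeling, which rewrites a stored episode as if the last state reached had been the goal. The speed, rate bound, and sparse reward rule are assumptions.

```python
import math

def dubins_step(x, y, theta, turn_rate, v=1.0, dt=0.1, max_rate=1.0):
    """Constant forward speed; the control input is a bounded turn rate."""
    turn_rate = max(-max_rate, min(max_rate, turn_rate))
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + turn_rate * dt)

def her_relabel(episode, goal_tol=0.5):
    """HER 'final' strategy: the last achieved position becomes the goal,
    turning a failed rollout into a successful one for replay."""
    new_goal = episode[-1]["achieved"]        # (x, y) actually reached
    relabeled = []
    for step in episode:
        dist = math.dist(step["achieved"], new_goal)
        relabeled.append({**step,
                          "goal": new_goal,
                          "reward": 0.0 if dist < goal_tol else -1.0})
    return relabeled
```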

A community reveals features and connections of its members that differ from those in other communities in a network. Detecting communities is of great significance in network analysis. Beyond the classical spectral clustering and statistical inference methods, deep learning techniques for community detection have developed significantly in recent years, with advantages in handling high-dimensional network data. Hence, a comprehensive overview of the latest progress in community detection through deep learning is timely for both academics and practitioners. This survey devises and proposes a new taxonomy covering different categories of state-of-the-art methods, including deep learning-based models built upon deep neural networks, deep nonnegative matrix factorization, and deep sparse filtering. The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks, and autoencoders. The survey also summarizes the popular benchmark data sets, model evaluation metrics, and open-source implementations to address experimentation settings. We then discuss the practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing deep learning field.

Properly handling missing data is a fundamental challenge in recommendation. Most present works perform negative sampling from unobserved data to supply the training of recommender models with negative signals. Nevertheless, existing negative sampling strategies, whether static or adaptive, are insufficient to yield high-quality negative samples that are both informative for model training and reflective of users' real needs. In this work, we hypothesize that an item knowledge graph (KG), which provides rich relations among items and KG entities, can be used to infer informative and factual negative samples. Towards this end, we develop a new negative sampling model, Knowledge Graph Policy Network (KGPolicy), which works as a reinforcement learning agent to explore high-quality negatives. Specifically, by conducting our designed exploration operations, it navigates from the target positive interaction, adaptively receives knowledge-aware negative signals, and ultimately yields a potential negative item to train the recommender. We tested KGPolicy on a matrix factorization (MF) model, where it achieves significant improvements over both state-of-the-art sampling methods, such as DNS and IRGAN, and KG-enhanced recommender models, such as KGAT. Further analyses from different angles provide insights into knowledge-aware sampling. We release the code and datasets at //github.com/xiangwang1223/kgpolicy.
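
The exploration loop can be caricatured in a few lines (a toy stand-in, not the released KGPolicy code): starting from a positive item, walk to KG neighbors for a few hops and keep the unobserved item that the current recommender scores highest, i.e. an informative "hard" negative. The toy graph, the item-naming convention, and the random scorer below are all assumptions.

```python
import random

kg = {  # item/entity -> neighboring items/entities (toy stand-in for a real KG)
    "i1": ["e1", "i2"], "e1": ["i3", "i4"], "i2": ["e2"],
    "e2": ["i5"], "i3": [], "i4": ["i1"], "i5": [],
}
user_history = {"u1": {"i1", "i3"}}
score = lambda user, item: random.random()   # stand-in for the MF model's score

def sample_negative(user, pos_item, hops=2, beam=4):
    frontier, candidates = [pos_item], []
    for _ in range(hops):
        nxt = [n for node in frontier for n in kg.get(node, [])]
        frontier = random.sample(nxt, min(beam, len(nxt))) if nxt else []
        candidates += [n for n in frontier
                       if n.startswith("i") and n not in user_history[user]]
    if not candidates:
        return None
    # Exploit the recommender: the highest-scored unobserved item is the
    # most informative negative for a BPR-style update.
    return max(candidates, key=lambda it: score(user, it))

print(sample_negative("u1", "i1"))
```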
