
Multi-agent formation control with obstacle avoidance is one of the most actively studied topics in the field of multi-agent systems. Although classic controllers such as model predictive control (MPC) and fuzzy control achieve a certain measure of success, most of them require precise global information, which is not accessible in harsh environments. On the other hand, some reinforcement learning (RL) based approaches adopt a leader-follower structure to organize the agents' behaviors, which sacrifices collaboration between agents and thus suffers from bottlenecks in maneuverability and robustness. In this paper, we propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL). Agents in our system use only local and relative information to make decisions and control themselves distributively, and they quickly reorganize into a new topology if any of them is disconnected. Our method achieves better performance in formation error and formation convergence rate, with an on-par obstacle-avoidance success rate, compared with baselines (both classic control methods and another RL-based method). The feasibility of our method is verified by both simulation and hardware implementation with Ackermann-steering vehicles.
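As a rough illustration of the local-information setting described above (not the paper's actual reward), a per-agent formation reward can be assembled entirely from relative quantities; all names here (`neighbor_offsets`, `desired_offsets`, `obstacle_dists`) are hypothetical:

```python
import numpy as np

# Hypothetical sketch: a per-agent reward built only from local, relative
# observations. `neighbor_offsets` are sensed neighbor positions relative to
# this agent; `desired_offsets` encode the target formation shape.
def formation_reward(neighbor_offsets, desired_offsets, obstacle_dists,
                     safe_dist=0.5):
    # Formation error: mismatch between actual and desired relative positions.
    err = np.linalg.norm(neighbor_offsets - desired_offsets, axis=1).sum()
    # Obstacle penalty: fires only when an obstacle enters the safety radius.
    penalty = sum(max(0.0, safe_dist - d) for d in obstacle_dists)
    return -err - 10.0 * penalty

r = formation_reward(
    neighbor_offsets=np.array([[1.1, 0.0], [0.0, 0.9]]),
    desired_offsets=np.array([[1.0, 0.0], [0.0, 1.0]]),
    obstacle_dists=[0.4, 2.0],
)
```

Because every input is relative to the agent itself, no global positioning is needed, which is the property the distributed setting relies on.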

Related content

The journal 《計算機信息》 publishes high-quality papers that broaden the scope of operations research and computing, seeking original research papers on theory, methodology, experiments, systems, and applications; novel surveys and tutorial papers; and papers describing new and useful software tools. Official website link: · Networking · Deep Reinforcement Learning · Neural Networks · Reinforcement Learning ·
January 18, 2022

Deep Reinforcement Learning (DRL) solutions are becoming pervasive at the edge of the network as they enable autonomous decision-making in a dynamic environment. However, to adapt to the ever-changing environment, a DRL solution implemented on an embedded device has to continue to occasionally take exploratory actions even after initial convergence. In other words, the device has to occasionally take random actions and update the value function, i.e., re-train the Artificial Neural Network (ANN), to ensure its performance remains optimal. Unfortunately, embedded devices often lack the processing power and energy required to train the ANN. The energy aspect is particularly challenging when the edge device is powered solely by Energy Harvesting (EH). To overcome this problem, we propose a two-part algorithm in which the DRL process is trained at the sink, and the weights of the fully trained underlying ANN are then periodically transferred to the EH-powered embedded device that takes the actions. Using an EH-powered sensor, a real-world measurement dataset, and optimizing for the Age of Information (AoI) metric, we demonstrate that such a DRL solution can operate without any degradation in performance, with only a few ANN updates per day.
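A minimal sketch of this sink/device split, assuming a simple linear value network and an illustrative update cadence (the class and function names are hypothetical, not from the paper):

```python
import numpy as np

# Hypothetical sketch: the sink re-trains the ANN with exploration, while
# the EH-powered device only runs greedy inference and occasionally
# receives fresh weights over the air.
class EdgeDevice:
    def __init__(self, weights):
        self.weights = weights

    def act(self, state):
        # Greedy inference only: no training, no exploration on-device.
        return int(np.argmax(self.weights @ state))

def sync(device, sink_weights):
    device.weights = sink_weights.copy()  # periodic over-the-air update

device = EdgeDevice(weights=np.zeros((4, 8)))
for _ in range(3):                       # e.g., a few updates per day
    new_weights = np.random.randn(4, 8)  # stand-in for sink-side training
    sync(device, new_weights)
```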

Traditionally, the performance of multi-agent deep reinforcement learning algorithms is demonstrated and validated in gaming environments where we often have a fixed number of agents. In many industrial applications, the number of available agents can change on any given day, and even when the number of agents is known ahead of time, it is common for an agent to break during operation and become unavailable for a period of time. In this paper, we propose a new deep reinforcement learning algorithm for multi-agent collaborative tasks with a variable number of agents. We demonstrate the application of our algorithm using a fleet management simulator developed by Hitachi to generate realistic scenarios in a production site.
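One common way to handle a variable number of agents (the abstract does not say whether this is the paper's mechanism) is to pad joint observations to a fixed maximum agent count and carry a mask so absent or broken agents are ignored:

```python
import numpy as np

# Illustrative padding-and-masking scheme, not necessarily the paper's
# approach: pad observations to MAX_AGENTS slots and mark which are live.
MAX_AGENTS, OBS_DIM = 8, 6

def pad_and_mask(agent_obs):
    obs = np.zeros((MAX_AGENTS, OBS_DIM), dtype=np.float32)
    mask = np.zeros(MAX_AGENTS, dtype=np.float32)
    for i, o in enumerate(agent_obs):  # agent_obs: list of per-agent vectors
        obs[i], mask[i] = o, 1.0
    return obs, mask

obs, mask = pad_and_mask([np.ones(OBS_DIM)] * 3)  # 3 of 8 agents present
```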

Robust model predictive control (MPC) is a well-known technique for model-based control under constraints and uncertainties. In classic robust tube-based MPC approaches, an open-loop control sequence is computed by periodically solving an online nominal MPC problem, which requires prior model information and frequent access to onboard computational resources. In this paper, we propose an efficient robust MPC solution based on receding-horizon reinforcement learning, called r-LPC, for unknown nonlinear systems with state constraints and disturbances. The proposed r-LPC utilizes a Koopman operator-based prediction model obtained offline from pre-collected input-output datasets. Unlike classic tube-based MPC, in each prediction time interval of r-LPC we use an actor-critic structure to learn a near-optimal feedback control policy rather than a control sequence. The resulting closed-loop control policy can be learned offline and deployed online, or learned online in an asynchronous way. In the latter case, online learning can be activated whenever necessary, for instance when the safety constraint is violated by the deployed policy. Closed-loop recursive feasibility, robustness, and asymptotic stability are proven under the function approximation errors of the actor-critic networks. Simulation and experimental results on two nonlinear systems with unknown dynamics and disturbances demonstrate that our approach performs better than or comparably to tube-based MPC and LQR, and outperforms a recently developed actor-critic learning approach.
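A Koopman prediction model of this kind is typically fit offline by extended dynamic mode decomposition (EDMD), i.e., least-squares regression in a lifted feature space. The following is a minimal sketch of that standard recipe, with an illustrative feature map and toy data (not the paper's specific choices):

```python
import numpy as np

# EDMD-style fit of a lifted linear predictor z' ≈ A z + B u from
# pre-collected input-output data. The lifting function is a placeholder.
def lift(x):
    return np.concatenate([x, x**2])  # simple polynomial lifting

def fit_koopman(X, U, Xn):
    # X, Xn: (N, n) states and successor states; U: (N, m) inputs.
    Z = np.array([lift(x) for x in X])
    Zn = np.array([lift(x) for x in Xn])
    ZU = np.hstack([Z, U])
    # Least squares: [A B] minimizes || Zn - ZU [A B]^T ||.
    AB, *_ = np.linalg.lstsq(ZU, Zn, rcond=None)
    A, B = AB[:Z.shape[1]].T, AB[Z.shape[1]:].T
    return A, B

X = np.random.randn(200, 2); U = np.random.randn(200, 1)
Xn = 0.9 * X + 0.1 * np.hstack([U, U])  # toy dynamics for demonstration
A, B = fit_koopman(X, U, Xn)
```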

Federated learning (FL) offers a decentralized learning environment in which a group of clients collaborate to train a global model at the server while keeping their training data confidential. This paper studies how to launch over-the-air jamming attacks to disrupt the FL process when it is executed over a wireless network. As a wireless example, FL is applied to learn how to classify wireless signals collected by clients (spectrum sensors) at different locations (such as in cooperative sensing). An adversary can jam the transmissions of local model updates from clients to the server (uplink attack), the transmissions of global model updates from the server to clients (downlink attack), or both. Given a budget on the number of clients that can be attacked per FL round, clients for the (uplink/downlink) attack are selected according to the local model accuracies they would be expected to achieve without an attack, or ranked via spectrum observations. This novel attack is extended to general settings by accounting for different processing speeds and attack success probabilities across clients. Compared to benchmark attack schemes, this attack approach degrades FL performance significantly, revealing new vulnerabilities of FL to jamming attacks in wireless networks.
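The budget-constrained selection step reduces to a ranking problem. A minimal sketch under that reading (function and variable names are hypothetical):

```python
# Illustrative sketch: given (possibly estimated) per-client local-model
# accuracies and a per-round budget, jam the k clients whose updates would
# contribute most to the global model.
def select_clients_to_jam(local_accuracies, budget):
    ranked = sorted(range(len(local_accuracies)),
                    key=lambda i: local_accuracies[i], reverse=True)
    return ranked[:budget]

targets = select_clients_to_jam([0.91, 0.62, 0.85, 0.70], budget=2)  # [0, 2]
```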

This paper surveys the field of transfer learning in the problem setting of Reinforcement Learning (RL). RL has been the key solution to sequential decision-making problems. Along with the fast advance of RL in various domains, including robotics and game-playing, transfer learning has arisen as an important technique to assist RL by leveraging and transferring external expertise to boost the learning process. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which these transfer learning techniques are applicable. We discuss the relationship between transfer learning and other relevant topics from an RL perspective and explore the potential challenges as well as future development directions for transfer learning in RL.

Deep reinforcement learning suggests the promise of fully automated learning of robotic control policies that directly map sensory inputs to low-level actions. However, applying deep reinforcement learning methods on real-world robots is exceptionally difficult, due both to the sample complexity and, just as importantly, the sensitivity of such methods to hyperparameters. While hyperparameter tuning can be performed in parallel in simulated domains, it is usually impractical to tune hyperparameters directly on real-world robotic platforms, especially legged platforms like quadrupedal robots that can be damaged through extensive trial-and-error learning. In this paper, we develop a stable variant of the soft actor-critic deep reinforcement learning algorithm that requires minimal hyperparameter tuning, while also requiring only a modest number of trials to learn multilayer neural network policies. This algorithm is based on the framework of maximum entropy reinforcement learning, and automatically trades off exploration against exploitation by dynamically and automatically tuning a temperature parameter that determines the stochasticity of the policy. We show that this method achieves state-of-the-art performance on four standard benchmark environments. We then demonstrate that it can be used to learn quadrupedal locomotion gaits on a real-world Minitaur robot, learning to walk from scratch directly in the real world in two hours of training.
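The automatic temperature adjustment this abstract refers to is usually implemented as a small gradient-based update on a learnable log-temperature. A minimal sketch, assuming `log_probs` holds log π(a|s) for freshly sampled actions; the −|A| target-entropy heuristic and learning rate are common defaults, not values taken from the paper:

```python
import torch

action_dim = 8
target_entropy = -float(action_dim)  # common heuristic: -|A|
log_alpha = torch.zeros(1, requires_grad=True)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_probs):
    # J(alpha) = E[ -alpha * (log pi(a|s) + target_entropy) ]
    loss = -(log_alpha.exp() * (log_probs + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()  # current policy stochasticity temperature

alpha = update_temperature(torch.randn(256))  # dummy batch of log-probs
```

When the policy's entropy drops below the target, the loss pushes α up, encouraging exploration; when entropy is above target, α decays, which is the exploration-exploitation trade-off the abstract describes.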

Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches.
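At its core, IDS picks the action minimizing an information ratio: squared estimated regret over estimated information gain. The sketch below is a strong simplification of the paper's distributional construction, using per-action mean and variance estimates as stand-ins:

```python
import numpy as np

# Simplified IDS-style action selection (illustrative only): trade off
# estimated regret against a crude information-gain proxy per action.
def ids_action(q_mean, q_var, eps=1e-8):
    regret = q_mean.max() - q_mean    # estimated per-action regret
    info_gain = np.log(1.0 + q_var)   # proxy for information gained
    ratio = regret**2 / (info_gain + eps)
    return int(np.argmin(ratio))

a = ids_action(q_mean=np.array([1.0, 0.8, 0.5]),
               q_var=np.array([0.01, 0.5, 1.0]))
```

Note how a slightly suboptimal but highly uncertain action can win: its regret is small while its information gain is large, which is exactly the heteroscedasticity-aware behavior UCB and Thompson sampling lack.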

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
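Reward models in this line of work are typically fit to trajectory preferences with a Bradley-Terry loss: the probability that one segment is preferred is a sigmoid of the difference of summed predicted rewards. A minimal sketch under that standard recipe (shapes and names are illustrative):

```python
import torch

# Bradley-Terry preference loss for a reward model: r_seg1, r_seg2 are
# predicted per-step rewards for two trajectory segments, shape (batch, T);
# pref = 1.0 where segment 1 was preferred by the human labeler.
def preference_loss(r_seg1, r_seg2, pref):
    logits = r_seg1.sum(dim=1) - r_seg2.sum(dim=1)
    # P(seg1 > seg2) = sigmoid(sum r1 - sum r2); fit with cross-entropy.
    return torch.nn.functional.binary_cross_entropy_with_logits(logits, pref)

loss = preference_loss(torch.randn(4, 25), torch.randn(4, 25),
                       torch.tensor([1.0, 0.0, 1.0, 1.0]))
```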

Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups when learning various continuous-control locomotion tasks. With one GPU and CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10-1000x fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks.

Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations can cause proficient but narrowly-learned policies to fail at test time. In this work, we propose to learn how to quickly and effectively adapt online to new situations as well as to perturbations. To enable sample-efficient meta-learning, we consider learning online adaptation in the context of model-based reinforcement learning. Our approach trains a global model such that, when combined with recent data, the model can be rapidly adapted to the local context. Our experiments demonstrate that our approach can enable simulated agents to adapt their behavior online to novel terrains, to a crippled leg, and in highly-dynamic environments.
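The online-adaptation step can be pictured as a few gradient steps on the most recent transitions, starting from the meta-trained global dynamics model. A minimal sketch under that reading; the toy model, step count, and learning rate are placeholder choices, not the paper's:

```python
import torch

# Illustrative online adaptation: specialize a global dynamics model to the
# local context (new terrain, damaged leg, ...) using only recent data.
def adapt(model, recent_s, recent_a, recent_sn, steps=5, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        pred = model(torch.cat([recent_s, recent_a], dim=1))
        loss = ((pred - recent_sn) ** 2).mean()  # one-step prediction error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

model = torch.nn.Linear(6, 4)  # toy model: (state, action) -> next state
adapt(model, torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 4))
```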
