Providing reliable connectivity to cellular-connected UAVs can be very challenging; their performance depends heavily on the nature of the surrounding environment, such as the density and heights of the ground BSs. On the other hand, tall buildings might block undesired interference signals from ground BSs, thereby improving the connectivity between the UAVs and their serving BSs. To address the connectivity of UAVs in such environments, this paper proposes an RL algorithm to dynamically optimise the height of a UAV as it moves through the environment, with the goal of increasing the throughput that it experiences. The proposed solution is evaluated using experimentally obtained measurements from two different locations in Dublin city centre, Ireland. In the first scenario, the UAV is connected to a macro-cell, while in the second scenario, the UAV associates with different small cells in a two-tier mobile network. Results show that the proposed solution increases throughput by 6 to 41% compared to baseline approaches.
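A minimal sketch of the kind of height controller described above, written as tabular Q-learning: the discrete height adjustments, the hyperparameters, and the use of measured throughput as the reward are assumptions for illustration, not the authors' implementation.

```python
# Tabular Q-learning over discrete UAV height adjustments (illustrative sketch).
import random
from collections import defaultdict

ACTIONS = [-10.0, 0.0, +10.0]          # change in UAV height in metres (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters

Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def choose_action(state):
    """Epsilon-greedy selection over the height adjustments."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    """Standard Q-learning update; the reward is the throughput measured after the move."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```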
Physical sketches are created by learned programs that control a drawing robot. A differentiable rasteriser is used to optimise sets of drawing strokes to match an input image, using deep networks to provide an encoding over which a loss can be computed. The optimised drawing primitives can then be translated into G-code commands that instruct the robot to draw the image using drawing instruments such as pens and pencils on a physical support medium.
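The stroke-to-G-code translation step could look roughly as follows; the stroke representation (polylines in millimetres) and the pen-up/pen-down Z heights are assumptions, not the system's actual conventions.

```python
# Translate optimised stroke polylines into G-code for a pen plotter (illustrative sketch).
def strokes_to_gcode(strokes, z_up=5.0, z_down=0.0, feed=1500):
    lines = ["G21 ; units in millimetres", "G90 ; absolute positioning"]
    for stroke in strokes:                          # each stroke: list of (x, y) points
        x0, y0 = stroke[0]
        lines.append(f"G0 Z{z_up:.2f}")             # lift the pen
        lines.append(f"G0 X{x0:.2f} Y{y0:.2f}")     # rapid move to the stroke start
        lines.append(f"G1 Z{z_down:.2f} F{feed}")   # lower the pen
        for x, y in stroke[1:]:
            lines.append(f"G1 X{x:.2f} Y{y:.2f} F{feed}")  # draw each segment
    lines.append(f"G0 Z{z_up:.2f}")                 # lift the pen when finished
    return "\n".join(lines)
```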
A growing number of service providers are exploring methods to improve server utilization, reduce power consumption, and reduce total cost of ownership by co-scheduling high-priority latency-critical workloads with best-effort workloads. This practice requires strict resource allocation between workloads to reduce resource contention and maintain Quality of Service (QoS) guarantees. Prior resource allocation works have been shown to improve server utilization under ideal circumstances, yet they often compromise QoS guarantees or fail to find valid resource allocations in more dynamic operating environments. Further, prior works are fundamentally reliant upon QoS measurements that can, in practice, exhibit significant transient fluctuations, so stable control behavior cannot be reliably achieved. In this paper, we propose a novel framework for dynamic resource allocation based on proactive QoS prediction. These predictions help guide a reinforcement-learning-based resource controller towards optimal resource allocations while avoiding transient QoS violations due to fluctuating workload demands. Evaluation shows that the proposed method incurs 4.3x fewer QoS violations, reduces the severity of QoS violations by 3.7x, improves best-effort workload performance, and improves overall power efficiency compared with prior work.
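A minimal sketch of the proactive idea: predict near-future QoS from recent samples and shape the controller's reward with the predicted, rather than instantaneous, violation. The moving-average predictor and the specific reward terms are assumptions for illustration only.

```python
# Proactive QoS-aware reward shaping (illustrative sketch).
from collections import deque

class QoSPredictor:
    """Predicts near-future tail latency as a smoothed trend of recent samples (assumed model)."""
    def __init__(self, window=10):
        self.history = deque(maxlen=window)

    def predict(self, latency_sample):
        self.history.append(latency_sample)
        return sum(self.history) / len(self.history)

def shaped_reward(predicted_latency, qos_target, best_effort_progress, penalty=10.0):
    # Reward best-effort progress, penalize *predicted* QoS violations so the
    # controller can act before a transient spike becomes an actual violation.
    violation = max(0.0, predicted_latency - qos_target)
    return best_effort_progress - penalty * violation
```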
This paper proposes a novel mission planning platform capable of efficiently deploying a team of UAVs to cover complex-shaped areas in various remote sensing applications. Under the hood lies a novel optimization scheme for grid-based methods, utilizing the Simulated Annealing algorithm, that significantly increases the achieved percentage of coverage and improves the qualitative features of the generated paths. Extensive simulated evaluation against a state-of-the-art alternative methodology for coverage path planning (CPP) operations establishes the performance gains in terms of achieved coverage and overall duration of the generated missions. On top of that, the DARP algorithm is employed to allocate sub-tasks to each member of the swarm, taking into account each UAV's sensing and operational capabilities, their initial positions, and any no-fly zones defined inside the operational area. This feature is of paramount importance in real-life applications, as it can dramatically reduce the time required to complete a mission, while at the same time unlocking a wide range of new applications that were previously not feasible due to the limited battery life of UAVs. A simulated study is also performed to investigate the efficiency gains introduced by multi-UAV utilization. All of these capabilities are packed into an end-to-end platform that eases the use of UAV swarms in remote sensing applications. Its versatility is demonstrated via two different real-life applications: (i) a photogrammetry mission for precision agriculture and (ii) an indicative search-and-rescue mission for first responders, both performed using a swarm of commercial UAVs.
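A generic simulated-annealing skeleton of the kind such a grid-based optimization scheme builds on is sketched below; the cost function and the neighbor move are placeholders, not the platform's grid-specific operators.

```python
# Generic simulated annealing with geometric cooling (illustrative sketch).
import math
import random

def simulated_annealing(initial, cost, neighbor, t0=1.0, t_min=1e-3, alpha=0.95, iters=100):
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(iters):
            candidate = neighbor(current)
            delta = cost(candidate) - current_cost
            # Always accept improving moves; accept worsening moves with Boltzmann probability.
            if delta < 0 or random.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost
```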
Nowadays, cooperative multi-agent systems are used to learn how to achieve goals in large-scale dynamic environments. However, learning in these environments is challenging: issues range from the effect of search space size on learning time to inefficient cooperation among agents. Moreover, reinforcement learning algorithms may suffer from long convergence times in such environments. In this paper, a communication framework is introduced. In the proposed framework, agents learn to cooperate effectively, and a new state-calculation method considerably reduces the size of the state space. Furthermore, a knowledge-transfer algorithm is presented to share the gained experiences among the different agents, together with a knowledge-fusion mechanism that combines the knowledge each agent learns from its own experience with the knowledge received from other team members. Finally, simulation results are provided to demonstrate the efficacy of the proposed method on a complex learning task. We evaluated our approach on the shepherding problem, and the results show that the knowledge-transfer mechanism accelerates the learning process and that state abstraction, by mapping similar states together, considerably reduces the size of the state space.
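One plausible form of the knowledge-fusion step is a confidence-weighted average of Q-values received from teammates; the visit-count weighting below is an assumption for illustration, not necessarily the paper's exact mechanism.

```python
# Fuse an agent's own Q-table with a teammate's, weighting by visit counts (illustrative sketch).
from collections import defaultdict

def fuse_q_tables(own_q, own_visits, peer_q, peer_visits):
    fused = defaultdict(float)
    for key in set(own_q) | set(peer_q):          # key: (state, action)
        n_own = own_visits.get(key, 0)
        n_peer = peer_visits.get(key, 0)
        total = n_own + n_peer
        if total == 0:
            continue
        # Weight each source by how often it has actually visited this (state, action) pair.
        fused[key] = (n_own * own_q.get(key, 0.0) + n_peer * peer_q.get(key, 0.0)) / total
    return fused
```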
We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the problem of intrusion prevention as an (optimal) multiple stopping problem. This formulation gives us insight into the structure of optimal policies, which we show to have threshold properties. For most practical cases, it is not feasible to obtain an optimal defender policy using dynamic programming. We therefore develop a reinforcement learning approach to approximate an optimal policy. Our method for learning and validating policies includes two systems: a simulation system where defender policies are incrementally learned and an emulation system where statistics are produced that drive simulation runs and where learned policies are evaluated. We show that our approach can produce effective defender policies for a practical IT infrastructure of limited size. Inspection of the learned policies confirms that they exhibit threshold properties.
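The threshold structure of the learned policies can be sketched as follows: with l stop actions remaining, the defender stops once the intrusion belief exceeds a per-level threshold. The threshold values and their ordering below are illustrative placeholders, not learned values.

```python
# Threshold policy for the multiple stopping formulation (illustrative sketch).
def threshold_policy(belief, stops_remaining, thresholds):
    """Return 'stop' (take the next defensive action) or 'continue'.

    thresholds[l] is the belief threshold used when l stop actions remain.
    """
    return "stop" if belief >= thresholds[stops_remaining] else "continue"

# Example usage with assumed thresholds for L = 3 stop actions.
thresholds = {3: 0.4, 2: 0.6, 1: 0.8}
action = threshold_policy(belief=0.65, stops_remaining=2, thresholds=thresholds)
```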
The number of railway service disruptions has been increasing owing to the intensification of natural disasters. In addition, abrupt changes in social situations, such as the COVID-19 pandemic, require railway companies to modify their traffic schedules frequently. Therefore, automatic support for optimal scheduling is needed. In this study, an automatic railway scheduling system is presented. The system leverages reinforcement learning and a dynamic simulator that can simulate the railway traffic and passenger flow of a whole line. The proposed system enables rapid generation of the traffic schedule of a whole line because the optimization process is conducted in advance as training. The system is evaluated using an interruption scenario, and the results demonstrate that it can generate optimized schedules for the whole line within a few minutes.
Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, in which the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or neighboring agents. The interplay between the two entities is mutually reinforcing: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to a Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games demonstrate the learning effectiveness of our mean field approaches. In addition, we report the first result of solving the Ising model via model-free reinforcement learning methods.
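The core update can be sketched as follows: each agent's Q-function takes the mean action of its neighbors as an extra input, and the bootstrap value uses a Boltzmann policy over that mean-field Q-function. The discretization of the mean action and the temperature are illustrative choices.

```python
# Mean field Q-learning update (illustrative sketch).
import numpy as np

def boltzmann_policy(q_values, beta=1.0):
    """Softmax over actions; beta is the inverse temperature."""
    z = np.exp(beta * (q_values - q_values.max()))
    return z / z.sum()

def mf_q_update(Q, s, a, mean_a, r, s_next, mean_a_next, alpha=0.1, gamma=0.95):
    """Q is a 3-D array indexed by (state, action, discretized mean action of neighbors)."""
    pi_next = boltzmann_policy(Q[s_next, :, mean_a_next])
    v_next = float(pi_next @ Q[s_next, :, mean_a_next])   # mean-field value of the next state
    Q[s, a, mean_a] += alpha * (r + gamma * v_next - Q[s, a, mean_a])
```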
Policy gradient methods are widely used in reinforcement learning to search for better policies in a parameterized policy space. They perform gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems, which has recently been extended to non-convex and stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show its convergence using the ODE method. We tested this algorithm on a scheduling problem in which an incoming job is scheduled into one of four queues based on the queue lengths. Experimental results show that the algorithm using Nesterov's acceleration performs significantly better than the algorithm without acceleration. To the best of our knowledge, this is the first time Nesterov's acceleration has been used with an actor-critic algorithm.
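An accelerated actor update along these lines is sketched below; the gradient estimator and the step sizes are assumptions for illustration and do not reproduce the paper's exact two-timescale actor-critic scheme.

```python
# Nesterov-accelerated ascent step on the policy parameters (illustrative sketch).
def nesterov_actor_step(theta, theta_prev, policy_gradient, lr=1e-2, momentum=0.9):
    """One accelerated update of the policy parameter vector (e.g. a NumPy array).

    policy_gradient: callable returning an estimate of the gradient of the
    expected return at a given parameter vector, e.g. built from the critic's TD error.
    """
    lookahead = theta + momentum * (theta - theta_prev)   # Nesterov look-ahead point
    grad = policy_gradient(lookahead)                     # gradient evaluated at the look-ahead
    theta_next = lookahead + lr * grad                    # gradient ascent step
    return theta_next, theta                              # new parameters and previous iterate
```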
In this paper, an interference-aware path planning scheme for a network of cellular-connected unmanned aerial vehicles (UAVs) is proposed. In particular, each UAV aims at achieving a tradeoff between maximizing energy efficiency and minimizing both wireless latency and the interference level caused on the ground network along its path. The problem is cast as a dynamic game among UAVs. To solve this game, a deep reinforcement learning algorithm based on echo state network (ESN) cells is proposed. The introduced deep ESN architecture is trained to allow each UAV to map each observation of the network state to an action, with the goal of minimizing a sequence of time-dependent utility functions. Each UAV uses the ESN to learn its optimal path, transmission power level, and cell association vector at different locations along its path. The proposed algorithm is shown to reach a subgame perfect Nash equilibrium (SPNE) upon convergence. Moreover, upper and lower bounds on the altitude of the UAVs are derived, thus reducing the computational complexity of the proposed algorithm. Simulation results show that the proposed scheme achieves better wireless latency per UAV and rate per ground user (UE) while requiring a number of steps comparable to a heuristic baseline that moves via the shortest distance towards the corresponding destinations. The results also show that the optimal altitude of the UAVs varies based on the ground network density and the UE data rate requirements and plays a vital role in minimizing the interference level on the ground UEs as well as the wireless transmission delay of the UAV.
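A minimal echo state network cell of the kind used to map network-state observations to actions is sketched below; the reservoir size, spectral radius, and the (omitted) readout training are assumptions for illustration.

```python
# Echo state network reservoir update (illustrative sketch).
import numpy as np

class ESNCell:
    def __init__(self, n_in, n_res=200, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale the fixed recurrent weights so the reservoir has the echo state property.
        self.W = W * (spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))
        self.state = np.zeros(n_res)

    def step(self, u):
        """Update the reservoir with a new observation u; only a readout on top of this state is trained."""
        self.state = np.tanh(self.W_in @ u + self.W @ self.state)
        return self.state
```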
Multiagent systems appear in most social, economic, and political situations. In the present work we extend the Deep Q-Learning Network architecture proposed by Google DeepMind to multiagent environments and investigate how two agents controlled by independent Deep Q-Networks interact in the classic video game Pong. By manipulating the classical reward scheme of Pong, we demonstrate how competitive and collaborative behaviors emerge. Competitive agents learn to play and score efficiently. Agents trained under collaborative reward schemes find an optimal strategy to keep the ball in the game as long as possible. We also describe the progression from competitive to collaborative behavior. The present work demonstrates that Deep Q-Networks can become a practical tool for studying the decentralized learning of multiagent systems operating in highly complex environments.
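The reward-scheme manipulation can be illustrated as follows: the agent that loses the ball is penalized, while the scoring agent's reward is swept from +1 (competitive, zero-sum) to -1 (collaborative, so both agents suffer when the ball leaves play). This parameterization is an assumption made for clarity; the abstract itself does not fix the exact scheme.

```python
# Reward assignment for two-player Pong under a competitive-to-collaborative sweep (illustrative sketch).
def pong_rewards(scorer, rho):
    """Return (reward_left, reward_right) when one side scores a point.

    rho = +1.0 -> zero-sum, fully competitive play.
    rho = -1.0 -> fully collaborative: both agents are penalized when the ball
    leaves play, which encourages keeping the ball in the game as long as possible.
    """
    if scorer == "left":
        return rho, -1.0   # the right agent conceded the point
    return -1.0, rho       # the left agent conceded the point
```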