
Recently, network operators and equipment vendors have made tremendous efforts to adopt intelligence and openness in the next-generation radio access network (RAN). The goal is a RAN that can self-optimize in a highly complex setting with multiple platforms, technologies, and vendors in a converged compute-and-connect architecture. In this paper, we propose two nested actor-critic learning-based techniques to optimize both the placement of the resource allocation function and the resource allocation decisions themselves. With these techniques, we investigate the impact of observability on the performance of reinforcement learning-based resource allocation. We show that dynamically relocating a network function (NF) based on service requirements, using reinforcement learning techniques, yields latency and throughput gains.

Related content


Optimizer · Performer · Microsoft Surface · Next · Vectorization

November 23, 2021

Resource Allocation for Active IRS-Assisted Multiuser Communication Systems

Dongfang Xu, Xianghao Yu, Derrick Wing Kwan Ng, Robert Schober

from arxiv, 3 figures, submitted to Asilomar 2021

Intelligent reflecting surfaces (IRSs) are emerging as promising enablers for the next generation of wireless communication systems because of their ability to customize favorable radio propagation environments. However, with the conventional passive architecture, IRSs can only adjust the phase of the incident signals, which limits the achievable beamforming gain. To fully unleash the potential of IRSs, in this paper, we consider a more general IRS architecture, i.e., active IRSs, which can adapt the phase and amplify the magnitude of the reflected incident signal simultaneously with the support of an additional power source. To realize green communication in active IRS-assisted multiuser systems, we jointly optimize the reflection matrix at the IRS and the beamforming vector at the base station (BS) to minimize the BS transmit power. The resource allocation algorithm design is formulated as an optimization problem that takes into account the maximum power budget of the active IRS and the quality-of-service (QoS) requirements of the users. To handle the non-convex design problem, we develop a novel and computationally efficient algorithm based on the bilinear transformation and inner approximation methods. The proposed algorithm is guaranteed to converge to a locally optimal solution of the considered problem. Simulation results illustrate the effectiveness of the proposed scheme compared to the two baseline schemes. Moreover, the results unveil that deploying active IRSs is a promising approach to enhance the system performance compared to conventional passive IRSs, especially when strong direct links exist.

Storage · Optimizer · Machine Learning · Learning · ML

November 22, 2021

KML: Using Machine Learning to Improve Storage Systems

Ibrahim Umit Akgun, Ali Selman Aydin, Aadil Shaikh, Lukas Velikov, Andrew Burford, Michael McNeill, Michael Arkhangelskiy, Erez Zadok

from arxiv, 16 pages, 11 figures

Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers have resorted to exposing numerous tunable parameters to users -- essentially burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most of the latency in I/O-heavy applications, so even a small overall latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this paper, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two problems: optimal readahead and NFS read-size values. Our experiments show that KML consumes few OS resources, adds negligible latency, and yet can learn patterns that improve I/O throughput by as much as 2.3x and 15x for the two use cases, respectively -- even for complex, never-before-seen, concurrently running mixed workloads on different storage devices.

Functional · Constraint · Reinforcement Learning · Q-function · Learning

June 24, 2021

Density Constrained Reinforcement Learning

Zengyi Qin, Yuxiao Chen, Chuchu Fan

from arxiv, Accepted by ICML, 2021

We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than on the value functions considered by previous works. State density has a clear physical and mathematical interpretation and can express a wide variety of constraints, such as resource limits and safety requirements. Density constraints also avoid the time-consuming process of designing and tuning cost functions, which value function-based constraints require in order to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm that solves the density constrained RL problem optimally, with the constraints guaranteed to be satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a set of comprehensive experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, on a wide range of density constrained tasks as well as standard CRL benchmarks such as Safety-Gym.
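The duality mentioned in this abstract can be illustrated on a tiny example. The sketch below is not the paper's algorithm: it assumes a hypothetical 3-state Markov chain under a fixed policy, and it uses state values rather than Q functions for brevity. It computes the discounted state density and shows that the expected return obtained from the density matches the one obtained from the value function. The names `P_pi`, `mu0`, `r`, and `rho_max` are illustrative assumptions.

```python
import numpy as np

def discounted_state_density(P_pi, mu0, gamma):
    """Solve rho = mu0 + gamma * P_pi^T rho (unnormalized discounted density)."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi.T, mu0)

def state_values(P_pi, r, gamma):
    """Solve the Bellman equation V = r + gamma * P_pi V for a fixed policy."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r)

# Hypothetical 3-state chain induced by some fixed policy (rows sum to 1).
P_pi = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.8, 0.2],
                 [0.1, 0.0, 0.9]])
mu0 = np.array([1.0, 0.0, 0.0])   # initial state distribution (assumed)
r = np.array([0.0, 1.0, 0.5])     # per-state reward (assumed)
gamma = 0.95

rho = discounted_state_density(P_pi, mu0, gamma)
V = state_values(P_pi, r, gamma)

# Duality: the same expected return, computed two ways.
return_from_density = float(rho @ r)
return_from_values = float(mu0 @ V)

# A density constraint is a direct bound on rho, e.g. limiting visits to state 2.
rho_max = np.array([np.inf, np.inf, 8.0])  # hypothetical bound
constraint_satisfied = bool(np.all(rho <= rho_max))
```

Here the constraint is stated directly on how often a state is visited, which is the kind of specification the abstract contrasts with hand-tuned cost functions.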

Reward Function · Hyperparameter · Functional · Learning · Extensibility

May 25, 2021

Hyperparameter Selection for Imitation Learning

Leonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Lukasz Stafiniak, Sertan Girgin, Raphael Marinier, Nikola Momchev, Sabela Ramos, Manu Orsini, Olivier Bachem, Matthieu Geist, Olivier Pietquin

from arxiv, ICML 2021

We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms in the context of continuous control, when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature in imitation learning mostly considers this reward function to be available for HP selection, but this is not a realistic setting. Indeed, if this reward function were available, it could be used directly for policy training, and imitation would not be necessary. To tackle this mostly ignored problem, we propose a number of possible proxies to the external reward. We evaluate them in an extensive empirical study (more than 10,000 agents across 9 environments) and make practical recommendations for selecting HPs. Our results show that while imitation learning algorithms are sensitive to HP choices, it is often possible to select good enough HPs through a proxy to the reward function.

Transfer Learning · Learning · state-of-the-art · Boosting (a technique for accelerating model training) · FAST

September 16, 2020

Transfer Learning in Deep Reinforcement Learning: A Survey

Zhuangdi Zhu, Kaixiang Lin, Jiayu Zhou

This paper surveys the field of transfer learning in the problem setting of Reinforcement Learning (RL). RL has been the key solution to sequential decision-making problems. With the rapid advance of RL in various domains, including robotics and game playing, transfer learning has arisen as an important technique to assist RL by leveraging and transferring external expertise to boost the learning process. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which these transfer learning techniques would be approachable. We discuss the relationship between transfer learning and other relevant topics from an RL perspective and also explore the potential challenges as well as future development directions for transfer learning in RL.

Robot Operating Platform · Reinforcement Learning · Learning · CASES · Robot

March 14, 2019

gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo

Nestor Gonzalez Lopez, Yue Leire Erro Nuin, Elias Barba Moral, Lander Usategui San Juan, Alejandro Solano Rueda, Víctor Mayoral Vilches, Risto Kojcev

This paper presents an upgraded, real-world, application-oriented version of gym-gazebo, the Robot Operating System (ROS) and Gazebo-based Reinforcement Learning (RL) toolkit, which complies with OpenAI Gym. The content discusses the new ROS 2-based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, the output of this work is a benchmarking system for robotics that allows different techniques and algorithms to be compared under the same virtual conditions. We have evaluated environments with different levels of complexity of the Modular Articulated Robotic Arm (MARA), reaching accuracies in the millimeter scale. The converged results show the feasibility and usefulness of the gym-gazebo2 toolkit, as well as its potential and applicability in industrial use cases using modular robots.

tuning · Learning · Deep Reinforcement Learning · Hyperparameter · Performer

December 26, 2018

Learning to Walk via Deep Reinforcement Learning

Tuomas Haarnoja, Aurick Zhou, Sehoon Ha, Jie Tan, George Tucker, Sergey Levine

from arxiv, Videos: //sites.google.com/view/minitaur-locomotion/ . arXiv admin note: substantial text overlap with arXiv:1812.05905

Deep reinforcement learning suggests the promise of fully automated learning of robotic control policies that directly map sensory inputs to low-level actions. However, applying deep reinforcement learning methods on real-world robots is exceptionally difficult, due both to the sample complexity and, just as importantly, the sensitivity of such methods to hyperparameters. While hyperparameter tuning can be performed in parallel in simulated domains, it is usually impractical to tune hyperparameters directly on real-world robotic platforms, especially legged platforms like quadrupedal robots that can be damaged through extensive trial-and-error learning. In this paper, we develop a stable variant of the soft actor-critic deep reinforcement learning algorithm that requires minimal hyperparameter tuning, while also requiring only a modest number of trials to learn multilayer neural network policies. This algorithm is based on the framework of maximum entropy reinforcement learning, and automatically trades off exploration against exploitation by dynamically and automatically tuning a temperature parameter that determines the stochasticity of the policy. We show that this method achieves state-of-the-art performance on four standard benchmark environments. We then demonstrate that it can be used to learn quadrupedal locomotion gaits on a real-world Minitaur robot, learning to walk from scratch directly in the real world in two hours of training.
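The automatic temperature tuning described here can be sketched as a single gradient step. This is a minimal illustration, not the authors' implementation: it assumes the commonly used temperature objective J(alpha) = E[-alpha * (log pi(a|s) + H_target)], and it optimizes log(alpha) to keep alpha positive, which is an implementation choice assumed here. The function name `update_log_alpha` and the learning rate are hypothetical.

```python
import numpy as np

def update_log_alpha(log_alpha, log_probs, target_entropy, lr=1e-2):
    """One gradient-descent step on the loss -log_alpha * mean(log_probs + target_entropy).

    log_probs: log pi(a|s) for actions sampled from the current policy.
    The temperature grows when the policy's entropy estimate (-mean(log_probs))
    is below target_entropy, and shrinks when it is above.
    """
    grad = -(np.mean(log_probs) + target_entropy)  # d(loss) / d(log_alpha)
    return log_alpha - lr * grad

# Policy too deterministic: entropy estimate is -2.0, target is -1.0.
low_entropy_step = update_log_alpha(0.0, np.array([2.0, 2.0]), target_entropy=-1.0)

# Policy too random: entropy estimate is 3.0, target is -1.0.
high_entropy_step = update_log_alpha(0.0, np.array([-3.0, -3.0]), target_entropy=-1.0)

alpha_after_low = float(np.exp(low_entropy_step))
alpha_after_high = float(np.exp(high_entropy_step))
```

A larger temperature weights the entropy bonus more heavily and pushes the policy toward exploration, while a smaller one favors exploitation.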

Reinforcement Learning · Learning · Wireless Networks · Performer · MoDELS

March 30, 2018

Cache-Enabled Dynamic Rate Allocation via Deep Self-Transfer Reinforcement Learning

Zhengming Zhang, Yaru Zheng, Meng Hua, Yongming Huang, Luxi Yang

Caching and rate allocation are two promising approaches to support video streaming over wireless networks. However, existing rate allocation designs do not fully exploit the advantages of the two approaches. This paper investigates the problem of cache-enabled, QoE-driven video rate allocation. We establish a mathematical model for this problem and point out that it is difficult to solve with traditional dynamic programming. We then propose a deep reinforcement learning approach to solve it. First, we model the problem as a Markov decision problem. Then we present a deep Q-learning algorithm with a special knowledge transfer process to find an effective allocation policy. Finally, numerical results demonstrate that the proposed solution can effectively maintain a high-quality user experience for a mobile user moving among small cells. We also investigate the impact of the configuration of critical parameters on the performance of our algorithm.
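The temporal-difference update at the core of Q-learning can be sketched in tabular form. This is a simplification under stated assumptions: the paper uses a deep Q-network with a knowledge transfer process, neither of which is shown here. The state and action encodings (for example, states as cache and channel conditions, actions as video rate levels) are hypothetical, as are the function name and step sizes.

```python
import numpy as np

def q_learning_step(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return Q.

    Q[s, a] moves toward the TD target: reward + gamma * max_a' Q[s_next, a'].
    """
    td_target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Hypothetical problem with 2 states and 3 rate levels, all values starting at zero.
Q = np.zeros((2, 3))
Q = q_learning_step(Q, s=0, a=1, reward=1.0, s_next=1)

greedy_action = int(np.argmax(Q[0]))  # rate level currently preferred in state 0
```

In a deep Q-learning setting, the table `Q` is replaced by a neural network and the same TD target serves as the regression label.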

Hidden State · Learning · Reinforcement Learning · INFORMS · Imperfect Information

March 22, 2018

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

from arxiv, 10 pages, 16 figures, submitted to ICML 2018

We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and update its belief of their hidden state in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimate of the other players' hidden states, in both cooperative and adversarial settings.
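The idea of using one's own policy to infer another agent's hidden goal can be sketched with a discrete Bayesian update. This is a deliberately simplified stand-in, not the SOM procedure from the paper: it assumes a small set of candidate goals and a hand-written policy on a 1-D line, whereas the abstract only states that the belief is updated online from observed behavior. All names (`own_policy`, `update_belief`, the goal positions) are hypothetical.

```python
import numpy as np

def own_policy(state, goal):
    """Hypothetical policy on a 1-D line: action 0 = step left, 1 = step right.

    Moves toward `goal` with probability 0.9; uniform when already at the goal.
    """
    if goal > state:
        return np.array([0.1, 0.9])
    if goal < state:
        return np.array([0.9, 0.1])
    return np.array([0.5, 0.5])

def update_belief(belief, goals, other_state, other_action):
    """Bayes update over candidate goals, scoring the other agent's action
    with the observer's own policy as the likelihood model."""
    likelihood = np.array([own_policy(other_state, g)[other_action] for g in goals])
    posterior = belief * likelihood
    return posterior / posterior.sum()

goals = [0, 10]                  # candidate hidden goals of the other agent
belief = np.array([0.5, 0.5])    # uniform prior

# The other agent is observed stepping right from positions 5, 6, and 7.
for other_state in (5, 6, 7):
    belief = update_belief(belief, goals, other_state, other_action=1)

most_likely_goal = goals[int(np.argmax(belief))]
```

After three consistent observations the belief concentrates on the goal to the right, which the observer can then condition its own actions on.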

Deep Reinforcement Learning · Learning · Reinforcement Learning · tuning · CASE

January 17, 2018

The Case for Automatic Database Administration using Deep Reinforcement Learning

Ankur Sharma, Felix Martin Schuhknecht, Jens Dittrich

Like any large software system, a full-fledged DBMS offers an overwhelming number of configuration knobs. These range from static initialisation parameters like buffer sizes, degree of concurrency, or level of replication to complex runtime decisions like creating a secondary index on a particular column or reorganising the physical layout of the store. To simplify the configuration, industry-grade DBMSs are usually shipped with various advisory tools that provide recommendations for given workloads and machines. However, reality shows that the actual configuration, tuning, and maintenance is usually still done by a human administrator, relying on intuition and experience. Recent work on deep reinforcement learning has shown very promising results in solving problems that require such a sense of intuition. For instance, it has been applied very successfully in learning how to play complicated games with enormous search spaces. Motivated by these achievements, in this work we explore how deep reinforcement learning can be used to administer a DBMS. First, we describe how deep reinforcement learning can be used to automatically tune an arbitrary software system like a DBMS by defining a problem environment. Second, we showcase our concept of NoDBA using the concrete example of index selection and evaluate how well it recommends indexes for given workloads.