The capacity sharing problem in Radio Access Network (RAN) slicing deals with the distribution of the capacity available in each RAN node among various RAN slices to satisfy their traffic demands and efficiently use the radio resources. While several capacity sharing algorithmic solutions have been proposed in the literature, their practical implementation still remains as a gap. In this paper, the implementation of a Reinforcement Learning-based capacity sharing algorithm over the O-RAN architecture is discussed, providing insights into the operation of the involved interfaces and the containerization of the solution. Moreover, the description of the testbed implemented to validate the solution is included and some performance and validation results are presented.
Many challenging real-world problems require the deployment of ensembles multiple complementary learning models to reach acceptable performance levels. While effective, applying the entire ensemble to every sample is costly and often unnecessary. Deep Reinforcement Learning (DRL) offers a cost-effective alternative, where detectors are dynamically chosen based on the output of their predecessors, with their usefulness weighted against their computational cost. Despite their potential, DRL-based solutions are not widely used in this capacity, partly due to the difficulties in configuring the reward function for each new task, the unpredictable reactions of the DRL agent to changes in the data, and the inability to use common performance metrics (e.g., TPR/FPR) to guide the algorithm's performance. In this study we propose methods for fine-tuning and calibrating DRL-based policies so that they can meet multiple performance goals. Moreover, we present a method for transferring effective security policies from one dataset to another. Finally, we demonstrate that our approach is highly robust against adversarial attacks.
In narrow spaces, motion planning based on the traditional hierarchical autonomous system could cause collisions due to mapping, localization, and control noises. Additionally, it is disabled when mapless. To tackle these problems, we leverage deep reinforcement learning which is verified to be effective in self-decision-making, to self-explore in narrow spaces without a map while avoiding collisions. Specifically, based on our Ackermann-steering rectangular-shaped ZebraT robot and its Gazebo simulator, we propose the rectangular safety region to represent states and detect collisions for rectangular-shaped robots, and a carefully crafted reward function for reinforcement learning that does not require the destination information. Then we benchmark five reinforcement learning algorithms including DDPG, DQN, SAC, PPO, and PPO-discrete, in a simulated narrow track. After training, the well-performed DDPG and DQN models can be transferred to three brand new simulated tracks, and furthermore to three real-world tracks.
Edge Computing is a promising technology to provide new capabilities in technological fields that require instantaneous data processing. Researchers in areas such as machine and deep learning use extensively edge and cloud computing for their applications, mainly due to the significant computational and storage resources that they provide. Currently, Robotics is seeking to take advantage of these capabilities as well, and with the development of 5G networks, some existing limitations in the field can be overcome. In this context, it is important to know how to utilize the emerging edge architectures, what types of edge architectures and platforms exist today and which of them can and should be used based on each robotic application. In general, Edge platforms can be implemented and used differently, especially since there are several providers offering more or less the same set of services with some essential differences. Thus, this study addresses these discussions for those who work in the development of the next generation robotic systems and will help to understand the advantages and disadvantages of each edge computing architecture in order to choose wisely the right one for each application.
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -- a perturbed model-based algorithm and a conservative model-based algorithm -- as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs, and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).
The fifth and sixth generations of wireless communication networks are enabling tools such as internet of things devices, unmanned aerial vehicles (UAVs), and artificial intelligence, to improve the agricultural landscape using a network of devices to automatically monitor farmlands. Surveying a large area requires performing a lot of image classification tasks within a specific period of time in order to prevent damage to the farm in case of an incident, such as fire or flood. UAVs have limited energy and computing power, and may not be able to perform all of the intense image classification tasks locally and within an appropriate amount of time. Hence, it is assumed that the UAVs are able to partially offload their workload to nearby multi-access edge computing devices. The UAVs need a decision-making algorithm that will decide where the tasks will be performed, while also considering the time constraints and energy level of the other UAVs in the network. In this paper, we introduce a Deep Q-Learning (DQL) approach to solve this multi-objective problem. The proposed method is compared with Q-Learning and three heuristic baselines, and the simulation results show that our proposed DQL-based method achieves comparable results when it comes to the UAVs' remaining battery levels and percentage of deadline violations. In addition, our method is able to reach convergence 13 times faster than Q-Learning.
Graph mining tasks arise from many different application domains, ranging from social networks, transportation, E-commerce, etc., which have been receiving great attention from the theoretical and algorithm design communities in recent years, and there has been some pioneering work using the hotly researched reinforcement learning (RL) techniques to address graph data mining tasks. However, these graph mining algorithms and RL models are dispersed in different research areas, which makes it hard to compare different algorithms with each other. In this survey, we provide a comprehensive overview of RL models and graph mining and generalize these algorithms to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method description, open-source codes, and benchmark datasets of GRL methods. Finally, we propose possible important directions and challenges to be solved in the future. This is the latest work on a comprehensive survey of GRL literature, and this work provides a global view for researchers as well as a learning resource for researchers outside the domain. In addition, we create an online open-source for both interested researchers who want to enter this rapidly developing domain and experts who would like to compare GRL methods.
The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily reply on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value and policy based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.
This paper surveys the field of transfer learning in the problem setting of Reinforcement Learning (RL). RL has been the key solution to sequential decision-making problems. Along with the fast advance of RL in various domains. including robotics and game-playing, transfer learning arises as an important technique to assist RL by leveraging and transferring external expertise to boost the learning process. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which these transfer learning techniques would be approachable. We discuss the relationship between transfer learning and other relevant topics from an RL perspective and also explore the potential challenges as well as future development directions for transfer learning in RL.
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedbacks. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wide recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.