In a connected transportation system, adaptive traffic signal controllers (ATSC) utilize real-time vehicle trajectory data received from vehicles through wireless connectivity (i.e., connected vehicles) to regulate green time. However, this wirelessly connected ATSC increases cyber-attack surfaces and increases their vulnerability to various cyber-attack modes, which can be leveraged to induce significant congestion in a roadway network. An attacker may receive financial benefits to create such a congestion for a specific roadway. One such mode is a 'sybil' attack in which an attacker creates fake vehicles in the network by generating fake Basic Safety Messages (BSMs) imitating actual connected vehicles following roadway traffic rules. The ultimate goal of an attacker will be to block a route(s) by generating fake or 'sybil' vehicles at a rate such that the signal timing and phasing changes occur without flagging any abrupt change in number of vehicles. Because of the highly non-linear and unpredictable nature of vehicle arrival rates and the ATSC algorithm, it is difficult to find an optimal rate of sybil vehicles, which will be injected from different approaches of an intersection. Thus, it is necessary to develop an intelligent cyber-attack model to prove the existence of such attacks. In this study, a reinforcement learning based cyber-attack model is developed for a waiting time-based ATSC. Specifically, an RL agent is trained to learn an optimal rate of sybil vehicle injection to create congestion for an approach(s). Our analyses revealed that the RL agent can learn an optimal policy for creating an intelligent attack.
Digital twin (DT) systems aim to create virtual replicas of physical objects that are updated in real time with their physical counterparts and evolve alongside the physical assets throughout its lifecycle. Transportation systems are poised to significantly benefit from this new paradigm. In particular, DT technology can augment the capabilities of intelligent transportation systems. However, the development and deployment of networkwide transportation DT systems need to take into consideration the scale and dynamic nature of future connected and automated transportation systems. Motivated by the need of understanding the requirements and challenges involved in developing and implementing such systems, this paper proposes a hierarchical concept for a Transportation DT (TDT) system starting from individual transportation assets and building up to the entire networkwide TDT. A reference architecture is proposed for TDT systems that could be used as a guide in developing TDT systems at any scale within the presented hierarchical concept. In addition, several use cases are presented based upon the reference architecture which illustrate the utility of a TDT system from transportation safety, mobility and environmental applications perspective. This is followed by a review of current studies in the domain of TDT systems. Finally, the critical challenges and promising future research directions in TDT are discussed to overcome existing barriers to realize a safe and operationally efficient connected and automated transportation systems.
We propose a joint channel estimation and signal detection approach for the uplink non-orthogonal multiple access (NOMA) using unsupervised machine learning. We apply a Gaussian mixture model (GMM) to cluster the received signals, and accordingly optimize the decision regions to enhance the symbol error rate (SER) performance. We show that, when the received powers of the users are sufficiently different, the proposed clustering-based approach achieves an SER performance on a par with that of the conventional maximum-likelihood detector (MLD) with full channel state information (CSI). We study the tradeoff between the accuracy of the proposed approach and the blocklength, as the accuracy of the utilized clustering algorithm depends on the number of symbols available at the receiver. We provide a comprehensive performance analysis of the proposed approach and derive a theoretical bound on its SER performance. Our simulation results corroborate the effectiveness of the proposed approach and verify that the calculated theoretical bound can predict the SER performance of the proposed approach well. We further explore the application of the proposed approach to a practical grant-free NOMA scenario, and show that its performance is very close to that of the optimal MLD with full CSI, which usually requires long pilot sequences.
Radio access network (RAN) slicing is a key element in enabling current 5G networks and next-generation networks to meet the requirements of different services in various verticals. However, the heterogeneous nature of these services' requirements, along with the limited RAN resources, makes the RAN slicing very complex. Indeed, the challenge that mobile virtual network operators (MVNOs) face is to rapidly adapt their RAN slicing strategies to the frequent changes of the environment constraints and service requirements. Machine learning techniques, such as deep reinforcement learning (DRL), are increasingly considered a key enabler for automating the management and orchestration of RAN slicing operations. Nerveless, the ability to generalize DRL models to multiple RAN slicing environments may be limited, due to their strong dependence on the environment data on which they are trained. Federated learning enables MVNOs to leverage more diverse training inputs for DRL without the high cost of collecting this data from different RANs. In this paper, we propose a federated deep reinforcement learning approach for RAN slicing. In this approach, MVNOs collaborate to improve the performance of their DRL-based RAN slicing models. Each MVNO trains a DRL model and sends it for aggregation. The aggregated model is then sent back to each MVNO for immediate use and further training. The simulation results show the effectiveness of the proposed DRL approach.
Deep Neural Networks (DNNs) have been widely used to perform real-world tasks in cyber-physical systems such as Autonomous Driving Systems (ADS). Ensuring the correct behavior of such DNN-Enabled Systems (DES) is a crucial topic. Online testing is one of the promising modes for testing such systems with their application environments (simulated or real) in a closed loop taking into account the continuous interaction between the systems and their environments. However, the environmental variables (e.g., lighting conditions) that might change during the systems' operation in the real world, causing the DES to violate requirements (safety, functional), are often kept constant during the execution of an online test scenario due to the two major challenges: (1) the space of all possible scenarios to explore would become even larger if they changed and (2) there are typically many requirements to test simultaneously. In this paper, we present MORLOT (Many-Objective Reinforcement Learning for Online Testing), a novel online testing approach to address these challenges by combining Reinforcement Learning (RL) and many-objective search. MORLOT leverages RL to incrementally generate sequences of environmental changes while relying on many-objective search to determine the changes so that they are more likely to achieve any of the uncovered objectives. We empirically evaluate MORLOT using CARLA, a high-fidelity simulator widely used for autonomous driving research, integrated with Transfuser, a DNN-enabled ADS for end-to-end driving. The evaluation results show that MORLOT is significantly more effective and efficient than alternatives with a large effect size. In other words, MORLOT is a good option to test DES with dynamically changing environments while accounting for multiple safety requirements.
In the context of an efficient network traffic engineering process where the network continuously measures a new traffic matrix and updates the set of paths in the network, an automated process is required to quickly and efficiently identify when and what set of paths should be used. Unfortunately, the burden of finding the optimal solution for the network updating process in each given time interval is high since the computation complexity of optimization approaches using linear programming increases significantly as the size of the network increases. In this paper, we use deep reinforcement learning to derive a data-driven algorithm that does the path selection in the network considering the overhead of route computation and path updates. Our proposed scheme leverages information about past network behavior to identify a set of robust paths to be used for multiple future time intervals to avoid the overhead of updating the forwarding behavior of routers frequently. We compare the results of our approach to other traffic engineering solutions through extensive simulations across real network topologies. Our results demonstrate that our scheme fares well by a factor of 40% with respect to reducing link utilization compared to traditional TE schemes such as ECMP. Our scheme provides a slightly higher link utilization (around 25%) compared to schemes that only minimize link utilization and do not care about path updating overhead.
2048 is a single-player stochastic puzzle game. This intriguing and addictive game has been popular worldwide and has attracted researchers to develop game-playing programs. Due to its simplicity and complexity, 2048 has become an interesting and challenging platform for evaluating the effectiveness of machine learning methods. This dissertation conducts comprehensive research on reinforcement learning and computer game algorithms for 2048. First, this dissertation proposes optimistic temporal difference learning, which significantly improves the quality of learning by employing optimistic initialization to encourage exploration for 2048. Furthermore, based on this approach, a state-of-the-art program for 2048 is developed, which achieves the highest performance among all learning-based programs, namely an average score of 625377 points and a rate of 72% for reaching 32768-tiles. Second, this dissertation investigates several techniques related to 2048, including the n-tuple network ensemble learning, Monte Carlo tree search, and deep reinforcement learning. These techniques are promising for further improving the performance of the current state-of-the-art program. Finally, this dissertation discusses pedagogical applications related to 2048 by proposing course designs and summarizing the teaching experience. The proposed course designs use 2048-like games as materials for beginners to learn reinforcement learning and computer game algorithms. The courses have been successfully applied to graduate-level students and received well by student feedback.
Variational inference (VI) is a specific type of approximate Bayesian inference that approximates an intractable posterior distribution with a tractable one. VI casts the inference problem as an optimization problem, more specifically, the goal is to maximize a lower bound of the logarithm of the marginal likelihood with respect to the parameters of the approximate posterior. Reinforcement learning (RL) on the other hand deals with autonomous agents and how to make them act optimally such as to maximize some notion of expected future cumulative reward. In the non-sequential setting where agents' actions do not have an impact on future states of the environment, RL is covered by contextual bandits and Bayesian optimization. In a proper sequential scenario, however, where agents' actions affect future states, instantaneous rewards need to be carefully traded off against potential long-term rewards. This manuscript shows how the apparently different subjects of VI and RL are linked in two fundamental ways. First, the optimization objective of RL to maximize future cumulative rewards can be recovered via a VI objective under a soft policy constraint in both the non-sequential and the sequential setting. This policy constraint is not just merely artificial but has proven as a useful regularizer in many RL tasks yielding significant improvements in agent performance. And second, in model-based RL where agents aim to learn about the environment they are operating in, the model-learning part can be naturally phrased as an inference problem over the process that governs environment dynamics. We are going to distinguish between two scenarios for the latter: VI when environment states are fully observable by the agent and VI when they are only partially observable through an observation distribution.
With the breakthrough of AlphaGo, deep reinforcement learning becomes a recognized technique for solving sequential decision-making problems. Despite its reputation, data inefficiency caused by its trial and error learning mechanism makes deep reinforcement learning hard to be practical in a wide range of areas. Plenty of methods have been developed for sample efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, amongst which, distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming, and intelligent transportation. In this paper, we conclude the state of this exciting field, by comparing the classical distributed deep reinforcement learning methods, and studying important components to achieve efficient distributed learning, covering single player single agent distributed deep reinforcement learning to the most complex multiple players multiple agents distributed deep reinforcement learning. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. By analyzing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, which is further validated on Wargame, a complex environment, showing usability of the proposed toolbox for multiple players and multiple agents distributed deep reinforcement learning under complex games. Finally, we try to point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.