The ability to direct a Probabilistic Boolean Network (PBN) to a desired state is important to applications such as targeted therapeutics in cancer biology. Reinforcement Learning (RL) has been proposed as a framework that solves a discrete-time optimal control problem cast as a Markov Decision Process. We focus on an integrative framework powered by a model-free deep RL method that can address different flavours of the control problem (e.g., with or without control inputs; attractor state or a subset of the state space as the target domain). The method is agnostic to the distribution of probabilities for the next state, hence it does not use the probability transition matrix. The time complexity is linear on the time steps, or interactions between the agent (deep RL) and the environment (PBN), during training. Indeed, we explore the scalability of the deep RL approach to (set) stabilization of large-scale PBNs and demonstrate successful control on large networks, including a metastatic melanoma PBN with 200 nodes.
Reinforcement Learning (RL) can solve complex tasks but does not intrinsically provide any guarantees on system behavior. For real-world systems that fulfill safety-critical tasks, such guarantees on safety specifications are necessary. To bridge this gap, we propose a verifiably safe RL procedure with probabilistic guarantees. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification, while randomizing the controller's inputs within a bounded set. Then, we use RL to improve the performance of this probabilistically verified, i.e. safe, controller and explore in the same bounded set around the controller's input as was randomized over in the verification step. Finally, we calculate probabilistic safety guarantees with respect to temporal logic specifications for the learned agent. Our approach is efficient for continuous action and state spaces and separates safety verification and performance improvement into two independent steps. We evaluate our approach on a safe evasion task where a robot has to evade a dynamic obstacle in a specific manner while trying to reach a goal. The results show that our verifiably safe RL approach leads to efficient learning and performance improvements while maintaining safety specifications.
Reinforcement learning allows machines to learn from their own experience. Nowadays, it is used in safety-critical applications, such as autonomous driving, despite being vulnerable to attacks carefully crafted to either prevent that the reinforcement learning algorithm learns an effective and reliable policy, or to induce the trained agent to make a wrong decision. The literature about the security of reinforcement learning is rapidly growing, and some surveys have been proposed to shed light on this field. However, their categorizations are insufficient for choosing an appropriate defense given the kind of system at hand. In our survey, we do not only overcome this limitation by considering a different perspective, but we also discuss the applicability of state-of-the-art attacks and defenses when reinforcement learning algorithms are used in the context of autonomous driving.
We apply reinforcement learning (RL) to robotics. One of the drawbacks of traditional RL algorithms has been their poor sample efficiency. One approach to improve the sample efficiency is model-based RL. In our model-based RL algorithm, we learn a model of the environment, use it to generate imaginary trajectories and backpropagate through them to update the policy, exploiting the differentiability of the model. Intuitively, learning more accurate models should lead to better performance. Recently, there has been growing interest in developing better deep neural network based dynamics models for physical systems, through better inductive biases. We focus on robotic systems undergoing rigid body motion. We compare two versions of our model-based RL algorithm, one which uses a standard deep neural network based dynamics model and the other which uses a much more accurate, physics-informed neural network based dynamics model. We show that, in model-based RL, model accuracy mainly matters in environments that are sensitive to initial conditions. In these environments, the physics-informed version of our algorithm achieves significantly better average-return and sample efficiency. In environments that are not sensitive to initial conditions, both versions of our algorithm achieve similar average-return, while the physics-informed version achieves better sample efficiency. We measure the sensitivity to initial conditions using the finite-time maximal Lyapunov exponent. We also show that, in challenging environments, where we need a lot of samples to learn, physics-informed model-based RL can achieve better average-return than state-of-the-art model-free RL algorithms such as Soft Actor-Critic, by generating accurate imaginary data.
Online ride-hailing services have become a prevalent transportation system across the world. In this paper, we study a challenging problem of how to direct vacant taxis around a city such that supplies and demands can be balanced in online ride-hailing services. We design a new reward scheme that considers multiple performance metrics of online ride-hailing services. We also propose a novel deep reinforcement learning method named Deep-Q-Network with Action Mask (AM-DQN) masking off unnecessary actions in various locations such that agents can learn much faster and more efficiently. We conduct extensive experiments using a city-scale dataset from Chicago. Several popular heuristic and learning methods are also implemented as baselines for comparison. The results of the experiments show that the AM-DQN attains the best performances of all methods with respect to average failure rate, average waiting time for customers, and average idle search time for vacant taxis.
Vehicle routing problems and other combinatorial optimization problems have been approximately solved by reinforcement learning agents with policies based on encoder-decoder models with attention mechanisms. These techniques are of substantial interest but still cannot solve the complex routing problems that arise in a realistic setting which can have many trucks and complex requirements. With the aim of making reinforcement learning a viable technique for supply chain optimization, we develop new extensions to encoder-decoder models for vehicle routing that allow for complex supply chains using classical computing today and quantum computing in the future. We make two major generalizations. First, our model allows for routing problems with multiple trucks. Second, we move away from the simple requirement of having a truck deliver items from nodes to one special depot node, and instead allow for a complex tensor demand structure. We show how our model, even if trained only for a small number of trucks, can be embedded into a large supply chain to yield viable solutions.
Ransomware has emerged as one of the major global threats in recent days. The alarming increasing rate of ransomware attacks and new ransomware variants intrigue the researchers in this domain to constantly examine the distinguishing traits of ransomware and refine their detection or classification strategies. Among the broad range of different behavioral characteristics, the trait of Application Programming Interface (API) calls and network behaviors have been widely utilized as differentiating factors for ransomware detection, or classification. Although many of the prior approaches have shown promising results in detecting and classifying ransomware families utilizing these features without applying any feature selection techniques, feature selection, however, is one of the potential steps toward an efficient detection or classification Machine Learning model because it reduces the probability of overfitting by removing redundant data, improves the model's accuracy by eliminating irrelevant features, and therefore reduces training time. There have been a good number of feature selection techniques to date that are being used in different security scenarios to optimize the performance of the Machine Learning models. Hence, the aim of this study is to present the comparative performance analysis of widely utilized Supervised Machine Learning models with and without RFECV feature selection technique towards ransomware classification utilizing the API call and network traffic features. Thereby, this study provides insight into the efficiency of the RFECV feature selection technique in the case of ransomware classification which can be used by peers as a reference for future work in choosing the feature selection technique in this domain.
In order to gain access to networks, different types of intrusion attacks have been designed, and the attackers are working on improving them. Computer networks have become increasingly important in daily life due to the increasing reliance on them. In light of this, it is quite evident that algorithms with high detection accuracy and reliability are needed for various types of attacks. The purpose of this paper is to develop an intrusion detection system that is based on deep reinforcement learning. Based on the Markov decision process, the proposed system can generate informative representations suitable for classification tasks based on vast data. Reinforcement learning is considered from two different perspectives, deep Q learning, and double deep Q learning. Different experiments have demonstrated that the proposed systems have an accuracy of $99.17\%$ over the UNSW-NB15 dataset in both approaches, an improvement over previous methods based on contrastive learning and LSTM-Autoencoders. The performance of the model trained on UNSW-NB15 has also been evaluated on BoT-IoT datasets, resulting in competitive performance.
Searching for a path between two nodes in a graph is one of the most well-studied and fundamental problems in computer science. In numerous domains such as robotics, AI, or biology, practitioners develop search heuristics to accelerate their pathfinding algorithms. However, it is a laborious and complex process to hand-design heuristics based on the problem and the structure of a given use case. Here we present PHIL (Path Heuristic with Imitation Learning), a novel neural architecture and a training algorithm for discovering graph search and navigation heuristics from data by leveraging recent advances in imitation learning and graph representation learning. At training time, we aggregate datasets of search trajectories and ground-truth shortest path distances, which we use to train a specialized graph neural network-based heuristic function using backpropagation through steps of the pathfinding process. Our heuristic function learns graph embeddings useful for inferring node distances, runs in constant time independent of graph sizes, and can be easily incorporated in an algorithm such as A* at test time. Experiments show that PHIL reduces the number of explored nodes compared to state-of-the-art methods on benchmark datasets by 58.5\% on average, can be directly applied in diverse graphs ranging from biological networks to road networks, and allows for fast planning in time-critical robotics domains.
In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm in the original problem and incrementally constructs a solution. The proposed framework utilizes Graph Convolutional Network (GCN) to generate node embeddings that predicts the potential nodes in the solution set from the entire node set. These embeddings enable an efficient training process to learn the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, robust and scalable to large dynamic networks.
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedbacks. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wide recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.