We introduce ApolloRL, an open platform for research in reinforcement learning for autonomous driving. The platform provides a complete closed-loop pipeline with training, simulation, and evaluation components. It comes with 300 hours of real-world data in driving scenarios and popular baselines such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) agents. We elaborate in this paper on the architecture and the environment defined in the platform. In addition, we discuss the performance of the baseline agents in the ApolloRL environment.
In this paper we introduce a new approach to discrete-time semi-Markov decision processes based on the sojourn time process. Different characterizations of discrete-time semi-Markov processes are exploited and decision processes are constructed by their means. With this new approach, the agent is allowed to consider different actions depending also on the sojourn time of the process in the current state. A numerical method based on $Q$-learning algorithms for finite horizon reinforcement learning and stochastic recursive relations is investigated. Finally, we consider two toy examples: one in which the reward depends on the sojourn-time, according to the gambler's fallacy; the other in which the environment is semi-Markov even if the reward function does not depend on the sojourn time. These are used to carry on some numerical evaluations on the previously presented $Q$-learning algorithm and on a different naive method based on deep reinforcement learning.
Autonomous driving is an active research topic in both academia and industry. However, most of the existing solutions focus on improving the accuracy by training learnable models with centralized large-scale data. Therefore, these methods do not take into account the user's privacy. In this paper, we present a new approach to learn autonomous driving policy while respecting privacy concerns. We propose a peer-to-peer Deep Federated Learning (DFL) approach to train deep architectures in a fully decentralized manner and remove the need for central orchestration. We design a new Federated Autonomous Driving network (FADNet) that can improve the model stability, ensure convergence, and handle imbalanced data distribution problems while is being trained with federated learning methods. Intensively experimental results on three datasets show that our approach with FADNet and DFL achieves superior accuracy compared with other recent methods. Furthermore, our approach can maintain privacy by not collecting user data to a central server.
Driving safely requires multiple capabilities from human and intelligent agents, such as the generalizability to unseen environments, the safety awareness of the surrounding traffic, and the decision-making in complex multi-agent settings. Despite the great success of Reinforcement Learning (RL), most of the RL research works investigate each capability separately due to the lack of integrated environments. In this work, we develop a new driving simulation platform called MetaDrive to support the research of generalizable reinforcement learning algorithms for machine autonomy. MetaDrive is highly compositional, which can generate an infinite number of diverse driving scenarios from both the procedural generation and the real data importing. Based on MetaDrive, we construct a variety of RL tasks and baselines in both single-agent and multi-agent settings, including benchmarking generalizability across unseen scenes, safe exploration, and learning multi-agent traffic. The generalization experiments conducted on both procedurally generated scenarios and real-world scenarios show that increasing the diversity and the size of the training set leads to the improvement of the generalizability of the RL agents. We further evaluate various safe reinforcement learning and multi-agent reinforcement learning algorithms in MetaDrive environments and provide the benchmarks. Source code, documentation, and demo video are available at //metadriverse.github.io/metadrive . More research projects based on MetaDrive simulator are listed at //metadriverse.github.io
The past few years have witnessed an increasing interest in improving the perception performance of LiDARs on autonomous vehicles. While most of the existing works focus on developing new deep learning algorithms or model architectures, we study the problem from the physical design perspective, i.e., how different placements of multiple LiDARs influence the learning-based perception. To this end, we introduce an easy-to-compute information-theoretic surrogate metric to quantitatively and fast evaluate LiDAR placement for 3D detection of different types of objects. We also present a new data collection, detection model training and evaluation framework in the realistic CARLA simulator to evaluate disparate multi-LiDAR configurations. Using several prevalent placements inspired by the designs of self-driving companies, we show the correlation between our surrogate metric and object detection performance of different representative algorithms on KITTI through extensive experiments, validating the effectiveness of our LiDAR placement evaluation approach. Our results show that sensor placement is non-negligible in 3D point cloud-based object detection, which will contribute up to 10% performance discrepancy in terms of average precision in challenging 3D object detection settings. We believe that this is one of the first studies to quantitatively investigate the influence of LiDAR placement on perception performance.
There has been growing interest in the development and deployment of autonomous vehicles on roads over the last few years, encouraged by the empirical successes of powerful artificial intelligence techniques (AI), especially in the applications of deep learning and reinforcement learning. However, as demonstrated by recent traffic accidents, autonomous driving technology is not mature for safe deployment. As AI is the main technology behind the intelligent navigation systems of self-driving vehicles, both the stakeholders and transportation jurisdictions require their AI-driven software architecture to be safe, explainable, and regulatory compliant. We propose a framework that integrates autonomous control, explainable AI, and regulatory compliance to address this issue and validate the framework with a critical analysis in a case study. Moreover, we describe relevant XAI approaches that can help achieve the goals of the framework.
Reinforcement learning (RL) has shown great success in solving many challenging tasks via use of deep neural networks. Although using deep learning for RL brings immense representational power, it also causes a well-known sample-inefficiency problem. This means that the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is to use action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to help the student. This work considers how to better leverage uncertainties about when a student should ask for advice and if the student can model the teacher to ask for less advice. The student could decide to ask for advice when it is uncertain or when both it and its model of the teacher are uncertain. In addition to this investigation, this paper introduces a new method to compute uncertainty for a deep RL agent using a secondary neural network. Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.
The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automatizes the choice of almost all model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.
Recommender systems have been widely applied in different real-life scenarios to help us find useful information. Recently, Reinforcement Learning (RL) based recommender systems have become an emerging research topic. It often surpasses traditional recommendation models even most deep learning-based methods, owing to its interactive nature and autonomous learning ability. Nevertheless, there are various challenges of RL when applying in recommender systems. Toward this end, we firstly provide a thorough overview, comparisons, and summarization of RL approaches for five typical recommendation scenarios, following three main categories of RL: value-function, policy search, and Actor-Critic. Then, we systematically analyze the challenges and relevant solutions on the basis of existing literature. Finally, under discussion for open issues of RL and its limitations of recommendation, we highlight some potential research directions in this field.
Reinforcement learning is one of the core components in designing an artificial intelligent system emphasizing real-time response. Reinforcement learning influences the system to take actions within an arbitrary environment either having previous knowledge about the environment model or not. In this paper, we present a comprehensive study on Reinforcement Learning focusing on various dimensions including challenges, the recent development of different state-of-the-art techniques, and future directions. The fundamental objective of this paper is to provide a framework for the presentation of available methods of reinforcement learning that is informative enough and simple to follow for the new researchers and academics in this domain considering the latest concerns. First, we illustrated the core techniques of reinforcement learning in an easily understandable and comparable way. Finally, we analyzed and depicted the recent developments in reinforcement learning approaches. My analysis pointed out that most of the models focused on tuning policy values rather than tuning other things in a particular state of reasoning.
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedbacks. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wide recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.