This paper describes a resilient navigation and planning algorithm for the high-speed Indy Autonomous Challenge (IAC). The IAC is a competition between full-scale autonomous race cars that drive at up to 290 km/h (180 mph). Owing to the race cars' high speed and heavy vibration, the GPS/INS system is prone to degradation, which causes critical localization errors and can lead to serious accidents. To address this, we propose a robust navigation system built on a multi-sensor fusion Kalman filter. We present a degradation identification method based on probabilistic approaches that computes optimal measurement values for the Kalman filter correction step. In parallel, we present a resilient navigation system that keeps the race car following the race track in the event of localization failure. In addition, an optimal path planning algorithm for obstacle avoidance is proposed. Considering the original optimal racing line, obstacles, and vehicle dynamics, we propose a road-graph-based path planning algorithm to ensure that our race car stays within the track bounds. The designed localization system was experimentally evaluated to determine its ability to handle degraded data and prevent serious crashes during high-speed driving. Finally, we describe the successful completion of the obstacle avoidance challenge at the Indianapolis Motor Speedway (IMS) in October 2021.
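The abstract does not spell out its degradation identification method, so the following is only a minimal sketch of one common stand-in: a scalar Kalman filter whose correction step skips measurements that fail a chi-square innovation gate. The function names, the scalar state, and the gate value are assumptions made for illustration, not the authors' implementation.

```python
def kf_predict(x, p, u, q):
    """Propagate a scalar state x (variance p) by a motion increment u with process noise q."""
    return x + u, p + q


def kf_update_gated(x, p, z, r, gate=3.84):
    """Kalman correction that skips measurements failing an innovation gate.

    Assumption: gate=3.84 is roughly the 95% chi-square threshold for one
    degree of freedom. Returns (x, p, accepted).
    """
    innovation = z - x
    s = p + r                          # innovation variance
    d2 = innovation * innovation / s   # squared Mahalanobis distance
    if d2 > gate:
        return x, p, False             # treated as degraded: keep the prediction
    k = p / s                          # Kalman gain
    return x + k * innovation, (1.0 - k) * p, True


if __name__ == "__main__":
    x, p = kf_predict(0.0, 0.5, 0.0, 0.5)
    x, p, ok = kf_update_gated(x, p, z=0.5, r=1.0)
    print(x, p, ok)
    x, p, ok = kf_update_gated(x, p, z=100.0, r=1.0)  # implausible jump
    print(x, p, ok)
```

A gated update keeps the filter running on its prediction when a measurement looks implausible. This is one simple way to avoid feeding a degraded GPS/INS fix into the correction step.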
Human awareness in robot motion planning is crucial for seamless interaction with humans. Many existing techniques slow down, stop, or change the robot's trajectory locally to avoid collisions with humans. Although using the information on the human's state in the path planning phase could reduce future interference with the human's movements and make safety stops less frequent, such an approach is less widespread. This paper proposes a novel approach to embedding a human model in the robot's path planner. The method explicitly addresses the problem of minimizing the path execution time, including slowdowns and stops owed to the proximity of humans. For this purpose, it converts safety speed limits into configuration-space cost functions that drive the path's optimization. The costmap can be updated based on the observed or predicted state of the human. The method can handle deterministic and probabilistic representations of the human state and is independent of the prediction algorithm. Numerical and experimental results on an industrial collaborative cell demonstrate that the proposed approach consistently reduces the robot's execution time and avoids unnecessary safety speed reductions.
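As a rough illustration of turning a safety speed limit into a path cost, the sketch below scores a path by its expected execution time under a distance-dependent speed scaling. Everything specific here is an assumption made for the example: the linear ramp between `d_stop` and `d_free`, the 2D Cartesian waypoints (the paper works in configuration space), and the evaluation of the human distance only at segment midpoints.

```python
import math


def speed_scale(distance, d_stop=0.5, d_free=2.0):
    """Hypothetical safety rule: stop below d_stop, full speed beyond d_free, linear in between."""
    if distance <= d_stop:
        return 0.0
    if distance >= d_free:
        return 1.0
    return (distance - d_stop) / (d_free - d_stop)


def path_time(waypoints, human, v_max=1.0):
    """Execution-time cost of a piecewise-linear path, including safety slowdowns.

    The human distance is sampled once per segment (at its midpoint), which is
    a coarse approximation chosen to keep the sketch short.
    """
    total = 0.0
    for a, b in zip(waypoints, waypoints[1:]):
        length = math.dist(a, b)
        mid = tuple((ai + bi) / 2.0 for ai, bi in zip(a, b))
        scale = speed_scale(math.dist(mid, human))
        if scale == 0.0:
            return math.inf  # a safety stop: the segment cannot be traversed
        total += length / (v_max * scale)
    return total


if __name__ == "__main__":
    human = (1.0, 1.0)
    direct = [(0.0, 0.0), (2.0, 0.0)]
    detour = [(0.0, 0.0), (1.0, -1.0), (2.0, 0.0)]
    print(path_time(direct, human), path_time(detour, human))
```

In this toy setup the longer detour has a lower time cost than the direct path, because the direct path is slowed by the nearby human. This is the kind of trade-off a time-based cost function exposes to the planner.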
When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. To encode these, reward functions, which specify the objective of a robot, are employed. However, designing reward functions can be extremely challenging for complex tasks and environments. A promising approach is to learn reward functions from humans. Recently, several robot learning works embrace this approach and leverage human demonstrations to learn the reward functions. Known as inverse reinforcement learning, this approach relies on a fundamental assumption: humans can provide near-optimal demonstrations to the robot. Unfortunately, this is rarely the case: human demonstrations to the robot are often suboptimal due to various reasons, e.g., difficulty of teleoperation, robot having high degrees of freedom, or humans' cognitive limitations. This thesis is an attempt towards learning reward functions from human users by using other, more reliable data modalities. Specifically, we study how reward functions can be learned using comparative feedback, in which the human user compares multiple robot trajectories instead of (or in addition to) providing demonstrations. To this end, we first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, scaled comparisons; and describe how a robot can use these various forms of human feedback to infer a reward function, which may be parametric or non-parametric. Next, we propose active learning techniques to enable the robot to ask for comparison feedback that optimizes for the expected information that will be gained from that user feedback. Finally, we demonstrate the applicability of our methods in a wide variety of domains, ranging from autonomous driving simulations to home robotics, from standard reinforcement learning benchmarks to lower-body exoskeletons.
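To make the pairwise-comparison setting concrete, here is a minimal sketch that fits a linear reward to preference data using a Bradley-Terry (logistic) choice model and plain gradient ascent. The linear reward over trajectory features, the choice model, and the names `preference_prob` and `fit_reward` are assumptions for illustration. The thesis covers more feedback types and also non-parametric rewards.

```python
import math


def preference_prob(w, phi_a, phi_b):
    """P(human prefers trajectory A over B) under a logistic model with reward w . phi."""
    diff = sum(wi * (a - b) for wi, a, b in zip(w, phi_a, phi_b))
    return 1.0 / (1.0 + math.exp(-diff))


def fit_reward(comparisons, dim, lr=0.5, steps=500):
    """Maximum-likelihood reward weights from (phi_preferred, phi_other) pairs.

    Uses batch gradient ascent on the average log-likelihood.
    """
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for phi_win, phi_lose in comparisons:
            p = preference_prob(w, phi_win, phi_lose)
            for i in range(dim):
                grad[i] += (1.0 - p) * (phi_win[i] - phi_lose[i])
        w = [wi + lr * g / len(comparisons) for wi, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy features: the user consistently prefers a higher first feature
    # and a lower second feature.
    data = [
        ([1.0, 0.0], [0.0, 1.0]),
        ([0.5, -0.5], [0.0, 0.0]),
        ([1.0, 1.0], [0.0, 2.0]),
        ([2.0, 0.0], [1.0, 1.0]),
    ]
    w = fit_reward(data, dim=2)
    print(w, preference_prob(w, [1.0, 0.0], [0.0, 1.0]))
```

With noise-free, consistent comparisons the weights keep growing in the preferred direction, so in practice a prior or regularizer on `w` is usually added.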
In the past two decades, autonomous driving has been catalyzed into reality by the growing capabilities of machine learning. This paradigm shift possesses significant potential to transform the future of mobility and reshape our society as a whole. With the recent advances in perception, planning, and control capabilities, autonomous driving technologies are being rolled out for public trials, yet we remain far from being able to rigorously ensure the resilient operations of these systems across the long-tailed nature of the driving environment. Given the limitations of real-world testing, autonomous vehicle simulation stands as the critical component in exploring the edge of autonomous driving capabilities, developing the robust behaviors required for successful real-world operation, and enabling the extraction of hidden risks from these complex systems prior to deployment. This paper presents the current state-of-the-art simulation frameworks and methodologies used in the development of autonomous driving systems, with a focus on outlining how simulation is used to build the resiliency required for real-world operation and the methods developed to bridge the gap between simulation and reality. A synthesis of the key challenges surrounding autonomous driving simulation is presented, specifically highlighting the opportunities to further advance the ability to continuously learn in simulation and effectively transfer the learning into the real-world - enabling autonomous vehicles to exit the guardrails of simulation and deliver robust and resilient operations at scale.
Self-evolution is indispensable for realizing fully autonomous driving. This paper presents a self-evolving decision-making system based on Integrated Decision and Control (IDC), an advanced framework built on reinforcement learning (RL). First, an RL algorithm called constrained mixed policy gradient (CMPG) is proposed to consistently upgrade the driving policy of the IDC. It adapts the MPG under the penalty method so that it can solve constrained optimization problems using both data and a model. Second, an attention-based encoding (ABE) method is designed to tackle the state representation issue. It introduces an embedding network for feature extraction and a weighting network for feature fusion, which together provide order-insensitive encoding and distinguish the importance of road users. Finally, by fusing CMPG and ABE, we develop the first data-driven decision and control system under the IDC architecture, and deploy the system on a fully functional self-driving vehicle running in daily operation. Experimental results show that, boosted by data, the system achieves better driving ability than model-based methods. It also demonstrates safe, efficient, and smart driving behavior in various complex scenes at a signalized intersection with real mixed traffic flow.
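The idea of an embedding network followed by a weighting network can be sketched in a few lines. The single tanh layer, the linear scoring vector, and the softmax weighting below are assumptions chosen for brevity, not the ABE architecture from the paper. The point is only that a shared per-user embedding plus a normalized weighted sum gives an encoding that does not depend on the order of the road users.

```python
import math


def embed(features, weights):
    """Shared embedding: one tanh layer applied identically to every road user."""
    return [math.tanh(sum(w * f for w, f in zip(row, features))) for row in weights]


def encode(road_users, embed_weights, score_weights):
    """Order-insensitive encoding: softmax-weighted sum of per-user embeddings.

    Assumes road_users is non-empty.
    """
    embeddings = [embed(u, embed_weights) for u in road_users]
    scores = [sum(w * e for w, e in zip(score_weights, emb)) for emb in embeddings]
    top = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    attention = [e / norm for e in exps]      # importance weight of each road user
    dim = len(embed_weights)
    return [sum(a * emb[i] for a, emb in zip(attention, embeddings)) for i in range(dim)]


if __name__ == "__main__":
    users = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]   # hypothetical per-user features
    W = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]      # hypothetical embedding weights
    v = [1.0, -1.0, 0.5]                            # hypothetical scoring weights
    print(encode(users, W, v))
    print(encode(list(reversed(users)), W, v))      # same up to rounding
```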
We study automated test generation for verifying discrete decision-making modules in autonomous systems. We use linear temporal logic to encode the requirements on the system under test as the system specification. The behavior that we want to observe during the test is given as the test specification, which is unknown to the system. First, we use the specifications and their corresponding non-deterministic Büchi automata to generate the specification product automaton. Second, we construct a virtual product graph that represents the high-level interaction between the system and the test environment; it models the product automaton encoding the system, the test environment, and the specifications. The main result of this paper is an optimization problem, framed as a multi-commodity network flow problem, that solves for constraints on the virtual product graph, which can then be projected to the test environment. The result of the optimization problem is therefore a reactive test synthesis that ensures the system meets the test specifications while also satisfying the system specifications. This framework is illustrated in simulation on grid world examples and demonstrated on hardware with the Unitree A1 quadruped, wherein dynamic locomotion behaviors are verified in the context of reactive test environments.
We propose a framework for planning in unknown dynamic environments with probabilistic safety guarantees using conformal prediction. Particularly, we design a model predictive controller (MPC) that uses i) trajectory predictions of the dynamic environment, and ii) prediction regions quantifying the uncertainty of the predictions. To obtain prediction regions, we use conformal prediction, a statistical tool for uncertainty quantification, that requires availability of offline trajectory data - a reasonable assumption in many applications such as autonomous driving. The prediction regions are valid, i.e., they hold with a user-defined probability, so that the MPC is provably safe. We illustrate the results in the self-driving car simulator CARLA at a pedestrian-filled intersection. The strength of our approach is compatibility with state of the art trajectory predictors, e.g., RNNs and LSTMs, while making no assumptions on the underlying trajectory-generating distribution. To the best of our knowledge, these are the first results that provide valid safety guarantees in such a setting.
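The conformal step can be illustrated with a short sketch: given prediction errors measured on held-out calibration trajectories, take the ceil((K+1)(1-delta))-th smallest error as the radius of the prediction region, then require the plan to keep at least that distance from the predicted agent positions. The helper names, the single shared radius for all time steps, and the Euclidean distance check are simplifying assumptions. The paper embeds the regions as constraints in an MPC, and a guarantee over a whole horizon needs more care than this single-quantile version.

```python
import math


def conformal_radius(calibration_errors, delta):
    """Split-conformal quantile: a new error is <= the result with probability >= 1 - delta.

    Assumes the calibration errors and the new error are exchangeable.
    Returns infinity when there are too few calibration samples for this delta.
    """
    k = len(calibration_errors)
    rank = math.ceil((k + 1) * (1.0 - delta))
    if rank > k:
        return math.inf
    return sorted(calibration_errors)[rank - 1]


def plan_is_safe(ego_plan, predicted_agent, radius, margin=0.0):
    """Check that every planned ego position stays outside the agent's prediction region."""
    return all(
        math.dist(ego, agent) >= radius + margin
        for ego, agent in zip(ego_plan, predicted_agent)
    )


if __name__ == "__main__":
    errors = [0.7, 0.1, 0.4, 0.2, 0.6, 0.3, 0.5]   # hypothetical calibration errors (m)
    radius = conformal_radius(errors, delta=0.25)
    ego = [(0.0, 0.0), (1.0, 0.0)]
    pedestrian = [(0.0, 2.0), (1.0, 2.0)]
    print(radius, plan_is_safe(ego, pedestrian, radius))
```

Because the quantile is computed only from observed errors, the same code works with any trajectory predictor. This matches the abstract's point that no assumption is placed on the underlying distribution.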
When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.
We describe ACE0, a lightweight platform for evaluating the suitability and viability of AI methods for behaviour discovery in multi-agent simulations. Specifically, ACE0 was designed to explore AI methods for multi-agent simulations used in operations research studies related to new technologies such as autonomous aircraft. Simulation environments used in production are often high-fidelity and complex, require significant domain knowledge, and as a result have high R&D costs. Minimal and lightweight simulation environments can help researchers and engineers evaluate the viability of new AI technologies for behaviour discovery in a more agile and potentially cost-effective manner. In this paper we describe the motivation for the development of ACE0. We provide a technical overview of the system architecture, describe a case study of behaviour discovery in the aerospace domain, and provide a qualitative evaluation of the system. The evaluation includes a brief description of collaborative research projects with academic partners, exploring different AI behaviour discovery methods.
Breakthroughs in machine learning in the last decade have led to 'digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of 'physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.
Autonomous driving is regarded as one of the most promising remedies to shield human beings from severe crashes. To this end, 3D object detection serves as the core of the perception system, especially for path planning, motion prediction, collision avoidance, etc. Stereo or monocular images with corresponding 3D point clouds are already the standard setup for 3D object detection, and point clouds are increasingly prevalent because they provide accurate depth information. Despite existing efforts, 3D object detection on point clouds is still in its infancy due to the inherent sparsity and irregularity of point clouds, the misalignment between the camera view and the LiDAR bird's-eye view when fusing modalities, occlusions and scale variations at long distances, etc. Recently, profound progress has been made in 3D object detection, with a large body of literature addressing this vision task. We therefore present a comprehensive review of the latest progress in this field, covering the main topics including sensors, fundamentals, and recent state-of-the-art detection methods with their pros and cons. Furthermore, we introduce metrics and provide quantitative comparisons on popular public datasets. Avenues for future work are identified after an in-depth analysis of the surveyed works. Finally, we conclude this paper.