This paper presents GAMMA, a general motion prediction model that enables large-scale real-time simulation and planning for autonomous driving. GAMMA models heterogeneous, interactive traffic agents. They operate under diverse road conditions, with various geometric and kinematic constraints. GAMMA treats the prediction task as constrained optimization in traffic agents' velocity space. The objective is to optimize an agent's driving performance, while obeying all the constraints resulting from the agent's kinematics, collision avoidance with other agents, and the environmental context. Further, GAMMA explicitly conditions the prediction on human behavioral states as parameters of the optimization model, in order to account for versatile human behaviors. We evaluated GAMMA on a set of real-world benchmark datasets. The results show that GAMMA achieves high prediction accuracy on both homogeneous and heterogeneous traffic datasets, with sub-millisecond execution time. Further, the computational efficiency and the flexibility of GAMMA enable (i) simulation of mixed urban traffic at many locations worldwide and (ii) planning for autonomous driving in dense traffic with uncertain driver behaviors, both in real-time. The open-source code of GAMMA is available online.
Traversability prediction is a fundamental perception capability for autonomous navigation. The diversity of data in different domains imposes significant gaps to the prediction performance of the perception model. In this work, we make efforts to reduce the gaps by proposing a novel coarse-to-fine unsupervised domain adaptation (UDA) model - CALI. Our aim is to transfer the perception model with high data efficiency, eliminate the prohibitively expensive data labeling, and improve the generalization capability during the adaptation from easy-to-obtain source domains to various challenging target domains. We prove that a combination of a coarse alignment and a fine alignment can be beneficial to each other and further design a first-coarse-then-fine alignment process. This proposed work bridges theoretical analyses and algorithm designs, leading to an efficient UDA model with easy and stable training. We show the advantages of our proposed model over multiple baselines in several challenging domain adaptation setups. To further validate the effectiveness of our model, we then combine our perception model with a visual planner to build a navigation system and show the high reliability of our model in complex natural environments where no labeled data is available.
In this paper, we present our software sensor fusion framework for self-driving cars and other autonomous robots. We have designed our framework as a universal and scalable platform for building up a robust 3D model of the agent's surrounding environment by fusing a wide range of various sensors into the data model that we can use as a basement for the decision making and planning algorithms. Our software currently covers the data fusion of the RGB and thermal cameras, 3D LiDARs, 3D IMU, and a GNSS positioning. The framework covers a complete pipeline from data loading, filtering, preprocessing, environment model construction, visualization, and data storage. The architecture allows the community to modify the existing setup or to extend our solution with new ideas. The entire software is fully compatible with ROS (Robotic Operation System), which allows the framework to cooperate with other ROS-based software. The source codes are fully available as an open-source under the MIT license. See //github.com/Robotics-BUT/Atlas-Fusion.
Trajectory prediction is an important task in autonomous driving. State-of-the-art trajectory prediction models often use attention mechanisms to model the interaction between agents. In this paper, we show that the attention information from such models can also be used to measure the importance of each agent with respect to the ego vehicle's future planned trajectory. Our experiment results on the nuPlans dataset show that our method can effectively find and rank surrounding agents by their impact on the ego's plan.
Autonomous driving is an active research topic in both academia and industry. However, most of the existing solutions focus on improving the accuracy by training learnable models with centralized large-scale data. Therefore, these methods do not take into account the user's privacy. In this paper, we present a new approach to learn autonomous driving policy while respecting privacy concerns. We propose a peer-to-peer Deep Federated Learning (DFL) approach to train deep architectures in a fully decentralized manner and remove the need for central orchestration. We design a new Federated Autonomous Driving network (FADNet) that can improve the model stability, ensure convergence, and handle imbalanced data distribution problems while is being trained with federated learning methods. Intensively experimental results on three datasets show that our approach with FADNet and DFL achieves superior accuracy compared with other recent methods. Furthermore, our approach can maintain privacy by not collecting user data to a central server.
Multi-UAV collision avoidance is a challenging task for UAV swarm applications due to the need of tight cooperation among swarm members for collision-free path planning. Centralized Training with Decentralized Execution (CTDE) in Multi-Agent Reinforcement Learning is a promising method for multi-UAV collision avoidance, in which the key challenge is to effectively learn decentralized policies that can maximize a global reward cooperatively. We propose a new multi-agent critic-actor learning scheme called MACA for UAV swarm collision avoidance. MACA uses a centralized critic to maximize the discounted global reward that considers both safety and energy efficiency, and an actor per UAV to find decentralized policies to avoid collisions. To solve the credit assignment problem in CTDE, we design a counterfactual baseline that marginalizes both an agent's state and action, enabling to evaluate the importance of an agent in the joint observation-action space. To train and evaluate MACA, we design our own simulation environment MACAEnv to closely mimic the realistic behaviors of a UAV swarm. Simulation results show that MACA achieves more than 16% higher average reward than two state-of-the-art MARL algorithms and reduces failure rate by 90% and response time by over 99% compared to a conventional UAV swarm collision avoidance algorithm in all test scenarios.
Driving safely requires multiple capabilities from human and intelligent agents, such as the generalizability to unseen environments, the safety awareness of the surrounding traffic, and the decision-making in complex multi-agent settings. Despite the great success of Reinforcement Learning (RL), most of the RL research works investigate each capability separately due to the lack of integrated environments. In this work, we develop a new driving simulation platform called MetaDrive to support the research of generalizable reinforcement learning algorithms for machine autonomy. MetaDrive is highly compositional, which can generate an infinite number of diverse driving scenarios from both the procedural generation and the real data importing. Based on MetaDrive, we construct a variety of RL tasks and baselines in both single-agent and multi-agent settings, including benchmarking generalizability across unseen scenes, safe exploration, and learning multi-agent traffic. The generalization experiments conducted on both procedurally generated scenarios and real-world scenarios show that increasing the diversity and the size of the training set leads to the improvement of the generalizability of the RL agents. We further evaluate various safe reinforcement learning and multi-agent reinforcement learning algorithms in MetaDrive environments and provide the benchmarks. Source code, documentation, and demo video are available at //metadriverse.github.io/metadrive . More research projects based on MetaDrive simulator are listed at //metadriverse.github.io
The past few years have witnessed an increasing interest in improving the perception performance of LiDARs on autonomous vehicles. While most of the existing works focus on developing new deep learning algorithms or model architectures, we study the problem from the physical design perspective, i.e., how different placements of multiple LiDARs influence the learning-based perception. To this end, we introduce an easy-to-compute information-theoretic surrogate metric to quantitatively and fast evaluate LiDAR placement for 3D detection of different types of objects. We also present a new data collection, detection model training and evaluation framework in the realistic CARLA simulator to evaluate disparate multi-LiDAR configurations. Using several prevalent placements inspired by the designs of self-driving companies, we show the correlation between our surrogate metric and object detection performance of different representative algorithms on KITTI through extensive experiments, validating the effectiveness of our LiDAR placement evaluation approach. Our results show that sensor placement is non-negligible in 3D point cloud-based object detection, which will contribute up to 10% performance discrepancy in terms of average precision in challenging 3D object detection settings. We believe that this is one of the first studies to quantitatively investigate the influence of LiDAR placement on perception performance.
We present SymForce, a fast symbolic computation and code generation library for robotics applications like computer vision, state estimation, motion planning, and controls. SymForce combines the development speed and flexibility of symbolic mathematics with the performance of autogenerated, highly optimized code in C++ or any target runtime language. SymForce provides geometry and camera types, Lie group operations, and branchless singularity handling for creating and analyzing complex symbolic expressions in Python, built on top of SymPy. Generated functions can be integrated as factors into our tangent space nonlinear optimizer, which is highly optimized for real-time production use. We introduce novel methods to automatically compute tangent space Jacobians, eliminating the need for bug-prone handwritten derivatives. This workflow enables faster runtime code, faster development time, and fewer lines of handwritten code versus the state-of-the-art. Our experiments demonstrate that our approach can yield order of magnitude speedups on computational tasks core to robotics. Code is available at //github.com/symforce-org/symforce .
There has been growing interest in the development and deployment of autonomous vehicles on roads over the last few years, encouraged by the empirical successes of powerful artificial intelligence techniques (AI), especially in the applications of deep learning and reinforcement learning. However, as demonstrated by recent traffic accidents, autonomous driving technology is not mature for safe deployment. As AI is the main technology behind the intelligent navigation systems of self-driving vehicles, both the stakeholders and transportation jurisdictions require their AI-driven software architecture to be safe, explainable, and regulatory compliant. We propose a framework that integrates autonomous control, explainable AI, and regulatory compliance to address this issue and validate the framework with a critical analysis in a case study. Moreover, we describe relevant XAI approaches that can help achieve the goals of the framework.
Breakthroughs in machine learning in the last decade have led to `digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of `physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.