This paper aims to improve the path quality and computational efficiency of sampling-based kinodynamic planners for vehicular navigation. It proposes a learning framework for identifying promising controls during the expansion process of sampling-based planners. Given a dynamics model, a reinforcement learning process is trained offline to return a low-cost control that reaches a local goal state (i.e., a waypoint) in the absence of obstacles. Because this process focuses on the system's dynamics and is agnostic to the environment, it is data-efficient and needs to take place only once per robotic system, so it can be reused across environments. Online, the planner generates local goal states for the learned controller both in an informed manner, to bias expansion toward the global goal, and in an exploratory, random manner. For the informed expansion, local goal states are generated either via (a) medial-axis information in environments with obstacles, or (b) wavefront information in setups with traversability costs. The learning process and the resulting planning framework are evaluated on first- and second-order differential-drive systems, as well as a physically simulated Segway robot. The results show that the proposed integration of learning and planning produces higher-quality paths than sampling-based kinodynamic planning with random controls, in fewer iterations and less computation time.
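As a rough illustration of the expansion scheme described above, the following sketch (not the authors' code) queries a learned goal-reaching controller instead of sampling random controls; `policy` and `propagate` are hypothetical stand-ins for the offline-trained controller and the given dynamics model.

```python
import numpy as np

def sample_local_goal(goal, rng, p_informed=0.5):
    """Alternate between informed (goal-biased) and exploratory local goals."""
    if rng.random() < p_informed:
        return goal + rng.normal(scale=0.5, size=goal.shape)   # informed
    return rng.uniform(-10.0, 10.0, size=goal.shape)           # exploratory

def expand(tree, goal, policy, propagate, rng, steps=20, dt=0.05):
    """One expansion: select a node, query the learned goal-reaching
    controller for a control toward a waypoint, and propagate forward."""
    node = min(tree, key=lambda s: float(np.linalg.norm(s - goal)))
    waypoint = sample_local_goal(goal, rng)
    state = node.copy()
    for _ in range(steps):
        u = policy(state, waypoint)       # offline-trained, obstacle-agnostic
        state = propagate(state, u, dt)   # given dynamics model
    tree.append(state)
    return state
```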
In this paper, we present a complete autonomous navigation pipeline for unstructured outdoor environments. The main contribution of this work is the path planning module, which we divide into two layers: Global Path Planning (GPP) and Local Path Planning (LPP). For environment representation, instead of complex and heavy grid maps, the GPP layer uses road-network information obtained directly from OpenStreetMap (OSM). In the LPP layer, we use a novel Naive-Valley-Path (NVP) method to generate, in real time, a local path that avoids obstacles on the road. This approach builds a naive representation of the local environment from a LiDAR sensor and applies a naive optimization that exploits the concept of "valley" areas in the cost map. We demonstrate the system's robustness experimentally on our research platform BLUE, driving autonomously across the University of Alicante Scientific Park for more than 20 km in a 12.33 ha area.
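The "valley" idea can be illustrated with a minimal sketch (our construction, not the NVP implementation): for each lookahead row of a local cost map, steer toward the lowest-cost cell near the previous choice.

```python
import numpy as np

def valley_path(cost_map, start_col, window=5):
    """cost_map: rows = increasing distance ahead, cols = lateral cells."""
    rows, cols = cost_map.shape
    path, col = [], start_col
    for r in range(rows):
        lo, hi = max(0, col - window), min(cols, col + window + 1)
        col = lo + int(np.argmin(cost_map[r, lo:hi]))  # stay in the valley
        path.append((r, col))
    return path

# toy cost map: high cost near the edges (obstacles), low-cost central valley
cm = np.abs(np.arange(11) - 5)[None, :] * np.ones((20, 1))
print(valley_path(cm, start_col=2)[:5])
```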
In this work, we consider the problem of mobile robots that need to manipulate or transport an object via cables or robotic arms. We consider the scenario where the number of manipulating robots is redundant, i.e., a desired object configuration can be obtained by different configurations of the robots. The objective of this work is to show that communication can be used to implement cooperative local feedback controllers in the robots that improve disturbance rejection and reduce structural stress in the object. In particular, we consider the realistic scenario where measurements are sampled and transmitted over wireless links, and the sampling period is comparable with the time constants of the system dynamics. We first propose a kinematic model that is consistent with the overall system dynamics under high-gain control, and we then provide sufficient conditions for the exponential stability and monotonic decrease of the configuration error under different norms. Finally, we test the proposed controllers on the full dynamical system, showing the benefits of local communication.
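A toy sampled-data illustration of the role of communication (our construction, not the paper's model): redundant robots whose average position sets the object configuration, with a communicated consensus term that reduces internal stress while a shared term drives the object to its goal.

```python
import numpy as np

def step(x, x_des, k_obj=1.0, k_comm=0.5, T=0.1):
    """One sampling period T; x is (n_robots, 2), x_des the object goal."""
    obj = x.mean(axis=0)                   # object configuration (kinematic)
    u_obj = k_obj * (x_des - obj)          # shared term: drive object to goal
    u_comm = k_comm * (obj - x)            # communicated term: reduce stress
    return x + T * (u_obj + u_comm)

x = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
for _ in range(100):
    x = step(x, x_des=np.array([3.0, 3.0]))
print(x.mean(axis=0))                      # converges to (3, 3) for small T
```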
Completely positive and trace-preserving maps characterize physically implementable quantum operations. On the other hand, general linear maps, such as positive but not completely positive maps, which cannot be physically implemented, are fundamental ingredients in quantum information from both theoretical and practical perspectives. This raises the question of how well one can simulate or approximate the action of a general linear map by physically implementable operations. In this work, we introduce a systematic framework to resolve this task using the quasiprobability decomposition technique. We decompose a target linear map into a linear combination of physically implementable operations and introduce the physical implementability measure as the least amount of negativity the quasiprobability decomposition must contain; this directly quantifies the cost of simulating a given map using physically implementable quantum operations. We show that this measure is efficiently computable by semidefinite programs and prove several of its properties, such as faithfulness, additivity, and unitary invariance. We derive lower and upper bounds in terms of the trace norm of the Choi operator and obtain analytic expressions for several linear maps of practical interest. Furthermore, we endow this measure with an operational meaning in the quantum error mitigation scenario: it gives a lower bound on the sampling cost achievable via the quasiprobability decomposition technique. In particular, for parallel quantum noises, we show that global error mitigation has no advantage over local error mitigation.
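A minimal sketch of the quasiprobability decomposition in a restricted setting (our illustration: we optimize over a fixed basis of four Pauli channels rather than over all CPTP maps, as the paper's SDP does), with the transpose map, which is positive but not completely positive, as the target.

```python
import numpy as np
import cvxpy as cp

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
omega = np.eye(2, dtype=complex).reshape(4)        # unnormalized |Omega>

def choi_unitary(U):
    """Choi matrix of the channel rho -> U rho U^dagger."""
    v = np.kron(I2, U) @ omega                     # (id x U)|Omega>
    return np.outer(v, v.conj())

basis = [choi_unitary(P) for P in (I2, X, Y, Z)]   # implementable operations
J_target = np.array([[1, 0, 0, 0], [0, 0, 1, 0],   # Choi of the transpose
                     [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)  # = SWAP

q = cp.Variable(4)                                 # quasiprobability weights
constraint = [sum(q[i] * basis[i] for i in range(4)) == J_target]
prob = cp.Problem(cp.Minimize(cp.norm1(q)), constraint)
prob.solve()
# log ||q||_1 is the negativity (sampling-cost exponent) for this basis
print(q.value, np.log(prob.value))
```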
With the advancement of affordable self-driving vehicles that rely on complicated nonlinear optimization but have limited computation resources, computation time becomes a matter of concern. Other factors, such as actuator dynamics and the cost of processing actuator commands, also unavoidably cause delays. In high-speed scenarios, these delays are critical to the safety of a vehicle. Recent works consider these delays individually, but none unifies them all in the context of autonomous driving. Moreover, recent works inappropriately treat computation time as a constant or a large upper bound, which makes the control either less responsive or over-conservative. To deal with all these delays, we present a unified framework that 1) models actuation dynamics, 2) uses robust tube model predictive control, and 3) uses a novel adaptive Kalman filter that assumes neither a known process model nor known noise covariance, making the controller safe while minimizing conservativeness. On one hand, our approach can serve as a standalone controller; on the other hand, it provides a safety guard for a high-level controller that assumes no delay. This can be used to compensate for the sim-to-real gap when deploying, on practical vehicle systems, a black-box learning-enabled controller trained in a simplistic, delay-free environment.
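The adaptive-filtering ingredient can be illustrated with a minimal innovation-based sketch (our construction; the paper's filter and its integration with tube MPC are not shown): the noise covariances are refined online from innovation statistics instead of being assumed known.

```python
import numpy as np

def adaptive_kf(zs, A, H, x0, P0, alpha=0.98):
    """Kalman filter that adapts Q and R online from the innovations."""
    n = x0.size
    x, P = x0.copy(), P0.copy()
    Q = np.eye(n) * 1e-3                 # initial guesses, refined online
    R = np.eye(H.shape[0]) * 1e-1
    estimates = []
    for z in zs:
        x, P = A @ x, A @ P @ A.T + Q                    # predict
        y = z - H @ x                                    # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x, P = x + K @ y, (np.eye(n) - K @ H) @ P        # update
        # innovation-based adaptation (a practical implementation would
        # additionally project Q and R back onto positive-definite matrices)
        R = alpha * R + (1 - alpha) * (np.outer(y, y) - H @ P @ H.T)
        Q = alpha * Q + (1 - alpha) * (K @ np.outer(y, y) @ K.T)
        estimates.append(x.copy())
    return estimates
```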
Drift control is important for the safety of autonomous vehicles when there is a sudden loss of traction due to external conditions such as rain or snow. It is a challenging control problem because of the significant sideslip and nearly full saturation of the tires. In this paper, we focus on the control of drift maneuvers following circular paths with either fixed or moving centers, subject to changes in the tire-ground interaction; these are common training tasks for drift enthusiasts and can therefore serve as benchmarks for drift-control performance. To achieve these tasks, we propose a novel hierarchical control architecture that decouples the curvature and center control of the trajectory. In particular, an outer loop stabilizes the center by tuning the target curvature, and an inner loop tracks the curvature using a feedforward/feedback controller enhanced by an $\mathcal{L}_1$ adaptive component. The hierarchical architecture is flexible because the inner loop is task-agnostic and adaptive to changes in the tire-road interaction, which allows the outer loop to be designed independently of the low-level dynamics and opens up the possibility of incorporating sophisticated planning algorithms. We implement our control strategy on a simulation platform as well as on a 1/10-scale radio-controlled (RC) car, and both simulation and experimental results illustrate the effectiveness of our strategy in achieving the drift maneuvering tasks described above.
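A minimal sketch of the hierarchical structure (the gains, the feedforward map, and the vehicle model are illustrative placeholders, and the $\mathcal{L}_1$ adaptive augmentation is omitted):

```python
def outer_loop(center_err, kappa_nom, k_c=0.3):
    """Stabilize the path center by tuning the target curvature."""
    return kappa_nom + k_c * center_err

def inner_loop(kappa_tgt, kappa_meas, ff_map, k_p=2.0):
    """Track curvature with feedforward from an equilibrium map plus
    feedback (the paper adds an L1 adaptive component on top of this)."""
    return ff_map(kappa_tgt) + k_p * (kappa_tgt - kappa_meas)

# usage with a toy linear feedforward map
steer = inner_loop(outer_loop(0.05, 0.5), 0.45, ff_map=lambda k: 0.8 * k)
```

Because the inner loop only sees a target curvature, the outer loop (or a planner replacing it) never needs to know the low-level dynamics.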
We develop an autonomous navigation algorithm for a robot operating in two-dimensional environments cluttered with obstacles of arbitrary convex shape. The proposed navigation approach relies on hybrid feedback to guarantee global asymptotic stabilization of the robot toward a predefined target location while ensuring the forward invariance of the obstacle-free workspace. The main idea consists of designing an appropriate switching strategy between a move-to-target mode and an obstacle-avoidance mode based on the robot's proximity to the nearest obstacle. The proposed hybrid controller generates continuous velocity input trajectories when the robot is initialized away from the boundaries of the unsafe regions. Finally, we provide an algorithmic procedure for the sensor-based implementation of the proposed hybrid controller and validate its effectiveness through simulation results.
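A minimal sketch of the switching logic (the hysteresis thresholds and the avoidance vector field below are illustrative, not the paper's construction):

```python
import numpy as np

MOVE, AVOID = 0, 1

def hybrid_control(x, target, nearest_obs, mode, d_in=0.5, d_out=0.8, k=1.0):
    """Hysteresis switching between the two modes avoids chattering."""
    d = float(np.linalg.norm(x - nearest_obs))
    if mode == MOVE and d < d_in:
        mode = AVOID
    elif mode == AVOID and d > d_out:
        mode = MOVE
    if mode == MOVE:
        u = -k * (x - target)                   # move-to-target mode
    else:
        n = (x - nearest_obs) / d               # unit vector away from obstacle
        t = np.array([-n[1], n[0]])             # tangent: slide around it
        u = k * t + 0.2 * k * n                 # obstacle-avoidance mode
    return u, mode
```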
Imitation learning aims to extract knowledge from the demonstrations of human experts or artificially created agents in order to replicate their behaviors. Its success has been demonstrated in areas such as video games, autonomous driving, robotic simulations, and object manipulation. However, the replication process can be problematic: performance is highly dependent on demonstration quality, and most trained agents perform well only in task-specific environments. In this survey, we provide a systematic review of imitation learning. We first introduce the background, covering the field's development history and preliminaries, and then present the different taxonomies within imitation learning and the key milestones of the field. We then detail the challenges in learning strategies and present research opportunities, including learning policies from suboptimal demonstrations and from voice instructions, along with other associated optimization schemes.
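The basic replication idea can be illustrated by behavioral cloning, the simplest form of imitation learning covered by such surveys (the data below are toy stand-ins, not from the survey):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 4))                  # expert-visited states
A = S @ np.array([0.5, -1.0, 0.2, 0.8])         # expert actions (unknown map)
W = np.linalg.lstsq(S, A, rcond=None)[0]        # fit a policy by regression
print(np.abs(S @ W - A).max())                  # near zero: behavior replicated
```

This also hints at the quality-dependence issue: the fitted policy can only be as good as the demonstrations it regresses on.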
Autonomous urban driving navigation with complex multi-agent dynamics is under-explored due to the difficulty of learning an optimal driving policy. The traditional modular pipeline relies heavily on hand-designed rules and a pre-processing perception system, while supervised learning-based models are limited by the availability of extensive human driving experience. We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach that enables the driving agent to achieve higher success rates based on vision inputs alone in a high-fidelity car simulator. To alleviate the low exploration efficiency in large continuous action spaces, which often prohibits the use of classical RL on challenging real tasks, CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon Deep Deterministic Policy Gradient (DDPG). Moreover, we propose specialized adaptive policies and steering-angle reward designs for the different control signals (i.e., follow, straight, turn right, turn left), based on shared representations, to improve the model's ability to tackle diverse cases. Extensive experiments on the CARLA driving benchmark demonstrate that CIRL substantially outperforms all previous methods in terms of the percentage of successfully completed episodes on a variety of goal-directed driving tasks. We also show its superior generalization capability in unseen environments. To our knowledge, this is the first case of a driving policy learned through reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.
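A minimal sketch of the constrained-exploration idea (our illustration; the actual CIRL actor-critic, DDPG updates, and CARLA interface are omitted): explore only in a band around the action of an actor warm-started on demonstrations, rather than over the full continuous action space.

```python
import numpy as np

def cirl_style_action(actor, state, rng, eps=0.1, bound=0.3):
    """Perturb the imitation-guided action within a bounded band."""
    a_imit = actor(state)                  # actor warm-started on demonstrations
    noise = rng.normal(scale=eps, size=a_imit.shape)
    return np.clip(a_imit + noise, a_imit - bound, a_imit + bound)
```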
This paper proposes a Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP) such that a linear-time property is satisfied. We convert the property into a Limit-Deterministic Büchi Automaton (LDBA) and then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product MDP according to the accepting conditions of the LDBA. With this reward function, our algorithm synthesizes a policy that satisfies the linear-time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for synthesis compared to existing approaches.
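A minimal sketch of one step in the product MDP with the induced reward (the transition function `T`, automaton transition `delta`, labeling function `label`, and accepting set are hypothetical stand-ins):

```python
def product_step(s, q, a, T, delta, label, accepting):
    """One step in the product MDP: move the MDP, then the automaton."""
    s_next = T(s, a)                         # MDP transition
    q_next = delta(q, label(s_next))         # LDBA reads the new state's label
    r = 1.0 if q_next in accepting else 0.0  # reward from accepting conditions
    return (s_next, q_next), r
```

Any standard RL algorithm run over the pairs `(s, q)` with this reward then maximizes the probability of satisfying the specification.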
Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations can cause proficient but narrowly learned policies to fail at test time. In this work, we propose to learn how to quickly and effectively adapt online to new situations as well as to perturbations. To enable sample-efficient meta-learning, we consider learning online adaptation in the context of model-based reinforcement learning. Our approach trains a global model such that, when combined with recent data, the model can be rapidly adapted to the local context. Our experiments demonstrate that our approach enables simulated agents to adapt their behavior online to novel terrains, to a crippled leg, and to highly dynamic environments.
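A minimal sketch of the online-adaptation step (our illustration; the meta-training that makes such updates effective is not shown): take a few gradient steps on the most recent transitions before planning with the adapted model, where `grad` is a hypothetical stand-in for the gradient of the model's prediction loss.

```python
import numpy as np

def adapt(theta, recent, grad, lr=0.01, steps=5):
    """theta: global model parameters; recent: list of (s, a, s_next)."""
    th = theta.copy()
    for _ in range(steps):
        g = sum(grad(th, s, a, s_next) for s, a, s_next in recent)
        th = th - lr * g / len(recent)     # adapt to the local context
    return th                              # plan with the adapted model
```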