We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team toward a global goal while avoiding obstacles in uncertain environments. First, we use stochastic model predictive control (SMPC) to calculate control inputs that satisfy the robot dynamics and account for uncertainty during obstacle avoidance via chance constraints. Second, recurrent neural networks, trained on the uncertainty outputs of various simultaneous localization and mapping algorithms, provide a quick estimate of the future state uncertainty considered in the SMPC finite-time-horizon solution. When two or more robots are in communication range, these uncertainties are updated using a distributed Kalman filtering approach. Lastly, a Deep Q-learning agent serves as a high-level path planner, providing the SMPC with target positions that move the robots toward the desired global goal. The complete method is demonstrated simultaneously on a ground and an aerial robot (code available at: //github.com/AlexS28/SABER).
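The chance-constrained obstacle avoidance in the SMPC step can be illustrated with the standard Gaussian reformulation: a probabilistic collision constraint is tightened into a deterministic one using the normal quantile. Below is a minimal sketch assuming a scalar Gaussian position error; the function name and numbers are illustrative, not taken from the SABER code.

```python
# Sketch: deterministic tightening of a Gaussian chance constraint, as
# commonly used in SMPC obstacle avoidance (names are illustrative,
# not from the SABER codebase).
import numpy as np
from scipy.stats import norm

def tightened_margin(obstacle_radius, position_std, delta=0.05):
    """Margin so that P(collision) <= delta for a 1D Gaussian position error.

    The chance constraint  P(dist < obstacle_radius) <= delta  becomes the
    deterministic constraint  dist >= obstacle_radius + z_{1-delta} * sigma.
    """
    z = norm.ppf(1.0 - delta)          # standard-normal quantile
    return obstacle_radius + z * position_std

# Example: an RNN-predicted position std of 0.2 m inflates a 0.5 m obstacle
# to an effective keep-out radius used inside the MPC constraints.
print(tightened_margin(0.5, 0.2))      # ~0.83 m at delta = 0.05
```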
With the growing demands on autonomous mobile robots in recent years, a path planning/navigation algorithm should not merely reach the target without collisions, but should also seek an optimal or near-optimal path from the initial position to the target subject to the robot's practical constraints. This report investigates path planning and control strategies for mobile robots using machine learning techniques, covering both ground mobile robots and flying UAVs. We first study the hybrid reactive collision-free navigation problem in an unknown static environment. By combining reactive navigation with Q-learning, we aim to retain the strengths of both approaches while overcoming the shortcomings of relying on either one alone. The proposed method is then extended to 3D environments. The performance of these strategies is verified through extensive computer simulations, with good results. We then consider the more challenging case of dynamic environments, which we tackle with a new path planning method that combines an integrated environment representation with reinforcement learning. This approach efficiently finds an optimal path to the target while avoiding collisions in cluttered environments containing both static and moving obstacles. The performance of these methods is compared across several different aspects.
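The hybrid of reactive navigation and Q-learning can be sketched as tabular Q-learning in which a reactive check vetoes unsafe actions before they are taken. The grid world, reward shaping, and hyperparameters below are assumptions made for illustration, not the report's exact setup.

```python
# Minimal sketch of the hybrid idea: tabular Q-learning chooses the action,
# but a reactive check vetoes any action that would hit a known obstacle.
import numpy as np

rng = np.random.default_rng(0)
N = 8                                    # N x N grid
obstacles = {(3, 3), (3, 4), (4, 3)}
goal = (7, 7)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
Q = np.zeros((N, N, len(actions)))

def safe(s, a):                          # reactive collision/boundary veto
    nxt = (s[0] + a[0], s[1] + a[1])
    return 0 <= nxt[0] < N and 0 <= nxt[1] < N and nxt not in obstacles

for episode in range(2000):
    s = (0, 0)
    for step in range(100):
        valid = [i for i, a in enumerate(actions) if safe(s, a)]
        if rng.random() < 0.1:           # epsilon-greedy over safe actions
            i = rng.choice(valid)
        else:
            i = max(valid, key=lambda k: Q[s[0], s[1], k])
        nxt = (s[0] + actions[i][0], s[1] + actions[i][1])
        r = 1.0 if nxt == goal else -0.01
        Q[s[0], s[1], i] += 0.1 * (r + 0.95 * Q[nxt[0], nxt[1]].max()
                                   - Q[s[0], s[1], i])
        s = nxt
        if s == goal:
            break
```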
Jointly achieving safety and efficiency in human-robot interaction (HRI) settings is a challenging problem, as the robot's planning objectives may be at odds with the human's own intent and expectations. Recent approaches ensure safe robot operation in uncertain environments through a supervisory control scheme, sometimes called "shielding", which overrides the robot's nominal plan with a safety fallback strategy when a safety-critical event is imminent. These reactive "last-resort" strategies (typically in the form of aggressive emergency maneuvers) focus on preserving safety without efficiency considerations; when the nominal planner is unaware of possible safety overrides, shielding can be activated more frequently than necessary, leading to degraded performance. In this work, we propose a new shielding-based planning approach that allows the robot to plan efficiently by explicitly accounting for possible future shielding events. Leveraging recent work on Bayesian human motion prediction, the resulting robot policy proactively balances nominal performance with the risk of high-cost emergency maneuvers triggered by low-probability human behaviors. We formalize Shielding-Aware Robust Planning (SHARP) as a stochastic optimal control problem and propose a computationally efficient framework for finding tractable approximate solutions at runtime. Our method outperforms the shielding-agnostic motion planning baseline (equipped with the same human intent inference scheme) on simulated driving examples with human trajectories taken from the recently released Waymo Open Motion Dataset.
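The core idea of shielding-aware planning can be illustrated by ranking candidate plans on their expected cost including probability-weighted shielding overrides, rather than on nominal cost alone. The candidate plans, probabilities, and costs below are illustrative placeholders, not SHARP's actual stochastic optimal control formulation.

```python
# Sketch: when ranking candidate plans, add the expected cost of a future
# shielding override, weighted by the predicted probability of the human
# behavior that would trigger it.
def shielding_aware_cost(nominal_cost, p_shield, shield_cost):
    """Expected cost of a plan that may be overridden by a safety fallback."""
    return (1.0 - p_shield) * nominal_cost + p_shield * (nominal_cost + shield_cost)

candidates = [
    {"name": "assertive", "nominal_cost": 2.0, "p_shield": 0.30},
    {"name": "yielding",  "nominal_cost": 3.5, "p_shield": 0.01},
]
SHIELD_COST = 20.0   # cost of an aggressive emergency maneuver (assumed)

best = min(candidates,
           key=lambda c: shielding_aware_cost(c["nominal_cost"],
                                              c["p_shield"], SHIELD_COST))
print(best["name"])  # "yielding": low-probability but costly overrides matter
```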
Competent multi-lane cruising requires using lane changes and within-lane maneuvers to achieve good speed and maintain safety. This paper proposes a design for autonomous multi-lane cruising by combining a hierarchical reinforcement learning framework with a novel state-action space abstraction. While the proposed solution follows the classical hierarchy of behavior decision, motion planning and control, it introduces a key intermediate abstraction within the motion planner to discretize the state-action space according to high-level behavioral decisions. We argue that this design allows principled modular extension of motion planning, in contrast to using either monolithic behavior cloning or a large set of hand-written rules. Moreover, we demonstrate that our state-action space abstraction allows the trained models to be transferred, without retraining, from a simulated environment with virtually no dynamics to one with significantly more realistic dynamics. Together, these results suggest that our proposed hierarchical architecture is a promising way to allow reinforcement learning to be applied to complex multi-lane cruising in the real world.
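The state-action abstraction can be sketched as a mapping from a high-level behavior decision to the discretized action set the motion planner is allowed to search. The behavior names and action lattices below are assumptions for illustration.

```python
# Sketch: the high-level behavior decision selects which discretized
# low-level action set the motion planner searches over.
BEHAVIOR_ACTIONS = {
    "keep_lane":    [("lateral_offset", o) for o in (-0.5, 0.0, 0.5)],
    "change_left":  [("target_lane", -1), ("abort", 0)],
    "change_right": [("target_lane", +1), ("abort", 0)],
}

def low_level_action_space(behavior):
    """The motion planner only sees actions consistent with the behavior."""
    return BEHAVIOR_ACTIONS[behavior]

print(low_level_action_space("change_left"))
```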
Motion planning under uncertainty is one of the main challenges in developing autonomous driving vehicles. In this work, we focus on the uncertainty in sensing and perception resulting from a limited field of view, occlusions, and limited sensing range. This problem is often tackled by considering hypothetical hidden objects in occluded areas or beyond the sensing range in order to guarantee passive safety. However, this may result in conservative planning and expensive computation, particularly when numerous hypothetical objects need to be considered. We propose a reinforcement learning (RL) based solution that manages uncertainty by optimizing for the worst-case outcome. This approach is in contrast to traditional RL, where the agent tries to maximize the average expected reward. The proposed approach builds on distributional RL, with a policy optimization that maximizes the lower bound of the stochastic outcomes. This modification can be applied to a range of RL algorithms. As a proof of concept, the approach is applied to two different RL algorithms, Soft Actor-Critic and DQN. The approach is evaluated on two challenging scenarios: pedestrians crossing with occlusion, and curved roads with a limited field of view. The algorithm is trained and evaluated using the SUMO traffic simulator. The proposed approach yields much better motion planning behavior than conventional RL algorithms and behaves comparably to a human driving style.
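The lower-bound policy optimization can be illustrated with a quantile-based action selection: given per-action quantile estimates of the return (as in quantile-based distributional RL), the agent acts greedily on the mean of the lowest quantiles rather than the full mean. The quantile values below are made-up placeholders.

```python
# Sketch: act on the lower bound of the return distribution instead of
# its mean, given quantile estimates from a distributional RL critic.
import numpy as np

def lower_bound_greedy(quantiles, k):
    """quantiles: (num_actions, num_quantiles). Returns the action
    maximizing the mean of its k lowest quantiles."""
    worst_case = np.sort(quantiles, axis=1)[:, :k].mean(axis=1)
    return int(np.argmax(worst_case))

q = np.array([[-5.0, 1.0, 8.0, 9.0],    # high mean, bad worst case
              [ 0.5, 1.0, 1.5, 2.0]])   # modest mean, safe worst case
print(lower_bound_greedy(q, k=2))        # 1: the safer action wins
```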
We present a modular algorithm for the computational co-design of legged robots and dynamic maneuvers. Current state-of-the-art approaches are based on random sampling or concurrent optimization. We propose a bilevel optimization approach that exploits the derivatives of the motion planning sub-problem (the inner level). Our approach allows for the use of any differentiable motion planner in the inner level, similarly to sampling methods, but also allows for an upper level that captures arbitrary design constraints and costs. Our approach can optimize the robot's morphology and actuator parameters while considering its full dynamics, joint limits and physical constraints such as friction cones. We demonstrate these capabilities by studying jumping and trotting gaits and verify our results in a physics simulator, showing that our approach successfully minimizes the energy used.
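The bilevel structure can be sketched with a toy: an outer level tunes a single design parameter (here a "leg length") while the inner level solves a motion-planning subproblem to optimality for each candidate design. The toy jump model and costs below are stand-ins, not the paper's dynamics.

```python
# Toy sketch of the bilevel structure: outer level = design optimization,
# inner level = motion-planning subproblem solved for each design.
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def inner_planner(leg_length):
    """Inner level: find the control effort u minimizing energy subject to
    reaching a required jump height (toy constraint via penalty)."""
    target_height = 0.5
    def cost(u):
        height = leg_length * u[0] - 0.1 * u[0] ** 2   # toy jump model
        return u[0] ** 2 + 100.0 * (height - target_height) ** 2
    res = minimize(cost, x0=[1.0])
    return res.fun                                      # optimal inner cost

# Outer level: minimize the inner-level optimum plus a design cost.
outer = minimize_scalar(lambda L: inner_planner(L) + 0.5 * L ** 2,
                        bounds=(0.1, 2.0), method="bounded")
print(outer.x)   # co-designed leg length
```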
Quadrotors are among the most agile flying robots. However, planning time-optimal trajectories at the actuation limit through multiple waypoints remains an open problem. This is crucial for applications such as inspection, delivery, search and rescue, and drone racing. Early works used polynomial trajectory formulations, which do not exploit the full actuator potential because of their inherent smoothness. Recent works resorted to numerical optimization but require waypoints to be allocated as costs or constraints at specific discrete times. However, this time allocation is a priori unknown and renders previous works incapable of producing truly time-optimal trajectories. To generate truly time-optimal trajectories, we propose a solution to the time allocation problem while exploiting the quadrotor's full actuator potential. We achieve this by introducing a formulation of progress along the trajectory, which enables the simultaneous optimization of the time allocation and the trajectory itself. We compare our method against related approaches and validate it in real-world flights in one of the world's largest motion-capture systems, where we outperform human expert drone pilots in a drone-racing task.
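The progress formulation can be illustrated by the bookkeeping it enforces: each waypoint carries a progress variable that may only drop from 1 to 0 at a timestep where the trajectory passes within a tolerance of that waypoint, in order. In the actual method this is imposed as optimization constraints; the sketch below merely evaluates the rule along a fixed trajectory, with tolerance and data as assumptions.

```python
# Sketch of the "progress" bookkeeping behind the time-allocation idea.
import numpy as np

def progress_profile(positions, waypoints, tol=0.1):
    """positions: (T, 3) trajectory; waypoints: (K, 3). Returns (T+1, K)
    progress values, starting at 1 and dropping to 0 on ordered passage."""
    T, K = len(positions), len(waypoints)
    mu = np.ones((T + 1, K))
    for j in range(T):
        mu[j + 1] = mu[j]
        for k in range(K):
            near = np.linalg.norm(positions[j] - waypoints[k]) <= tol
            if mu[j, k] == 1.0 and near and (k == 0 or mu[j, k - 1] == 0.0):
                mu[j + 1, k] = 0.0          # waypoint k completed in order
    return mu

# All progress variables must reach 0 for the trajectory to be feasible.
wps = np.array([[1.0, 0, 0], [2.0, 0, 0]])
traj = np.linspace([0, 0, 0], [2.5, 0, 0], 26)
print(progress_profile(traj, wps)[-1])       # [0. 0.] -> both passed
```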
In this work, we develop the Batch Belief Trees (BBT) algorithm for motion planning under motion and sensing uncertainties. The algorithm interleaves batch sampling, which builds a graph of nominal trajectories in the state space, with searching over the graph to find belief-space motion plans. By searching over the graph, BBT finds sophisticated plans that visit (and revisit) information-rich regions to reduce uncertainty. One of the key benefits of this algorithm is the modified interplay between exploration and exploitation. Instead of performing an exhaustive search (exploitation) after each exploration step, the proposed algorithm uses batch samples to explore the state space and does not require an exhaustive search before the next iteration of batch sampling, which adds flexibility. The algorithm finds motion plans that converge to the optimal one as more samples are added to the graph. We test BBT in different planning environments. Our numerical investigation confirms that BBT finds non-trivial motion plans and is faster than previous, similar methods.
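The belief-space ingredient of such planners is covariance propagation along nominal graph edges: a Kalman prediction step grows the uncertainty, and a measurement update in an information-rich region shrinks it, which is what makes detours through such regions pay off. The matrices below are toy placeholders, not BBT's models.

```python
# Sketch: Gaussian covariance propagation along one graph edge, with a
# measurement update only where a sensor is informative.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])      # toy linear dynamics
Q = 0.01 * np.eye(2)                        # process noise
H = np.array([[1.0, 0.0]])                  # position measurement
R = np.array([[0.05]])                      # measurement noise

def propagate_edge(P, steps, measured):
    """Kalman covariance along one nominal edge."""
    for _ in range(steps):
        P = A @ P @ A.T + Q                 # prediction step
        if measured:                        # update only in info-rich regions
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            P = (np.eye(2) - K @ H) @ P
    return P

P0 = 0.5 * np.eye(2)
print(np.trace(propagate_edge(P0, 10, measured=False)))  # uncertainty grows
print(np.trace(propagate_edge(P0, 10, measured=True)))   # detour pays off
```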
Neural Radiance Fields (NeRFs) have recently emerged as a powerful paradigm for the representation of natural, complex 3D scenes. NeRFs represent continuous volumetric density and RGB values in a neural network, and generate photo-realistic images from unseen camera viewpoints through ray tracing. We propose an algorithm for navigating a robot through a 3D environment represented as a NeRF using only an on-board RGB camera for localization. We assume the NeRF for the scene has been pre-trained offline, and the robot's objective is to navigate through unoccupied space in the NeRF to reach a goal pose. We introduce a trajectory optimization algorithm that avoids collisions with high-density regions in the NeRF based on a discrete-time version of differential flatness that is amenable to constraining the robot's full pose and control inputs. We also introduce an optimization-based filtering method to estimate 6DoF pose and velocities for the robot in the NeRF given only an onboard RGB camera. We combine the trajectory planner with the pose filter in an online replanning loop to give a vision-based robot navigation pipeline. We present simulation results with a quadrotor robot navigating through a jungle gym environment, the inside of a church, and Stonehenge using only an RGB camera. We also demonstrate an omnidirectional ground robot navigating through the church, requiring it to reorient to fit through a narrow gap. Videos of this work can be found at //mikh3x4.github.io/nerf-navigation/ .
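The collision-avoidance term in such a trajectory optimizer can be sketched as penalizing the NeRF density sampled along the trajectory and descending its gradient with respect to the waypoints. The Gaussian-blob density below is a stand-in for a trained NeRF (the real system queries the network and can use its autodiff gradients), and the smoothness and dynamics terms of the full optimizer are omitted.

```python
# Sketch: penalize integrated density along the trajectory and descend
# its gradient with respect to the waypoints (endpoints fixed).
import numpy as np

def density(p):                              # stand-in for a NeRF sigma(x)
    return np.exp(-np.sum((p - np.array([0.5, 0.5, 0.5])) ** 2) / 0.02)

def collision_cost(waypoints):
    return sum(density(p) for p in waypoints)

def grad(waypoints, eps=1e-4):               # finite differences for the toy
    g = np.zeros_like(waypoints)
    for i, p in enumerate(waypoints):
        for d in range(3):
            dp = np.zeros(3); dp[d] = eps
            g[i, d] = (density(p + dp) - density(p)) / eps
    return g

traj = np.linspace([0.0, 0.0, 0.5], [1.0, 1.0, 0.5], 20)  # initial guess
for _ in range(200):                          # push waypoints out of density
    traj[1:-1] -= 0.05 * grad(traj)[1:-1]     # endpoints stay fixed
print(collision_cost(traj))
```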
As robots move from the laboratory into the real world, motion planning will need to account for model uncertainty and risk. For robot motions involving intermittent contact, planning for uncertainty in contact is especially important, as failure to successfully make and maintain contact can be catastrophic. Here, we model uncertainty in terrain geometry and friction characteristics, and combine a risk-sensitive objective with chance constraints to provide a trade-off between robustness to uncertainty and constraint satisfaction with an arbitrarily high feasibility guarantee. We evaluate our approach in two simple examples: a push-block system for benchmarking and a single-legged hopper. We demonstrate that chance constraints alone produce trajectories similar to those produced using strict complementarity constraints; however, when equipped with a robust objective, we show the chance constraints can mediate a trade-off between robustness to uncertainty and strict constraint satisfaction. Thus, our study may represent an important step towards reasoning about contact uncertainty in motion planning.
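The risk-sensitive side of such an objective can be illustrated with the entropic risk measure, (1/θ) log E[exp(θ · cost)], which interpolates between the mean cost as θ → 0 and the worst case as θ grows. The sampled terrain heights and penetration cost below are toy assumptions, not the paper's contact model.

```python
# Sketch: entropic risk of a contact cost under sampled terrain-height
# uncertainty; larger theta weights bad outcomes more heavily.
import numpy as np

rng = np.random.default_rng(1)

def entropic_risk(costs, theta):
    return np.log(np.mean(np.exp(theta * costs))) / theta

terrain = rng.normal(0.0, 0.05, size=10000)        # uncertain ground height
foot_height = 0.02
costs = np.maximum(0.0, terrain - foot_height) ** 2 * 100.0  # penetration cost

print(entropic_risk(costs, theta=1e-3))   # ~ mean cost (risk-neutral)
print(entropic_risk(costs, theta=1.0))    # inflated by bad outcomes
```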
Cooperatively planning for multiple agents has been proposed as a promising method for strategic and motion planning for automated vehicles. By taking into account the intent of every agent, the ego agent can incorporate future interactions with human-driven vehicles into its planning. The problem is often formulated as a multi-agent game and solved using iterative algorithms operating on a discretized action or state space. Even when these algorithms converge to a Nash equilibrium, the result is often only sub-optimal. In this paper, we define a linear differential game for a set of interacting agents and solve it to optimality using mixed-integer programming. A disjunctive formulation of the orientation allows us to formulate linear constraints that prevent agent-to-agent collisions while preserving the non-holonomic motion properties of the vehicle model. Soft constraints account for prediction errors. We then define a joint cost function, where a cooperation factor can adapt between altruistic, cooperative, and egoistic behavior. We study the influence of the cooperation factor in scenarios where interaction between the agents is necessary for success. The approach is then evaluated in a racing scenario, where we show the applicability of the formulation in a closed-loop receding-horizon replanning fashion. By accounting for discrepancies between the cooperative assumption and the actual behavior, we can successfully plan an optimal control strategy that interacts closely with other agents.
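The disjunctive collision constraint can be sketched with the standard big-M encoding: a binary variable selects which of two separation inequalities must hold inside a mixed-integer program. The one-timestep, 1D setup and the use of PuLP with its bundled CBC solver are illustrative simplifications of the paper's formulation.

```python
# Sketch: big-M encoding of the disjunction "agent 1 is at least d ahead
# OR agent 2 is at least d ahead", selected by a binary variable.
import pulp

d, M = 1.0, 100.0
prob = pulp.LpProblem("disjunctive_separation", pulp.LpMinimize)
x1 = pulp.LpVariable("x1", -10, 10)
x2 = pulp.LpVariable("x2", -10, 10)
b = pulp.LpVariable("b", cat="Binary")          # selects the active disjunct
e1 = pulp.LpVariable("e1", 0)                   # |x1 - target1| epigraph var
e2 = pulp.LpVariable("e2", 0)                   # |x2 - target2| epigraph var

t1, t2 = 0.3, 0.0                               # conflicting targets
prob += e1 + e2                                 # joint cooperative cost
prob += x1 - t1 <= e1; prob += t1 - x1 <= e1
prob += x2 - t2 <= e2; prob += t2 - x2 <= e2
prob += x1 - x2 >= d - M * (1 - b)              # disjunct 1 (active if b=1)
prob += x2 - x1 >= d - M * b                    # disjunct 2 (active if b=0)

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print(x1.value(), x2.value())                   # separated by at least d
```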