The tasks that an autonomous agent is expected to perform are often optional or are incompatible with each other owing to the agent's limited actuation capabilities, specifically the dynamics and control input bounds. We encode tasks as time-dependent state constraints and leverage the advances in multi-objective optimization to formulate the problem of choosing tasks as selection of a feasible subset of constraints that can be satisfied for all time and maximizes a performance metric. We show that this problem, although amenable to reachability or mixed integer model predictive control-based analysis in the offline phase, is NP-Hard in general and therefore requires heuristics to be solved efficiently. When incompatibility in constraints is observed under a given policy that imposes task constraints at each time step in an optimization problem, we assign a Lagrange score to each of these constraints based on the variation in the corresponding Lagrange multipliers over the compatible time horizon. These scores are then used to decide the order in which constraints are dropped in a greedy strategy. We further employ a genetic algorithm to improve upon the greedy strategy. We evaluate our method on a robot waypoint following task when the low-level controllers that impose state constraints are described by Control Barrier Function-based Quadratic Programs and provide a comparison with waypoint selection based on knowledge of backward reachable sets.
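The greedy step described in the abstract above can be sketched as follows. This is only a minimal illustration under stated assumptions: the scores are taken as already computed, `is_feasible` is a hypothetical oracle standing in for the compatibility check, and dropping the highest score first is an assumed convention, not something the abstract specifies.

```python
from typing import Callable, Dict, FrozenSet, List


def greedy_drop(
    scores: Dict[str, float],
    is_feasible: Callable[[FrozenSet[str]], bool],
) -> List[str]:
    """Drop constraints one at a time until the remaining set is feasible.

    Returns the names of the constraints that were kept, sorted.
    """
    kept = set(scores)
    # Assumption: a larger score marks a better candidate for dropping.
    for name in sorted(scores, key=scores.get, reverse=True):
        if is_feasible(frozenset(kept)):
            break
        kept.discard(name)
    return sorted(kept)


# Toy usage with a hypothetical oracle: "a" and "b" cannot hold together.
def toy_oracle(active: FrozenSet[str]) -> bool:
    return not {"a", "b"} <= active


toy_scores = {"a": 0.2, "b": 1.5, "c": 0.1}
result = greedy_drop(toy_scores, toy_oracle)
```

In the toy run, "b" carries the highest score and is dropped first, after which the remaining set passes the oracle.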
Deploying autonomous robots in crowded indoor environments usually requires accurate perception of dynamic obstacles. Although many previous works in the autonomous driving field have investigated the 3D object detection problem, their reliance on dense point clouds from a heavy LiDAR and the high computation cost of learning-based data processing make those methods inapplicable to small robots, such as vision-based UAVs with small onboard computers. To address this issue, we propose a lightweight 3D dynamic obstacle detection and tracking (DODT) method based on an RGB-D camera, which is designed for low-power robots with limited computing power. Our method adopts a novel ensemble detection strategy, combining multiple computationally efficient but low-accuracy detectors to achieve real-time high-accuracy obstacle detection. In addition, we introduce a new feature-based data association method to prevent mismatches and use a Kalman filter with the constant acceleration model to track detected obstacles. Our system also includes an optional auxiliary learning-based module to enhance the obstacle detection range and dynamic obstacle identification. Users can decide whether to run this module based on the available computation resources. The proposed method is implemented on a small quadcopter, and the experiments show that the algorithm enables the robot to detect dynamic obstacles and navigate dynamic environments safely.
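For reference, the constant acceleration model named in the abstract above propagates position, velocity, and acceleration over a time step `dt`. The sketch below shows only the mean prediction along a single axis. The function name and the one-axis state layout are assumptions for illustration, and the covariance propagation and measurement update of a full Kalman filter are omitted.

```python
from typing import Tuple

# (position, velocity, acceleration) along one axis; layout is an assumption.
State = Tuple[float, float, float]


def predict_constant_acceleration(state: State, dt: float) -> State:
    """Propagate the state mean by dt, holding acceleration constant."""
    p, v, a = state
    p_next = p + v * dt + 0.5 * a * dt * dt
    v_next = v + a * dt
    return (p_next, v_next, a)


# Toy usage: start at rest position 0 with velocity 1 and acceleration 2.
predicted = predict_constant_acceleration((0.0, 1.0, 2.0), dt=1.0)
```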
In this work, we propose a novel shared autonomy framework to operate articulated robots. We provide strategies to design both the task-oriented hierarchical planning and policy shaping algorithms for efficient human-robot interactions in context-aware operation of articulated robots. Our framework for interplay between the human and the autonomy, as the participating agents in the system, is particularly influenced by ideas from multi-agent systems, game theory, and theory of mind for a sliding level of autonomy. We formulate the sequential hierarchical human-in-the-loop decision making process by extending MDPs and the Options framework to shared autonomy, and make use of deep RL techniques to train an uncertainty-aware shared autonomy policy. To fine-tune the formulation to a human, we use the history of the system states, human actions, and their error with respect to a surrogate optimal model to encode the human's internal state embeddings, beyond the designed values, using conditional VAEs. We showcase the effectiveness of our formulation for different human skill levels and degrees of cooperativeness through a case study of a feller-buncher machine in the challenging tasks of timber harvesting. Our framework is successful in providing a sliding level of autonomy from fully autonomous to fully manual, and is particularly successful in handling a noisy non-cooperative human agent in the loop. The proposed framework advances the state-of-the-art in shared autonomy for operating articulated robots, but can also be applied to other domains where autonomous operation is the ultimate goal.
In recent years, by leveraging more data, computation, and diverse tasks, learned optimizers have achieved remarkable success in supervised learning, outperforming classical hand-designed optimizers. Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks. We investigate this phenomenon and identify three issues. First, the gradients of an RL agent vary across a wide range in logarithms while their absolute values are in a small range, making it hard for neural networks to obtain accurate parameter updates. Second, the agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training. Finally, due to highly stochastic agent-environment interactions, the agent-gradients have high bias and variance, which increases the difficulty of learning an optimizer for RL. We propose gradient processing, pipeline training, and a novel optimizer structure with good inductive bias to address these issues. By applying these techniques, we show for the first time that learning an optimizer for RL from scratch is possible. Although trained only on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
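To make the first issue concrete, one common way to present gradients that span many orders of magnitude to a network is to split each value into a sign and a log-magnitude. The sketch below is an assumed illustration of that general idea only. The abstract does not specify the authors' gradient processing, and the function name and `eps` value are hypothetical.

```python
import math
from typing import Tuple


def log_sign_features(g: float, eps: float = 1e-8) -> Tuple[float, float]:
    """Map a scalar gradient to (sign, log-magnitude) features.

    eps (an assumed value) keeps the logarithm finite when g is zero.
    """
    sign = float((g > 0) - (g < 0))
    return (sign, math.log(abs(g) + eps))


# Toy usage: gradients four orders of magnitude apart become features
# that differ by an additive offset instead of a factor of 10,000.
small = log_sign_features(1e-6)
large = log_sign_features(-1e-2)
```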
Autonomous driving vehicles aim to free the hands of vehicle operators, making driving easier and faster while improving safety on the highway and in complex scenarios. Automated driving systems (ADS) have been developed over the last several decades to realize fully autonomous driving vehicles (L4 or L5 level). In motion planning, the scale of the sampling space is the main source of computational complexity. Therefore, by adjusting the sampling method, the difficulty of solving the real-time motion planning problem can be incrementally reduced. Usually, the Average Sampling Method is used in the Lattice Planner, and the Random Sampling Method is chosen for RRT algorithms. However, neither takes prior information into consideration, and neither focuses the sampling space on areas where the optimal trajectory was previously obtained. Therefore, \emph{this thesis proposes an adaptive sampling method that reduces the computational complexity and achieves faster solutions while keeping the quality of the optimal solution unchanged}. The main contribution of this thesis is a significant decrease in the complexity of the optimization problem for motion planning, without sacrificing the quality of the final trajectory output, through the implementation of an Adaptive Sampling method based on Artificial Potential Field (ASAPF). In addition, the quality and stability of the trajectory are improved because sampling is concentrated on the appropriate region to be analyzed.
We propose a novel probabilistically robust controller for the guidance of an unmanned aerial vehicle (UAV) in coverage planning missions, which can simultaneously optimize both the UAV's motion and camera control inputs for the 3D coverage of a given object of interest. Specifically, the coverage planning problem is formulated in this work as an optimal control problem with logical constraints to enable the UAV agent to jointly: a) select a series of discrete camera field-of-view states which satisfy a set of coverage constraints, and b) optimize its motion control inputs according to a specified mission objective. We show how this hybrid optimal control problem can be solved with standard optimization tools by converting the logical expressions in the constraints into equality/inequality constraints involving only continuous variables. Finally, probabilistic robustness is achieved by integrating the unscented transformation into the proposed controller, thus enabling the design of robust open-loop coverage plans which take into account the future posterior distribution of the UAV's state inside the planning horizon.
Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame PnP-RANSAC localization pipeline. We refine initial pose estimates with a motion model and propose a method for calculating the predicted quality of future pose estimates, triggered based on whether or not the autonomous vehicle's motion is constrained by the relative frame-to-frame location of dynamic vehicles in the environment. Our approach detects and identifies suitable dynamic vehicles to define these pose constraints to modify a pose filter, resulting in improved recall across a range of localization tolerances from $0.25\,\mathrm{m}$ to $5\,\mathrm{m}$, compared to a state-of-the-art baseline single image PnP method and its vanilla pose filtering. Our constraint detection system is active for approximately $35\%$ of the time on the Ford AV dataset and localization is particularly improved when the constraint detection is active.
Over the last decade, the use of autonomous drone systems for surveying, search and rescue, or last-mile delivery has increased exponentially. With the rise of these applications comes the need for highly robust, safety-critical algorithms which can operate drones in complex and uncertain environments. Additionally, flying fast enables drones to cover more ground, which in turn increases productivity and further strengthens their use case. One proxy for developing algorithms used in high-speed navigation is the task of autonomous drone racing, where researchers program drones to fly through a sequence of gates and avoid obstacles as quickly as possible using onboard sensors and limited computational power. Speeds and accelerations exceed 80 kph and 4 g, respectively, raising significant challenges across perception, planning, control, and state estimation. To achieve maximum performance, systems require real-time algorithms that are robust to motion blur, high dynamic range, model uncertainties, aerodynamic disturbances, and often unpredictable opponents. This survey covers the progression of autonomous drone racing across model-based and learning-based approaches. We provide an overview of the field and its evolution over the years, and conclude with the biggest challenges and open questions to be faced in the future.
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
Recently, deep multiagent reinforcement learning (MARL) has become a highly active research area, as many real-world problems can be inherently viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate their behaviors conditioned on their private observations and commonly shared global reward signals. One natural solution is to resort to the centralized training and decentralized execution paradigm. During centralized training, one key challenge is multiagent credit assignment: how to allocate the global rewards to individual agent policies for better coordination towards maximizing system-level benefits. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into individual agents' Q-values. Unlike previous works, which restrict the representational relation between the individual Q-values and the global one, we bring the integrated gradient attribution technique into deep MARL to directly decompose global Q-values along trajectory paths and assign credit to agents. We evaluate QPD on the challenging StarCraft II micromanagement tasks and show that QPD achieves state-of-the-art performance in both homogeneous and heterogeneous multiagent scenarios compared with existing cooperative MARL algorithms.
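The integrated gradient attribution idea mentioned above can be illustrated on a toy scalar function. The sketch below is not QPD itself: it integrates along a straight line from a baseline to the input rather than along a trajectory path, uses central finite differences in place of automatic differentiation, and the toy function `toy_q` and all names are assumptions for illustration. Its purpose is to show the property that makes the technique useful for decomposition, namely that the per-input attributions sum to the difference between the function value at the input and at the baseline.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 200,
    h: float = 1e-5,
) -> List[float]:
    """Attribute f(x) - f(baseline) to each input along a straight-line path."""
    n = len(x)
    attributions = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = list(point)
            down = list(point)
            up[i] += h
            down[i] -= h
            grad_i = (f(up) - f(down)) / (2.0 * h)  # central difference
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions


# Toy "global value" with an interaction between the first two inputs.
def toy_q(q: Sequence[float]) -> float:
    return q[0] * q[1] + 2.0 * q[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(toy_q, x, baseline)
total = sum(attr)  # close to toy_q(x) - toy_q(baseline)
```

In this toy case the interaction term is split evenly between the first two inputs, and the third input receives the full contribution of its linear term.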
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
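Relative overgeneralization can be seen in a small cooperative matrix game. The sketch below uses a toy 3x3 shared payoff matrix chosen for illustration. It is not taken from the paper, which studies continuous games, and the function names are hypothetical. An agent that evaluates each of its actions by averaging over a uniformly random partner prefers a safe action, even though the best joint action lies elsewhere.

```python
from typing import List, Tuple

# Shared payoff indexed by (row action, column action); toy values only.
PAYOFF: List[List[float]] = [
    [11.0, -30.0, 0.0],
    [-30.0, 7.0, 6.0],
    [0.0, 0.0, 5.0],
]


def best_joint_action(payoff: List[List[float]]) -> Tuple[int, int]:
    """Return the (row, column) pair with the highest shared payoff."""
    cells = [(r, c) for r in range(len(payoff)) for c in range(len(payoff[r]))]
    return max(cells, key=lambda rc: payoff[rc[0]][rc[1]])


def best_action_vs_uniform_partner(payoff: List[List[float]]) -> int:
    """Return the row action with the highest payoff averaged over columns."""
    averages = [sum(row) / len(row) for row in payoff]
    return max(range(len(averages)), key=averages.__getitem__)


joint = best_joint_action(PAYOFF)
marginal = best_action_vs_uniform_partner(PAYOFF)
```

Here the best joint action uses row 0, but the averaged view selects row 2, because the large penalties for miscoordination drag down the average payoff of the row that contains the optimum.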