Cooperative planning for multiple agents has been proposed as a promising approach to strategic and motion planning for automated vehicles. By taking into account the intent of every agent, the ego agent can incorporate future interactions with human-driven vehicles into its planning. The problem is often formulated as a multi-agent game and solved using iterative algorithms operating on a discretized action or state space. Even when such algorithms converge to a Nash equilibrium, the result is often only sub-optimal. In this paper, we define a linear differential game for a set of interacting agents and solve it to optimality using mixed-integer programming. A disjunctive formulation of the orientation allows us to state linear constraints that prevent agent-to-agent collisions while preserving the non-holonomic motion properties of the vehicle model. Soft constraints account for prediction errors. We then define a joint cost function in which a cooperation factor adapts between altruistic, cooperative, and egoistic behavior. We study the influence of the cooperation factor in scenarios that can only be solved successfully through interaction between the agents. The approach is then evaluated in a racing scenario, where we show its applicability in a closed-loop receding-horizon replanning fashion. By accounting for discrepancies between the cooperative assumption and the actual behavior, we can indeed successfully plan an optimal control strategy while interacting closely with other agents.
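As a minimal illustration of a mixed-integer joint plan with a cooperation factor, consider the following sketch. It is not the paper's linear differential game: the 1-D dynamics, the big-M disjunction, and the PuLP solver are stand-ins chosen for brevity; the cooperation weight w blends the two agents' costs as the abstract describes.

\begin{verbatim}
# Toy two-agent MILP "game" on a 1-D lane, solved with PuLP (assumed installed).
import pulp

T, dt, d_min, M = 10, 0.5, 1.0, 100.0
w = 0.5  # cooperation factor: 1 = egoistic (agent 1 only), 0 = altruistic

prob = pulp.LpProblem("joint_plan", pulp.LpMinimize)
p = {i: [pulp.LpVariable(f"p{i}_{t}", -50, 50) for t in range(T + 1)] for i in (1, 2)}
v = {i: [pulp.LpVariable(f"v{i}_{t}", -2, 2) for t in range(T)] for i in (1, 2)}
b = [pulp.LpVariable(f"b_{t}", cat="Binary") for t in range(T + 1)]

start, goal = {1: 0.0, 2: 3.0}, {1: 6.0, 2: -3.0}  # crossing objectives
for i in (1, 2):
    prob += p[i][0] == start[i]
    for t in range(T):
        prob += p[i][t + 1] == p[i][t] + dt * v[i][t]

# Disjunctive separation: agent 1 is either ahead of or behind agent 2.
for t in range(T + 1):
    prob += p[1][t] - p[2][t] >= d_min - M * b[t]
    prob += p[2][t] - p[1][t] >= d_min - M * (1 - b[t])

# Joint cost: |final position - goal| per agent, blended by w (L1 via aux vars).
e = {i: pulp.LpVariable(f"e{i}", 0) for i in (1, 2)}
for i in (1, 2):
    prob += e[i] >= p[i][T] - goal[i]
    prob += e[i] >= goal[i] - p[i][T]
prob += w * e[1] + (1 - w) * e[2]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([pulp.value(p[1][t]) for t in range(T + 1)])
\end{verbatim}

Sweeping w from 0 to 1 reproduces the qualitative altruistic-to-egoistic spectrum on this toy instance.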
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs) with unknown transition probabilities over continuous state and action spaces. Linear temporal logic (LTL) is used to specify high-level tasks over an infinite horizon, which can be converted into a limit-deterministic generalized B\"uchi automaton (LDGBA) with several accepting sets. The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP by incorporating a synchronous tracking-frontier function that records unvisited accepting sets of the automaton and facilitates the satisfaction of the accepting conditions. The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states and can overcome the issue of sparse rewards. Rigorous analysis shows that any RL method that optimizes the expected discounted return is guaranteed to find an optimal policy whose traces maximize the satisfaction probability. A modular deep deterministic policy gradient (DDPG) is then developed to generate such policies over continuous state and action spaces. The performance of our framework is evaluated via an array of OpenAI Gym environments.
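A minimal sketch of the tracking-frontier idea follows, with hypothetical function names and a toy two-set automaton; the paper's exact EP-MDP construction and discounting scheme are not reproduced here. The frontier holds the accepting sets not yet visited in the current round, and reward is given only for progress on an unvisited set.

\begin{verbatim}
# Sketch of a tracking-frontier over LDGBA accepting sets (hypothetical API).

def update_frontier(q, frontier, accepting_sets):
    """Remove every accepting set that automaton state q belongs to;
    reset the frontier once all sets have been visited in this round."""
    remaining = frozenset(F for F in frontier if q not in F)
    if not remaining:                      # one full round completed
        remaining = frozenset(F for F in accepting_sets if q not in F)
    return remaining

def shaped_reward(q, frontier, r_accept=1.0):
    # Positive reward only when an unvisited accepting set is hit.
    return r_accept if any(q in F for F in frontier) else 0.0

# Tiny example: two accepting sets over automaton states {0, 1, 2}.
accepting_sets = [frozenset({0}), frozenset({2})]
frontier = frozenset(accepting_sets)
for q in [1, 0, 1, 2, 0]:
    print(q, shaped_reward(q, frontier))
    frontier = update_frontier(q, frontier, accepting_sets)
\end{verbatim}

Because revisiting an already-cleared set earns nothing until the frontier resets, the shaped reward pushes trajectories to cycle through all accepting sets, which is the generalized B\"uchi condition.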
Unmanned Aerial Vehicles (UAVs) are becoming increasingly accessible to amateur and commercial users alike. Several types of airspace structures have been proposed in recent research, including several structured free-flight concepts. In this paper, for simplicity, we focus on distributed coordination of the motions of multicopters in structured airspace concepts. This is formulated as a free-flight problem comprising convergence to destination lines and inter-agent collision avoidance. The destination line of each multicopter is known a priori. Furthermore, Lyapunov-like functions are carefully designed, and formal analysis and proofs show that the proposed distributed control solves the free-flight problem. Moreover, under the proposed controller, a multicopter moves away from another as quickly as possible once it enters the other's safety area. Simulations and experiments demonstrate the effectiveness of the proposed method.
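A sketch of this control structure is given below, assuming a simple attraction-plus-repulsion velocity law in numpy; the gains and the repulsion shape are illustrative stand-ins, not the paper's Lyapunov-based design.

\begin{verbatim}
import numpy as np

def velocity_command(p, line_pt, line_dir, others,
                     r_safe=1.0, k_att=0.8, k_rep=2.0):
    """Attraction to a destination line plus repulsion from nearby agents."""
    d = line_dir / np.linalg.norm(line_dir)
    # Project onto the line: move toward the closest point, then along it.
    closest = line_pt + np.dot(p - line_pt, d) * d
    v = k_att * (closest - p) + 0.5 * d      # converge to line, drift along it
    for q in others:
        diff, dist = p - q, np.linalg.norm(p - q)
        if dist < r_safe:                    # inside safety area: push away
            v += k_rep * (r_safe - dist) / max(dist, 1e-6) * diff
    return v

p = np.array([0.0, 0.0])
print(velocity_command(p, np.array([0.0, 2.0]), np.array([1.0, 0.0]),
                       [np.array([0.4, 0.1])]))
\end{verbatim}

The repulsive term activates only inside the safety radius and grows as the gap shrinks, mirroring the "keep away as soon as possible" behavior claimed above.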
Population-based metaheuristic algorithms have received significant attention in global optimisation. Human Mental Search (HMS) is a relatively recent population-based metaheuristic that has been shown to compare well against other algorithms. However, HMS is time-consuming and suffers from relatively poor exploration. After clustering the candidate solutions, HMS selects the winner cluster as the one with the best mean objective function value. This is not necessarily the best criterion for choosing the winner group, and it limits the exploration ability of the algorithm. In this paper, we propose an improvement to the HMS algorithm, MCS-HMS, in which the best bids from multiple clusters are used to benefit from enhanced exploration. We also use a one-step k-means algorithm in the clustering phase to improve the speed of the algorithm. Our experimental results show that MCS-HMS outperforms HMS as well as other population-based metaheuristic algorithms.
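The two ingredients named above, one-step k-means and per-cluster winner selection, can be sketched as follows under assumed details (random initial centres, a sphere test function); this is an illustration of the idea, not the authors' implementation.

\begin{verbatim}
import numpy as np

def one_step_kmeans(X, k, rng):
    """Single k-means iteration: random centres, one assignment pass."""
    centres = X[rng.choice(len(X), k, replace=False)]
    dists = ((X[:, None, :] - centres[None]) ** 2).sum(-1)
    return np.argmin(dists, axis=1)

def best_bids_per_cluster(X, f, k=3, seed=0):
    """Return the best candidate (lowest objective) from every cluster."""
    rng = np.random.default_rng(seed)
    labels = one_step_kmeans(X, k, rng)
    scores = np.array([f(x) for x in X])
    return [X[labels == c][np.argmin(scores[labels == c])]
            for c in range(k) if np.any(labels == c)]

sphere = lambda x: float((x ** 2).sum())
pop = np.random.default_rng(1).normal(size=(30, 2))
for leader in best_bids_per_cluster(pop, sphere):
    print(leader, sphere(leader))
\end{verbatim}

Taking a leader from every cluster, rather than a single winner cluster, keeps several regions of the search space active, which is the exploration benefit argued for above.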
Design and control of autonomous systems that operate in uncertain or adversarial environments can be facilitated by formal modelling and analysis. Probabilistic model checking is a technique to automatically verify, for a given temporal logic specification, that a system model satisfies the specification, as well as to synthesise an optimal strategy for its control. This method has recently been extended to multi-agent systems that exhibit competitive or cooperative behaviour, modelled via stochastic games, and to the synthesis of equilibrium strategies. In this paper, we provide an overview of probabilistic model checking, focusing on models supported by the PRISM and PRISM-games model checkers. These include fully observable and partially observable Markov decision processes, as well as turn-based and concurrent stochastic games, together with associated probabilistic temporal logics. We demonstrate the applicability of the framework through illustrative examples from autonomous systems. Finally, we highlight research challenges and suggest directions for future work in this area.
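At the core of such verification is a fixed-point computation over the model. The following toy value iteration computes the maximal probability of eventually reaching a goal state in a small MDP, which is the flavour of computation behind a query like Pmax=? [F goal]; this is a generic illustration, not PRISM's implementation or input language.

\begin{verbatim}
# Toy MDP: P[s][a] is a list of (next_state, prob) pairs; state 2 is the goal.
P = {0: {"a": [(0, 0.5), (1, 0.5)], "b": [(2, 0.1), (0, 0.9)]},
     1: {"a": [(2, 0.8), (0, 0.2)]},
     2: {"a": [(2, 1.0)]}}
goal = {2}

# Value iteration for Pmax[F goal]: V(s) = max_a sum_s' P(s'|s,a) V(s').
V = {s: (1.0 if s in goal else 0.0) for s in P}
for _ in range(1000):
    V_new = {s: 1.0 if s in goal else
             max(sum(pr * V[t] for t, pr in P[s][a]) for a in P[s])
             for s in P}
    if max(abs(V_new[s] - V[s]) for s in P) < 1e-10:
        break
    V = V_new
print(V)  # maximal probability of reaching the goal from each state
\end{verbatim}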
There is an increasing demand for autonomous underwater vehicles (AUVs) to operate in polar ice conditions. At present, AUVs are deployed from ships and directly human-piloted in these regions, entailing a high carbon cost and limiting the scope of operations. A key requirement for long-term autonomous missions is a long-range route-planning capability that is aware of changing ice conditions. In this paper, we address the problem of automating long-range route planning for AUVs operating in the Southern Ocean. We present our route-planning method, together with results showing that efficient, ice-avoiding, long-distance traverses can be planned.
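A minimal way to cast ice-aware route planning is shortest-path search over a cost grid, sketched below; the cost encoding (higher cost for heavier ice, infinite for impassable ice) and the Dijkstra search are generic stand-ins, not the method of the paper.

\begin{verbatim}
import heapq

def plan_route(cost, start, goal):
    """Dijkstra over a grid whose cell costs encode ice concentration;
    impassable ice is marked with float('inf')."""
    rows, cols = len(cost), len(cost[0])
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        r, c = u
        for v in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
            if 0 <= v[0] < rows and 0 <= v[1] < cols:
                nd = d + cost[v[0]][v[1]]
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
    path, u = [], goal
    while u in prev:
        path.append(u)
        u = prev[u]
    return [start] + path[::-1]

ice = [[1, 1, 9], [1, float("inf"), 1], [1, 1, 1]]  # 9 = heavy ice, inf = solid
print(plan_route(ice, (0, 0), (2, 2)))
\end{verbatim}

Replanning as updated ice forecasts arrive amounts to rerunning the search on a refreshed cost grid.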
Challenged by urbanization and increasing travel needs, existing transportation systems need new mobility paradigms. In this article, we present the emerging concept of autonomous mobility-on-demand, whereby centrally orchestrated fleets of autonomous vehicles provide mobility service to customers. We provide a comprehensive review of methods and tools to model and solve problems related to autonomous mobility-on-demand systems. Specifically, we first identify problem settings for their analysis and control, from both operational and planning perspectives. We then review modeling aspects, including transportation networks, transportation demand, congestion, operational constraints, and interactions with existing infrastructure. Thereafter, we provide a systematic analysis of existing solution methods and performance metrics, highlighting trends and trade-offs. Finally, we present various directions for further research.
Inverse optimization is the problem of determining the values of missing input parameters for an associated forward problem that are closest to given estimates and that will make a given target vector optimal. This study is concerned with the relationship of a particular inverse mixed integer linear optimization problem (MILP) to both the forward problem and the separation problem associated with its feasible region. We show that a decision version of the inverse MILP in which a primal bound is verified is coNP-complete, whereas primal bound verification for the associated forward problem is NP-complete, and that the optimal value verification problems for both the inverse problem and the associated forward problem are complete for the complexity class D^P. We also describe a cutting-plane algorithm for solving inverse MILPs that illustrates the close relationship between the separation problem for the convex hull of solutions to a given MILP and the associated inverse problem. The inverse problem is shown to be equivalent to the separation problem for the radial cone defined by all inequalities that are both valid for the convex hull of solutions to the forward problem and binding at the target vector. Thus, the inverse, forward, and separation problems can be said to be equivalent.
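To make the forward-separation-inverse relationship concrete, here is a toy cutting-plane loop in the spirit of the algorithm described above, using PuLP (an assumption; no solver is prescribed by the text) and the 1-norm on a two-variable instance. The master problem proposes an objective d near c, and the forward MILP serves as the separation oracle that either certifies the target x0 optimal or returns a violated point xk, yielding the cut d'x0 <= d'xk.

\begin{verbatim}
# Cutting-plane sketch for a toy inverse MILP (PuLP assumed).
# Forward problem: min d.x over S = {x in {0,1}^2 : x1 + x2 >= 1}.
# Inverse: find d minimising ||c - d||_1 such that x0 is optimal for d.
import pulp

S_constr = lambda x: x[0] + x[1] >= 1
c, x0 = [1.0, 3.0], [1, 1]          # c itself does not make x0 optimal

def solve_forward(d):
    fp = pulp.LpProblem("forward", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(2)]
    fp += S_constr(x)
    fp += d[0] * x[0] + d[1] * x[1]
    fp.solve(pulp.PULP_CBC_CMD(msg=False))
    return [int(pulp.value(xi)) for xi in x]

cuts = []
for _ in range(20):
    mp = pulp.LpProblem("master", pulp.LpMinimize)
    d = [pulp.LpVariable(f"d{i}", -10, 10) for i in range(2)]
    u = [pulp.LpVariable(f"u{i}", 0) for i in range(2)]
    for i in range(2):                       # u_i >= |c_i - d_i|
        mp += u[i] >= c[i] - d[i]
        mp += u[i] >= d[i] - c[i]
    for xk in cuts:                          # x0 no worse than each xk
        mp += pulp.lpSum(d[i] * (x0[i] - xk[i]) for i in range(2)) <= 0
    mp += u[0] + u[1]
    mp.solve(pulp.PULP_CBC_CMD(msg=False))
    d_val = [pulp.value(di) for di in d]
    xk = solve_forward(d_val)
    if sum(d_val[i] * (x0[i] - xk[i]) for i in range(2)) <= 1e-6:
        break                                # x0 already optimal under d
    cuts.append(xk)
print("recovered objective:", d_val)
\end{verbatim}

Each cut is a valid inequality for the forward problem's feasible points that is binding at x0, which is exactly the separation view of the inverse problem described above.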
Alternating Direction Method of Multipliers (ADMM) is a widely used tool for machine learning in distributed settings, where a machine learning model is trained over distributed data sources through an iterative process of local computation and message passing. Such an iterative process raises privacy concerns for data owners. The goal of this paper is to provide differential privacy for ADMM-based distributed machine learning. Prior approaches to differentially private ADMM exhibit low utility under strong privacy guarantees and often assume the objective functions of the learning problems to be smooth and strongly convex. To address these concerns, we propose a novel differentially private ADMM-based distributed learning algorithm called DP-ADMM, which combines an approximate augmented Lagrangian function with time-varying Gaussian noise addition in the iterative process to achieve higher utility for general objective functions under the same differential privacy guarantee. We also apply the moments accountant method to bound the end-to-end privacy loss. Theoretical analysis shows that DP-ADMM can be applied to a wide class of distributed learning problems, is provably convergent, and offers an explicit utility-privacy tradeoff. To our knowledge, this is the first paper to provide explicit convergence and utility guarantees for differentially private ADMM-based distributed learning algorithms. Our evaluation results demonstrate that the approach achieves good convergence and model accuracy under a strong end-to-end differential privacy guarantee.
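The following is a schematic of the kind of local update such an algorithm performs: a linearised augmented-Lagrangian gradient step with gradient clipping and time-varying Gaussian noise. The step sizes, noise schedule, and single-agent toy problem are assumptions for illustration; DP-ADMM's exact noise calibration and moments-accountant bookkeeping are omitted.

\begin{verbatim}
import numpy as np

def dp_primal_update(w, grad_loss, dual, w_avg, rho, eta_t, sigma_t, rng,
                     clip=1.0):
    """One noisy ADMM-style local step (schematic): linearised
    augmented-Lagrangian gradient with clipping and Gaussian noise."""
    g = grad_loss(w)
    g = g / max(1.0, np.linalg.norm(g) / clip)      # bound sensitivity
    g = g + dual + rho * (w - w_avg)                # augmented-Lagrangian terms
    noise = rng.normal(0.0, sigma_t, size=w.shape)  # sigma_t decays with t
    return w - eta_t * (g + noise)

rng = np.random.default_rng(0)
w = np.zeros(3)
grad = lambda w: 2 * (w - np.array([1.0, -2.0, 0.5]))  # toy quadratic loss
for t in range(1, 200):
    # Single-agent toy: dual = 0 and w_avg = w, so consensus terms vanish.
    w = dp_primal_update(w, grad, np.zeros(3), w, rho=1.0,
                         eta_t=0.1 / np.sqrt(t), sigma_t=1.0 / t, rng=rng)
print(w)  # approaches the loss minimiser despite the injected noise
\end{verbatim}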
Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework {\em PEORL} that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back into the symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark HRL domains.
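The planning-learning interplay can be sketched schematically as below, with a stand-in "planner" that ranks candidate plans by learned action costs; PEORL's actual symbolic planner and its learning updates are replaced by toy components, so this only conveys the feedback-loop structure.

\begin{verbatim}
import random

# Schematic loop: the planner chooses among candidate plans using learned
# action costs; executing plan steps refines those costs, which feed back
# into the next planning round.
plans = {"short_risky": ["cross"], "long_safe": ["around_1", "around_2"]}
true_cost = {"cross": 8.0, "around_1": 2.0, "around_2": 2.0}
est_cost = {a: 1.0 for a in true_cost}   # optimistic initial estimates
alpha = 0.3

def symbolic_plan():
    return min(plans, key=lambda p: sum(est_cost[a] for a in plans[p]))

for episode in range(50):
    name = symbolic_plan()
    for a in plans[name]:                 # "execute" each symbolic action
        sample = true_cost[a] + random.gauss(0.0, 0.5)
        est_cost[a] += alpha * (sample - est_cost[a])  # RL-style update

print(symbolic_plan(), est_cost)  # settles on the genuinely cheaper plan
\end{verbatim}

The loop starts with the seemingly short plan, learns its true cost from experience, and the updated costs steer the planner to the robust alternative.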
This paper presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique; the resulting model is then used in combination with control barrier certificates that constrain feedback controllers only when safety is about to be violated. Under mild assumptions, solutions to the constrained feedback-controller optimization are guaranteed to be globally optimal, which ensures monotonic improvement of the feedback controller. In addition, we reformulate the (action-)value function approximation so that any kernel-based nonlinear function estimation method becomes applicable, and we employ a state-of-the-art kernel adaptive filtering technique for the (action-)value function approximation. The resulting framework is verified experimentally on a brushbot, whose dynamics are unknown and highly complex.
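The barrier-certificate constraint described above is commonly enforced as a small quadratic program around a nominal controller. A minimal single-constraint version has a closed-form projection, sketched here for a toy single integrator with known dynamics; the learned model and kernel machinery of the framework are omitted, and the disc barrier is an assumption for illustration.

\begin{verbatim}
import numpy as np

def safe_input(u_nom, h, grad_h, f, g, x, alpha=1.0):
    """Minimal control-barrier filter for x_dot = f(x) + g(x) u:
    min ||u - u_nom||^2  s.t.  Lf h + Lg h u + alpha * h(x) >= 0.
    With one affine constraint the QP reduces to a projection
    (assumes Lg h is nonzero whenever the constraint is violated)."""
    Lfh = grad_h(x) @ f(x)
    Lgh = grad_h(x) @ g(x)
    slack = Lfh + Lgh @ u_nom + alpha * h(x)
    if slack >= 0:                      # nominal input already safe
        return u_nom
    # Project u_nom onto the constraint boundary.
    return u_nom - slack * Lgh / (Lgh @ Lgh)

# Toy single integrator staying inside the disc h(x) = 1 - ||x||^2 >= 0.
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)
h = lambda x: 1.0 - x @ x
grad_h = lambda x: -2.0 * x
x = np.array([0.9, 0.0])
print(safe_input(np.array([1.0, 0.0]), h, grad_h, f, g, x))  # input scaled back
\end{verbatim}

Note how the filter leaves the nominal input untouched until the barrier constraint is about to be violated, matching the "constrain only when safety is about to be violated" behavior stated above.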