Multi-Agent Path Finding (MAPF) is the problem of finding a sequence of movements that brings each agent to its assigned location without collisions. Centralized algorithms usually give optimal solutions but have difficulty scaling without employing various techniques, usually at the cost of optimality; nevertheless, solving MAPF problems with more than a thousand agents remains a challenge. To tackle the scalability issue, we present DMAPF, a decentralized and distributed MAPF solver that continues our recently published work, ros-dmapf. We address two limitations of ros-dmapf: it (i) only works on maps without obstacles and (ii) has a low success rate on dense maps. Given a MAPF problem, both ros-dmapf and DMAPF divide the map spatially into subproblems, but DMAPF further divides each subproblem into disconnected regions called areas. Each subproblem is assigned to a distributed solver, which individually creates an abstract plan (a sequence of areas that an agent needs to visit) for each of its agents and interleaves agent migration with movement planning. Answer Set Programming, which is known for its performance on small but complex problems, is used in many parts of the system, including problem division, abstract planning, border assignment for migration, and movement planning. The Robot Operating System (ROS) is used to facilitate communication between the solvers and to enable integration with robotic systems. DMAPF introduces a new interaction protocol between the solvers, together with mechanisms that yield a higher success rate and better solution quality without sacrificing much performance. We implement DMAPF and validate it experimentally against other state-of-the-art MAPF solvers; the results show that our system achieves better scalability.
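A minimal sketch of the two-level division described above, assuming a grid map with `.` for free cells and `#` for obstacles, fixed-size rectangular subproblems, and 4-connected components of free cells as areas. DMAPF performs this step with Answer Set Programming; the breadth-first search here is a swapped-in technique, and all function names are hypothetical.

```python
from collections import deque

def split_into_subproblems(grid, block_h, block_w):
    """Return (row0, col0, row1, col1) rectangles that tile the grid."""
    rows, cols = len(grid), len(grid[0])
    return [
        (r0, c0, min(r0 + block_h, rows), min(c0 + block_w, cols))
        for r0 in range(0, rows, block_h)
        for c0 in range(0, cols, block_w)
    ]

def areas_in_subproblem(grid, rect):
    """Group the free cells of one subproblem into 4-connected areas (BFS)."""
    r0, c0, r1, c1 = rect
    seen, areas = set(), []
    for r in range(r0, r1):
        for c in range(c0, c1):
            if grid[r][c] != "." or (r, c) in seen:
                continue
            area, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                area.append((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    # stay inside this subproblem and on free, unvisited cells
                    if (r0 <= nr < r1 and c0 <= nc < c1
                            and grid[nr][nc] == "." and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            areas.append(area)
    return areas

def divide_map(grid, block_h, block_w):
    """Map each subproblem rectangle to its list of areas."""
    return {rect: areas_in_subproblem(grid, rect)
            for rect in split_into_subproblems(grid, block_h, block_w)}

grid = [
    "..#.",
    "..#.",
    "####",
    "....",
]
division = divide_map(grid, 4, 4)
print({rect: sorted(len(a) for a in areas) for rect, areas in division.items()})
# {(0, 0, 4, 4): [2, 4, 4]}
```

With a single 4x4 subproblem, the obstacles split the free cells into three areas, so an agent's abstract plan would be a sequence drawn from these three regions.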
Emergency vehicles (EMVs) play a critical role in a city's response to time-critical events such as medical emergencies and fire outbreaks. Existing approaches to reducing EMV travel time employ route optimization and traffic signal pre-emption without accounting for the coupling between these two subproblems. As a result, the planned route often becomes suboptimal. In addition, these approaches do not focus on minimizing disruption to the overall traffic flow. To address these issues, we introduce EMVLight, a decentralized reinforcement learning (RL) framework for simultaneous dynamic routing and traffic signal control. EMVLight extends Dijkstra's algorithm to efficiently update the optimal route for an EMV in real time as it travels through the traffic network. Meanwhile, the decentralized RL agents learn network-level cooperative traffic signal phase strategies that reduce both the EMV travel time and the average travel time of non-EMVs in the network. We have carried out comprehensive experiments on synthetic and real-world maps to demonstrate this benefit. Our results show that EMVLight outperforms benchmark transportation engineering techniques as well as existing RL-based traffic signal control methods.
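The abstract does not specify how EMVLight extends Dijkstra's algorithm, so the following is only a baseline sketch of dynamic routing: plain Dijkstra is rerun from the EMV's current intersection whenever edge travel times change. The graph layout, the `updates` schedule, and both function names are hypothetical.

```python
import heapq

def dijkstra(graph, source, target):
    """Shortest path on a dict-of-dicts graph with non-negative travel times."""
    dist, prev = {source: 0.0}, {}
    heap, done = [(0.0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

def drive_emv(graph, source, target, updates):
    """Advance one intersection at a time, replanning after each move.

    updates[node] is a hypothetical list of (u, v, new_time) edge changes
    observed when the EMV reaches `node` (e.g. congestion building up).
    Note: this mutates `graph` in place.
    """
    node, route = source, [source]
    while node != target:
        for u, v, t in updates.get(node, []):
            graph[u][v] = t
        _, path = dijkstra(graph, node, target)
        if not path:
            raise ValueError("target unreachable from " + str(node))
        node = path[1]
        route.append(node)
    return route

graph = {"A": {"B": 2, "C": 1}, "B": {"D": 2, "C": 1}, "C": {"D": 4}, "D": {}}
print(dijkstra(graph, "A", "D"))                           # (4.0, ['A', 'B', 'D'])
print(drive_emv(graph, "A", "D", {"B": [("B", "D", 10)]}))  # ['A', 'B', 'C', 'D']
```

When the B to D link becomes congested after the EMV reaches B, replanning diverts it through C. A full system would also let the signal-control agents react to the new route.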
Placing robots outside controlled conditions requires versatile movement representations that allow robots to learn new tasks and adapt them to environmental changes. The introduction of obstacles, the placement of additional robots in the workspace, and the modification of the joint range due to faults or range-of-motion constraints are typical cases where adaptation capabilities play a key role in safely performing the robot's task. Probabilistic movement primitives (ProMPs) have been proposed for representing adaptable movement skills, which are modelled as Gaussian distributions over trajectories. ProMPs are analytically tractable and can be learned from a small number of demonstrations. However, both the original ProMP formulation and subsequent approaches only provide solutions to specific movement adaptation problems, e.g., obstacle avoidance, and a generic, unifying, probabilistic approach to adaptation is missing. In this paper, we develop a generic probabilistic framework for adapting ProMPs. We unify previous adaptation techniques, such as various types of obstacle avoidance, via-points, and mutual avoidance, in a single framework and combine them to solve complex robotic problems. Additionally, we derive novel adaptation techniques such as temporally unbound via-points and mutual avoidance. We formulate adaptation as a constrained optimisation problem in which we minimise the Kullback-Leibler divergence between the adapted distribution and the distribution of the original primitive while constraining the probability mass associated with undesired trajectories to be low. We demonstrate our approach on several adaptation problems with simulated planar robot arms and 7-DOF Franka-Emika robots in a dual-arm setting.
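A minimal sketch of the ProMP ingredients mentioned above, assuming a 1-D trajectory, normalised Gaussian basis functions, and synthetic sine demonstrations. It fits a Gaussian over basis weights, adapts it with the closed-form via-point conditioning of the original ProMP formulation (not the constrained KL optimisation proposed here), and then measures KL(adapted || original), the kind of divergence the framework minimises. All function names, the basis width, and the regularisation constants are hypothetical choices.

```python
import numpy as np

def rbf_features(t, n_basis=10, width=0.02):
    # normalised Gaussian basis functions over phase t in [0, 1]
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)

def fit_promp(demos, t, n_basis=10, reg=1e-6):
    # ridge-regress one weight vector per demonstration, then fit a Gaussian
    Phi = rbf_features(t, n_basis)
    A = Phi.T @ Phi + reg * np.eye(n_basis)
    W = np.array([np.linalg.solve(A, Phi.T @ y) for y in demos])
    return W.mean(axis=0), np.cov(W, rowvar=False) + reg * np.eye(n_basis)

def condition_on_via_point(mu, Sigma, phi_t, y_star, obs_var=1e-4):
    # closed-form Gaussian conditioning on y(t*) = phi_t @ w = y_star
    s = phi_t @ Sigma @ phi_t + obs_var
    gain = Sigma @ phi_t / s
    mu_new = mu + gain * (y_star - phi_t @ mu)
    Sigma_new = Sigma - np.outer(gain, Sigma @ phi_t)
    return mu_new, 0.5 * (Sigma_new + Sigma_new.T)  # keep it symmetric

def gaussian_kl(mu0, S0, mu1, S1):
    # KL( N(mu0, S0) || N(mu1, S1) ) for multivariate Gaussians
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                  - len(mu0) + logdet1 - logdet0)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
demos = [rng.uniform(0.8, 1.2) * np.sin(np.pi * t) for _ in range(20)]
mu, Sigma = fit_promp(demos, t)
phi_mid = rbf_features(np.array([0.5]))[0]
mu_via, Sigma_via = condition_on_via_point(mu, Sigma, phi_mid, y_star=1.5)
print(phi_mid @ mu_via)                            # close to the via-point 1.5
print(gaussian_kl(mu_via, Sigma_via, mu, Sigma))   # positive cost of adapting
```

Conditioning satisfies the via-point exactly in closed form but offers no control over undesired regions; the constrained formulation above instead trades off this KL cost against constraints on the probability mass of undesired trajectories.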
A decentralized algorithm is a form of computation that achieves a global goal through local dynamics relying on low-cost communication between directly connected agents. On large-scale optimization tasks involving distributed datasets, decentralized algorithms have shown strong, sometimes superior, performance compared with distributed algorithms that rely on a central node. Recently, developing decentralized algorithms for deep learning has attracted great attention. They are considered low-communication-overhead alternatives to methods that use a parameter server or the Ring-Allreduce protocol. However, the lack of an easy-to-use and efficient software package has kept most decentralized algorithms merely on paper. To fill this gap, we introduce BlueFog, a Python library for straightforward, high-performance implementations of diverse decentralized algorithms. Based on a unified abstraction of various communication operations, BlueFog offers intuitive interfaces to implement a spectrum of decentralized algorithms, from those using a static, undirected graph for synchronous operations to those using dynamic, directed graphs for asynchronous operations. BlueFog also adopts several system-level acceleration techniques to further optimize performance on deep learning tasks. On mainstream DNN training tasks, BlueFog reaches much higher throughput and achieves an overall $1.2\times \sim 1.8\times$ speedup over Horovod, a state-of-the-art distributed deep learning package based on Ring-Allreduce. BlueFog is open source at //github.com/Bluefog-Lib/bluefog.
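To make "local dynamics with communication between directly connected agents" concrete, here is a plain-Python simulation of synchronous neighbor averaging (gossip) on a static, undirected graph. It deliberately does not use BlueFog's API; the Metropolis weighting rule is one standard way to obtain a doubly stochastic mixing matrix, and all names here are hypothetical. Decentralized SGD would interleave one such mixing round with a local gradient step on each node.

```python
def metropolis_weights(neighbors):
    """Doubly stochastic mixing weights for an undirected graph."""
    deg = {i: len(nbrs) for i, nbrs in neighbors.items()}
    weights = {}
    for i, nbrs in neighbors.items():
        row = {j: 1.0 / (1 + max(deg[i], deg[j])) for j in nbrs}
        row[i] = 1.0 - sum(row.values())  # self-weight makes each row sum to 1
        weights[i] = row
    return weights

def gossip_round(values, weights):
    """One synchronous round: each node mixes only its neighbors' values."""
    return {i: sum(w * values[j] for j, w in row.items())
            for i, row in weights.items()}

n = 8
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
W = metropolis_weights(ring)
values = {i: float(i) for i in range(n)}  # hypothetical local quantities
for _ in range(200):
    values = gossip_round(values, W)
print(values)  # every entry approaches the global mean 3.5
```

No node ever sees the global mean directly; it emerges because the symmetric weights preserve the sum while repeated mixing removes disagreement.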
We introduce a novel edge tracing algorithm using Gaussian process regression. Our edge-based segmentation algorithm models an edge of interest using Gaussian process regression and iteratively searches the image for edge pixels in a recursive Bayesian scheme. This procedure combines local edge information from the image gradient with global structural information from posterior curves, sampled from the model's posterior predictive distribution, to sequentially build and refine an observation set of edge pixels. As pixels accumulate, the posterior distribution converges to the edge of interest. Hyperparameters can be tuned by the user at initialisation and optimised given the refined observation set. This tunable approach does not require any prior training and is not restricted to any particular imaging domain. Owing to the model's uncertainty quantification, the algorithm is robust to artefacts and occlusions that degrade the quality and continuity of edges in images. Our approach can also efficiently trace edges in image sequences by using the edge trace from the previous image as prior information for the next image. The technique is validated on various medical and satellite imaging applications and compared with two commonly used edge tracing algorithms.
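A simplified sketch of the tracing loop described above, assuming a synthetic step image, a horizontal-ish edge modelled as row = f(column), fixed kernel hyperparameters, a constant prior mean, and a hypothetical acceptance rule (a pixel within one row of a posterior sample whose gradient magnitude exceeds a threshold). The hyperparameter optimisation and convergence criteria of the actual method are not reproduced.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=10.0, variance=50.0):
    # squared-exponential kernel between 1-D column coordinates
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-2):
    # GP regression with a constant prior mean equal to the average observed row
    m = y_obs.mean()
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - m))
    v = np.linalg.solve(L, K_s)
    cov = rbf_kernel(x_query, x_query) - v.T @ v
    return m + K_s.T @ alpha, 0.5 * (cov + cov.T)

def trace_edge(grad, seeds, n_iter=5, n_samples=20, thresh=0.5, seed=0):
    rng = np.random.default_rng(seed)
    height, width = grad.shape
    cols = np.arange(width, dtype=float)
    obs = dict(seeds)  # column -> row of accepted edge pixels
    for _ in range(n_iter):
        xs = np.array(sorted(obs), dtype=float)
        ys = np.array([obs[c] for c in sorted(obs)], dtype=float)
        mean, cov = gp_posterior(xs, ys, cols)
        # global structure: posterior curves sampled from the current model
        curves = rng.multivariate_normal(mean, cov + 1e-4 * np.eye(width), size=n_samples)
        for curve in curves:
            for c, r in enumerate(np.rint(curve).astype(int)):
                # local evidence: accept strong-gradient pixels near the curve
                for rr in (r - 1, r, r + 1):
                    if 0 <= rr < height and grad[rr, c] > thresh:
                        obs[c] = rr
    xs = np.array(sorted(obs), dtype=float)
    ys = np.array([obs[c] for c in sorted(obs)], dtype=float)
    mean, _ = gp_posterior(xs, ys, cols)
    return obs, mean

height, width = 40, 60
true_edge = 20 + 5 * np.sin(np.arange(width) / 10.0)
image = (np.arange(height)[:, None] > true_edge[None, :]).astype(float)
grad = np.abs(np.diff(image, axis=0))  # vertical gradient magnitude
seeds = {c: int(np.floor(true_edge[c])) for c in (0, 30, 59)}
obs, mean = trace_edge(grad, seeds)
print(len(obs), np.abs(mean - np.floor(true_edge)).max())
```

Starting from three seed pixels, each pass samples curves from the posterior, keeps only nearby strong-gradient pixels, and refits, so the observation set grows and the posterior tightens around the edge.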
Federated learning, which solves the problem of data islands by connecting multiple computational devices into a decentralized system, has become a promising paradigm for privacy-preserving machine learning. This paper studies vertical federated learning (VFL), which addresses scenarios in which collaborating organizations share the same set of users but hold disjoint features. Contemporary VFL methods are mainly used in static scenarios where the active party and the passive party hold all the data from the beginning and the data do not change. However, real-world data often change dynamically. To alleviate this problem, we propose DVFL, a new vertical federated learning method that adapts to dynamic changes in the data distribution through knowledge distillation. In DVFL, most computations are performed locally to improve data security and model efficiency. Our extensive experimental results show that DVFL not only obtains results close to those of existing VFL methods in static scenarios but also adapts to changes in the data distribution in dynamic scenarios.
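The abstract names knowledge distillation without specifying which party acts as teacher or student, so the following shows only the generic temperature-scaled distillation loss (Hinton-style soft targets), not DVFL's actual scheme. The function names and the temperature value are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [2.0, 1.0, 0.1]
print(distillation_loss([2.0, 1.0, 0.1], teacher))  # 0.0
print(distillation_loss([0.1, 1.0, 2.0], teacher))  # positive
```

The soft targets let a model absorb what another model learned without exchanging raw features, which fits the goal of keeping most computation local.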
Recently, deep multiagent reinforcement learning (MARL) has become a highly active research area, as many real-world problems can be inherently viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate their behaviors conditioned on their private observations and a commonly shared global reward signal. One natural solution is to adopt the centralized training and decentralized execution paradigm. During centralized training, one key challenge is multiagent credit assignment: how to allocate the global reward among individual agent policies so that they coordinate better to maximize system-level benefit. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into individual agents' Q-values. Unlike previous works, which restrict the representation relation between the individual Q-values and the global one, we bring the integrated gradient attribution technique into deep MARL to directly decompose global Q-values along trajectory paths and assign credits to agents. We evaluate QPD on the challenging StarCraft II micromanagement tasks and show that QPD achieves state-of-the-art performance in both homogeneous and heterogeneous multiagent scenarios compared with existing cooperative MARL algorithms.
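A minimal sketch of integrated gradients, the attribution technique QPD builds on, applied to a hypothetical toy global Q-function where feature `x[i]` is owned by agent i. It uses a straight-line path from a zero baseline, a midpoint Riemann sum, and finite-difference gradients; QPD's path choice along trajectories and its neural Q-network are not reproduced.

```python
import math

def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def integrated_gradients(f, x, baseline, steps=200):
    """Attribute f(x) - f(baseline) to each input along the straight-line path."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint Riemann sum over alpha in [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_grad(f, point)):
            totals[i] += g
    return [(xi - b) * tot / steps for xi, b, tot in zip(x, baseline, totals)]

def q_tot(x):
    """Hypothetical global Q-value; x[i] is a feature owned by agent i."""
    return x[0] * x[1] + math.sin(x[2]) + 0.5 * x[0] ** 2

x, baseline = [1.0, 2.0, 0.5], [0.0, 0.0, 0.0]
credits = integrated_gradients(q_tot, x, baseline)
print(credits)                                    # approx [1.5, 1.0, 0.479]
print(sum(credits), q_tot(x) - q_tot(baseline))  # completeness: the two agree
```

The completeness property (credits sum to the change in the global Q-value) is what makes integrated gradients attractive for credit assignment: the global value is split among agents without restricting how they combine.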
Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples to learn complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups in learning various continuous-control locomotion tasks. With one GPU and one CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10-1000x fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings for training more challenging locomotion tasks.
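The speed-up rests on stepping many simulated environments together rather than one per CPU core. The sketch below illustrates that batching pattern with a hypothetical point-mass task; it does not use NVIDIA Flex (whose API is not shown here), and a NumPy array on the CPU stands in for GPU-resident state.

```python
import numpy as np

class BatchedPointMass:
    """Hypothetical toy task: N independent 1-D point masses in one array.

    Every step updates all environments with a single vectorized expression,
    which is the pattern a GPU physics engine exploits at much larger scale.
    """

    def __init__(self, n_envs, dt=0.05, goal=1.0):
        self.dt, self.goal = dt, goal
        self.pos = np.zeros(n_envs)
        self.vel = np.zeros(n_envs)

    def step(self, force):
        # semi-implicit Euler integration for all environments at once
        self.vel = self.vel + self.dt * force
        self.pos = self.pos + self.dt * self.vel
        reward = -np.abs(self.pos - self.goal)
        return self.pos.copy(), reward

def step_one(pos, vel, force, dt=0.05, goal=1.0):
    """Reference: the same dynamics for a single environment."""
    vel = vel + dt * force
    pos = pos + dt * vel
    return pos, vel, -abs(pos - goal)

rng = np.random.default_rng(0)
envs = BatchedPointMass(n_envs=4096)
for _ in range(10):
    obs, reward = envs.step(rng.normal(size=4096))  # one call steps 4096 envs
print(obs.shape, reward.mean())
```

Because the batched step produces the same trajectories as looping over `step_one`, the only change is where and how the arithmetic is scheduled, which is why moving it to a GPU frees CPU cores.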
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that takes into account the action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen that uses an ensemble of policies for each agent, which leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
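One way to let a critic "take into account the action policies of other agents" is a centralized critic that receives every agent's observation and action during training, while each actor still acts on its own observation only. A minimal sketch of the resulting temporal-difference target, assuming hypothetical linear actors and critic and scalar observations:

```python
def make_linear(weights):
    """Hypothetical linear function approximator: x -> w . x."""
    return lambda inputs: sum(w * x for w, x in zip(weights, inputs))

def centralized_critic_target(reward_i, next_obs, target_actors, target_critic_i, gamma=0.95):
    """TD target for agent i's critic, y = r_i + gamma * Q_i'(o', a'_1..a'_N).

    Each target actor acts on its own observation only (decentralized
    execution), but the critic is fed every agent's observation and action.
    """
    next_actions = [actor([o]) for actor, o in zip(target_actors, next_obs)]
    joint_input = list(next_obs) + next_actions
    return reward_i + gamma * target_critic_i(joint_input)

actors = [make_linear([0.5]), make_linear([0.5])]   # a_j = 0.5 * o_j
critic_0 = make_linear([1.0, 1.0, 1.0, 1.0])         # Q_0 over (o_1, o_2, a_1, a_2)
print(centralized_critic_target(1.0, [1.0, 2.0], actors, critic_0))
```

Since the critic conditions on all actions, other agents' changing policies no longer look like environment noise to it, which addresses the non-stationarity described above.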
The field of Multi-Agent Systems (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures for governing the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled it to support MAS in complex, real-time, and uncertain environments. This survey aims to provide an overview of the DCOP model, to give a classification of its multiple extensions, and to address both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
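To make the DCOP model concrete, here is a minimal sketch of a tiny instance, assuming one variable per agent and binary cost functions. The exhaustive search is centralized and serves only to define the optimum that distributed resolution methods compute (or approximate) through message passing between neighboring agents; the example problem and all names are hypothetical.

```python
from itertools import product

def total_cost(assignment, constraints):
    """Sum of all binary constraint costs under a full assignment."""
    return sum(cost(assignment[u], assignment[v]) for (u, v), cost in constraints.items())

def solve_centrally(variables, domains, constraints):
    """Exhaustive search: defines the DCOP optimum, not a distributed method."""
    best, best_cost = None, float("inf")
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        cost = total_cost(assignment, constraints)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost

def clash(a, b):
    """Pay 1 when two neighboring agents pick the same colour."""
    return 1 if a == b else 0

# Hypothetical 3-agent example: each agent owns one variable with two colours.
variables = ["x1", "x2", "x3"]
domains = {v: [0, 1] for v in variables}
constraints = {("x1", "x2"): clash, ("x2", "x3"): clash, ("x1", "x3"): clash}
print(solve_centrally(variables, domains, constraints))
# ({'x1': 0, 'x2': 0, 'x3': 1}, 1)
```

Three mutually constrained agents with two colours cannot all differ, so the optimal cost is 1; the constraint graph (here a triangle) is exactly the structure that drives a DCOP algorithm's communication pattern.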
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is strongly convex and smooth, only strongly convex, only smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
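A minimal sketch of accelerated gradient on the dual, assuming a ring network, the graph Laplacian $W$ as the interaction matrix (so the affine constraint $W\mathbf{x} = 0$ encodes consensus), and hypothetical quadratic local functions $f_i(x) = (x - a_i)^2/2$. Each node solves its local subproblem in closed form, $x_i = a_i - y_i$, and one multiplication by $W$ (a single exchange with neighbors) gives the dual gradient step $y \leftarrow \tilde{y} + \frac{1}{L} W \mathbf{x}$. Step size and momentum follow the standard strongly convex Nesterov scheme with $L = \lambda_{\max}(W)$ and $\mu = \lambda_{\min}^{+}(W)$, so the ratio $L/\mu$ reflects the spectral gap; this is an illustration, not the exact algorithm or constants analyzed here.

```python
import math

def ring_laplacian_matvec(v):
    """(W v)_i = 2 v_i - v_{i-1} - v_{i+1}: needs only neighbors' values."""
    n = len(v)
    return [2 * v[i] - v[(i - 1) % n] - v[(i + 1) % n] for i in range(n)]

def dual_accelerated_consensus(a, iters=300):
    n = len(a)
    lam_max = 2 - 2 * math.cos(2 * math.pi * (n // 2) / n)  # L of the dual
    lam_min = 2 - 2 * math.cos(2 * math.pi / n)             # smallest nonzero eigenvalue
    step = 1.0 / lam_max
    beta = (math.sqrt(lam_max) - math.sqrt(lam_min)) / (math.sqrt(lam_max) + math.sqrt(lam_min))
    y, y_prev = [0.0] * n, [0.0] * n
    for _ in range(iters):
        y_tilde = [yi + beta * (yi - yp) for yi, yp in zip(y, y_prev)]  # momentum
        x = [ai - yi for ai, yi in zip(a, y_tilde)]  # local primal minimizers
        wx = ring_laplacian_matvec(x)                 # one round of communication
        y_prev, y = y, [yt + step * g for yt, g in zip(y_tilde, wx)]
    return [ai - yi for ai, yi in zip(a, y)]

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical local data
print(dual_accelerated_consensus(a))  # every node approaches the minimizer 3.5
```

The minimizer of $F$ for these quadratics is the mean of the $a_i$, so agreement on 3.5 confirms that the dual iterations solve the original problem without any node seeing all the data.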