In mobile robotics, area exploration and coverage are critical capabilities. Most of the available research assumes global, long-range communication and centralized cooperation. This paper proposes a novel swarm-based coverage control algorithm that relaxes these assumptions. The algorithm combines two elements: swarm rules and frontier search algorithms. Inspired by natural systems in which large numbers of simple agents (e.g., schooling fish, flocking birds, swarming insects) perform complicated collective behaviors, the first element uses three simple rules to maintain a swarm formation in a distributed manner. The second element provides a means to select promising regions to explore (and cover) by minimizing a cost function involving the agent's relative position to the frontier cells and the frontier's size. We test our approach's performance on both heterogeneous and homogeneous groups of mobile robots in different environments, measuring both coverage performance and swarm-formation statistics that permit the group to maintain communication. Through a series of comparison experiments, we demonstrate that the proposed strategy outperforms recently presented map-coverage methodologies and the conventional artificial potential field in terms of percentage of cells covered, turnaround time, and path safety, while maintaining a formation that permits short-range communication.
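As a concrete illustration of the frontier-selection element, here is a minimal sketch assuming an occupancy grid whose frontier cells have already been clustered into frontiers; the weights `w_dist` and `w_size` and the centroid-based distance term are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def select_frontier(robot_xy, frontiers, w_dist=1.0, w_size=0.5):
    """Pick the frontier minimizing a cost that trades off the robot's
    distance to the frontier centroid against the frontier's size.

    frontiers: list of (N_i, 2) arrays of frontier cell coordinates.
    Returns the index of the selected frontier.
    """
    costs = []
    for cells in frontiers:
        centroid = cells.mean(axis=0)
        dist = np.linalg.norm(centroid - robot_xy)   # relative-position term
        size = len(cells)                            # larger frontiers preferred
        costs.append(w_dist * dist - w_size * size)
    return int(np.argmin(costs))

# Example: two frontiers; the nearer, larger one should win.
frontiers = [np.array([[5, 5], [5, 6], [5, 7]]),
             np.array([[20, 1], [20, 2]])]
print(select_frontier(np.array([4.0, 4.0]), frontiers))  # -> 0
```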
This paper considers the distributed online convex optimization problem with time-varying constraints over a network of agents. This is a sequential decision-making problem with two sequences of arbitrarily varying convex loss and constraint functions. At each round, each agent selects a decision from the decision set, and then only a portion of the loss function and a coordinate block of the constraint function at this round are privately revealed to that agent. The goal of the network is to minimize the network-wide loss accumulated over time. Two distributed online algorithms, with full-information and bandit feedback respectively, are proposed. Both dynamic and static network regret bounds are analyzed for the proposed algorithms, and network cumulative constraint violation is used to measure constraint violation; this metric rules out the situation where strictly feasible rounds compensate for the effects of violated ones. In particular, we show that the proposed algorithms achieve $\mathcal{O}(T^{\max\{\kappa,1-\kappa\}})$ static network regret and $\mathcal{O}(T^{1-\kappa/2})$ network cumulative constraint violation, where $T$ is the time horizon and $\kappa\in(0,1)$ is a user-defined trade-off parameter. Moreover, if the loss functions are strongly convex, then the static network regret bound can be reduced to $\mathcal{O}(T^{\kappa})$. Finally, numerical simulations are provided to illustrate the effectiveness of the theoretical results.
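To make the setting concrete, the following is a minimal single-agent sketch of a full-information round together with the cumulative-violation metric; the loss, constraint, decision set, and step-size schedule are illustrative, and the consensus/averaging step across neighbouring agents is omitted.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the decision set {x : ||x|| <= radius}."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

T, d, kappa = 1000, 2, 0.5
x = np.zeros(d)
cum_violation = 0.0
rng = np.random.default_rng(0)

for t in range(1, T + 1):
    a = rng.standard_normal(d)              # loss f_t(x) = a.x, revealed after playing x
    g = lambda z: z.sum() - 0.5             # constraint g_t(x) <= 0
    # Cumulative violation sums the *positive parts*, so strictly feasible
    # rounds cannot cancel infeasible ones.
    cum_violation += max(g(x), 0.0)
    eta = t ** (-kappa)                     # trade-off controlled by kappa
    x = project_ball(x - eta * a)           # projected online gradient step
print(cum_violation)
```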
Our goal is to develop theory and algorithms for establishing fundamental limits, imposed by a robot's sensors, on performance for a given task. To achieve this, we define a quantity that captures the amount of task-relevant information provided by a sensor. Using a novel version of the generalized Fano inequality from information theory, we demonstrate that this quantity provides an upper bound on the highest achievable expected reward for one-step decision-making tasks. We then extend this bound to multi-step problems via a dynamic programming approach. We present algorithms for numerically computing the resulting bounds and demonstrate our approach on three examples: (i) the lava problem from the literature on partially observable Markov decision processes, (ii) an example with continuous state and observation spaces corresponding to a robot catching a freely-falling object, and (iii) obstacle avoidance using a depth sensor with non-Gaussian noise. We demonstrate the ability of our approach to establish strong limits on achievable performance for these problems by comparing our upper bounds with achievable lower bounds (computed by synthesizing or learning concrete control policies).
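For intuition, the quantity being upper-bounded can be computed exactly in small discrete problems: the best expected reward any sensor-based policy can achieve in one step. A minimal sketch, with an illustrative prior, sensor model, and reward:

```python
import numpy as np

# Illustrative one-step problem: 3 states, 3 observations, 2 actions.
p_s = np.array([0.5, 0.3, 0.2])                  # prior over states
p_o_given_s = np.array([[0.8, 0.1, 0.1],         # sensor model p(o|s)
                        [0.1, 0.8, 0.1],
                        [0.2, 0.2, 0.6]])
r = np.array([[1.0, 0.0],                        # reward r(s, a)
              [0.0, 1.0],
              [0.5, 0.5]])

# Optimal expected reward: for each observation, act on the posterior.
joint = p_s[:, None] * p_o_given_s               # p(s, o)
best = sum(max(joint[:, o] @ r[:, a] for a in range(r.shape[1]))
           for o in range(p_o_given_s.shape[1]))
print(best)   # any sensor-based policy's expected reward is at most this
```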
We extend the recently introduced framework of metric distortion to multiwinner voting. In this framework, $n$ agents and $m$ alternatives are located in an underlying metric space. The exact distances between agents and alternatives are unknown. Instead, each agent provides a ranking of the alternatives, ordered from the closest to the farthest. Typically, the goal is to select a single alternative that approximately minimizes the total distance from the agents, and the worst-case approximation ratio is termed distortion. In the case of multiwinner voting, the goal is to select a committee of $k$ alternatives that (approximately) minimizes the total cost to all agents. We consider the scenario where the cost of an agent for a committee is her distance from the $q$-th closest alternative in the committee. We reveal a surprising trichotomy on the distortion of multiwinner voting rules in terms of $k$ and $q$: The distortion is unbounded when $q \leq k/3$, asymptotically linear in the number of agents when $k/3 < q \leq k/2$, and constant when $q > k/2$.
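For a concrete instance of the cost model, the sketch below computes an agent's cost as her distance to the $q$-th closest committee member and finds the cost-minimizing committee by brute force; the small instance on a line is illustrative.

```python
import itertools
import numpy as np

def committee_cost(dist, committee, q):
    """Total agent cost when each agent pays her distance to the q-th
    closest member of the committee. dist: (n_agents, m_alts) matrix."""
    d = np.sort(dist[:, list(committee)], axis=1)  # per-agent sorted distances
    return d[:, q - 1].sum()                       # q-th closest (1-indexed)

def optimal_committee(dist, k, q):
    m = dist.shape[1]
    return min((committee_cost(dist, c, q), c)
               for c in itertools.combinations(range(m), k))

# Illustrative instance on a line: 4 agents, 4 alternatives, k=2, q=2.
agents = np.array([0.0, 1.0, 2.0, 3.0])
alts = np.array([0.5, 1.5, 2.5, 3.5])
dist = np.abs(agents[:, None] - alts[None, :])
print(optimal_committee(dist, k=2, q=2))
```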
Device-to-device (D2D) communication is one of the key emerging technologies for fifth-generation (5G) networks and beyond. It enables direct communication between mobile users, thereby extending coverage to devices lacking direct access to the cellular infrastructure and enhancing network capacity. D2D networks are complex, highly dynamic, and will be strongly augmented by intelligence for decision making at both the edge and core of the network, which makes them particularly difficult to predict and analyze. Conventionally, D2D systems are evaluated, investigated, and analyzed using analytical and probabilistic models (e.g., from stochastic geometry). However, applying classical simulation and analytical tools to such a complex system is often intractable and inaccurate. In this paper, we present a modeling and simulation framework from the perspective of complex-systems science and propose an agent-based model for the simulation of D2D coverage extensions. We also present a theoretical study to benchmark our proposed approach for a basic scenario that is less complicated to model mathematically. Our simulation results show that we are indeed able to predict coverage extensions for multi-hop scenarios and to quantify the effects of street-system characteristics and pedestrian mobility on the connection time of devices to the base station (BS). To our knowledge, this is the first study that applies agent-based simulations to coverage extensions in D2D networks.
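A minimal sketch of the agent-based viewpoint, assuming pedestrians performing a random walk on a plane and counting a device as covered if it reaches the BS directly or through a chain of short-range D2D links; all ranges and mobility parameters are illustrative.

```python
import numpy as np
from collections import deque

def covered(positions, bs_xy, cell_range, d2d_range):
    """Boolean mask of devices connected to the base station, either
    directly or via multi-hop D2D links (BFS over the proximity graph)."""
    direct = np.linalg.norm(positions - bs_xy, axis=1) <= cell_range
    reach = direct.copy()
    queue = deque(np.flatnonzero(direct))
    while queue:
        i = queue.popleft()
        hops = np.linalg.norm(positions - positions[i], axis=1) <= d2d_range
        for j in np.flatnonzero(hops & ~reach):
            reach[j] = True
            queue.append(j)
    return reach

# Illustrative scenario: 200 pedestrians random-walking for 50 steps.
rng = np.random.default_rng(1)
pos = rng.uniform(0, 500, size=(200, 2))
for _ in range(50):
    pos += rng.normal(0, 1.5, size=pos.shape)          # pedestrian mobility
mask = covered(pos, bs_xy=np.array([250.0, 250.0]),
               cell_range=100.0, d2d_range=40.0)
print(f"{mask.mean():.0%} of devices connected")
```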
Distributed Mean Estimation (DME) is a central building block in federated learning, where clients send local gradients to a parameter server for averaging and updating the model. Due to communication constraints, clients often use lossy compression techniques to compress the gradients, resulting in estimation inaccuracies. DME is more challenging when clients have diverse network conditions, such as constrained communication budgets and packet losses. In such settings, DME techniques often incur a significant increase in the estimation error, leading to degraded learning performance. In this work, we propose a robust DME technique named EDEN that naturally handles heterogeneous communication budgets and packet losses. We derive appealing theoretical guarantees for EDEN and evaluate it empirically. Our results demonstrate that EDEN consistently improves over state-of-the-art DME techniques.
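To illustrate the DME setting (this is not EDEN's actual compressor), the sketch below has clients with heterogeneous bit budgets apply an unbiased stochastic quantizer to their gradients before the server averages them; the quantizer and budget distribution are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(v, bits):
    """Unbiased per-coordinate stochastic quantization of v onto 2**bits
    levels between v.min() and v.max() (a stand-in compressor)."""
    lo, hi = v.min(), v.max()
    levels = 2 ** bits - 1
    scaled = (v - lo) / (hi - lo) * levels
    frac = scaled - np.floor(scaled)
    q = np.floor(scaled) + (rng.random(v.shape) < frac)  # randomized rounding
    return lo + q / levels * (hi - lo)

# 10 clients with heterogeneous budgets (1-4 bits/coordinate) estimate the mean.
d = 1000
grads = rng.standard_normal((10, d))
budgets = rng.integers(1, 5, size=10)
estimate = np.mean([stochastic_quantize(g, b) for g, b in zip(grads, budgets)],
                   axis=0)
true_mean = grads.mean(axis=0)
print("NMSE:", np.sum((estimate - true_mean) ** 2) / np.sum(true_mean ** 2))
```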
The simulation-to-real learning framework, i.e., learning policies in simulation and transferring them to the real world, is one of the most promising approaches towards data-efficient learning in robotics. However, due to the inevitable reality gap between the simulation and the real world, a policy learned in simulation may not always generate safe behaviour on the real robot. As a result, during adaptation of the policy in the real world, the robot may damage itself or cause harm to its surroundings. In this work, we introduce a novel learning algorithm called SafeAPT that leverages a diverse repertoire of policies evolved in simulation and transfers the most promising safe policy to the real robot through episodic interaction. To achieve this, SafeAPT iteratively learns a probabilistic reward model as well as a safety model, using real-world observations combined with simulated experiences as priors. It then performs Bayesian optimization on the repertoire with the reward model while maintaining the specified safety constraint using the safety model. SafeAPT allows a robot to adapt safely to a wide range of goals with the same repertoire of policies evolved in simulation. We compare SafeAPT with several baselines in both simulated and real robotic experiments and show that SafeAPT finds high-performance policies within a few minutes in the real world while minimizing safety violations during the interactions.
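A simplified sketch of the selection step, assuming each repertoire policy is indexed by a behaviour descriptor and that Gaussian-process surrogates stand in for SafeAPT's probabilistic reward and safety models; the UCB/LCB rule and all parameters here are illustrative, not the paper's exact acquisition.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def select_policy(descriptors, tried, rewards, safeties, safety_min=0.0, beta=1.0):
    """Pick the next repertoire policy: maximize a reward UCB among policies
    whose pessimistic (LCB) safety prediction respects the constraint."""
    gp_r = GaussianProcessRegressor().fit(descriptors[tried], rewards)
    gp_s = GaussianProcessRegressor().fit(descriptors[tried], safeties)
    mu_r, sd_r = gp_r.predict(descriptors, return_std=True)
    mu_s, sd_s = gp_s.predict(descriptors, return_std=True)
    safe = mu_s - beta * sd_s >= safety_min      # pessimistic safety check
    ucb = np.where(safe, mu_r + beta * sd_r, -np.inf)
    return int(np.argmax(ucb))                   # index 0 if nothing looks safe

# Illustrative repertoire: 100 policies with 2-D behaviour descriptors.
rng = np.random.default_rng(2)
desc = rng.uniform(-1, 1, size=(100, 2))
tried = [0, 17, 42]                              # episodes run on the real robot
rewards = rng.random(3)
safeties = rng.uniform(-0.2, 1.0, size=3)        # e.g., margin to a tilt limit
print(select_policy(desc, tried, rewards, safeties))
```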
Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU-based ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups in learning various continuous-control locomotion tasks. With a single GPU and CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10-1000x fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks.
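The core idea can be sketched with a batched toy environment, where a single GPU steps thousands of instances in parallel; PyTorch tensors stand in here for a full physics engine such as Flex, and the point-mass dynamics and reward are placeholders.

```python
import torch

# Step 4096 toy point-mass environments in parallel on one GPU; a real
# engine batches full rigid-body dynamics the same way.
device = "cuda" if torch.cuda.is_available() else "cpu"
n_envs, dt = 4096, 0.01
pos = torch.zeros(n_envs, 3, device=device)
vel = torch.zeros(n_envs, 3, device=device)

def step(actions):
    """One simulation step for all environments at once (no Python loop)."""
    global pos, vel
    vel = vel + dt * actions                     # forces -> velocities
    pos = pos + dt * vel
    reward = -pos.norm(dim=1)                    # e.g., distance-to-origin cost
    return pos, reward

obs, rew = step(torch.randn(n_envs, 3, device=device))
print(obs.shape, rew.shape)   # torch.Size([4096, 3]) torch.Size([4096])
```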
The field of Multi-Agent Systems (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact on industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures for governing the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled it to support MAS in complex, real-time, and uncertain environments. This survey aims to provide an overview of the DCOP model, giving a classification of its multiple extensions and addressing both the resolution methods and the applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
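For readers new to the model, a minimal DCOP instance: each agent controls a variable, binary constraints assign costs to joint values, and the objective is to minimize total cost. The brute-force solution below is only a centralized reference point; DCOP algorithms (e.g., DPOP or Max-Sum) reach it through local computation and message passing.

```python
import itertools

# Minimal DCOP: three agents, each owning one variable with domain {0, 1};
# binary constraints assign a cost to each joint value.
domains = {"a1": [0, 1], "a2": [0, 1], "a3": [0, 1]}
constraints = {
    ("a1", "a2"): lambda x, y: 0 if x != y else 2,   # prefer disagreement
    ("a2", "a3"): lambda x, y: 0 if x == y else 1,   # prefer agreement
}

def cost(assign):
    return sum(c(assign[u], assign[v]) for (u, v), c in constraints.items())

names = list(domains)
best = min((dict(zip(names, vals)) for vals in
            itertools.product(*domains.values())), key=cost)
print(best, cost(best))
```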
In this paper, we study the optimal convergence rates for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(x) \triangleq \sum_{i=1}^{m} f_i(x)$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and attains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
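A minimal sketch of the dual approach for quadratic local functions, where the consensus constraints are written as $Ex = 0$ with $E$ the edge incidence matrix of the network and Nesterov momentum is applied to dual ascent; the ring topology, step size, and data are illustrative.

```python
import numpy as np

# Each node i holds f_i(x) = 0.5 * (x - b_i)^2; consensus x_u = x_v is
# encoded edge-by-edge via the incidence matrix E, i.e. E @ x = 0.
b = np.array([1.0, 3.0, 5.0, 7.0])                       # local data
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]                 # ring network
E = np.zeros((len(edges), len(b)))
for k, (u, v) in enumerate(edges):
    E[k, u], E[k, v] = 1.0, -1.0

L = np.linalg.eigvalsh(E @ E.T).max()   # largest Laplacian eigenvalue (dual Lipschitz const.)
lam = y = np.zeros(len(edges))
for t in range(1, 300):
    x = b - E.T @ y                                      # local minimizers (closed form)
    lam_next = y + (E @ x) / L                           # dual ascent (gradient) step
    y = lam_next + (t - 1) / (t + 2) * (lam_next - lam)  # Nesterov momentum
    lam = lam_next
print(b - E.T @ lam)          # all entries converge to mean(b) = 4.0
```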
Cloud Robotics is one of the emerging areas of robotics and has attracted considerable attention due to its direct practical implications. In Cloud Robotics, the concept of cloud computing is used to offload the computationally intensive jobs of robots to the cloud; in addition, extra functionalities can be offered to the robots on demand at run time. Simultaneous Localization and Mapping (SLAM) is one of the computationally intensive algorithms in robotics, used by robots for navigation and map building in an unknown environment. Several cloud-based frameworks have been proposed specifically to address the SLAM problem; DAvinCi, Rapyuta, and C2TAM are some of these frameworks. In this paper, we present a detailed review of these frameworks' implementations for the SLAM problem.