In today's world, the circumstances, processes, and requirements of software systems are becoming increasingly complex. To operate properly in such dynamic environments, software systems must adapt to these changes, which has given rise to the research area of Self-Adaptive Systems (SAS). Platooning, the ability of vehicles to travel with small inter-vehicle distances, is one example of an adaptive system in Intelligent Transportation Systems. This technology increases road throughput and safety, directly addressing the growing infrastructure needs caused by increasing road traffic. However, the No-Free-Lunch theorem implies that the performance of one platooning coordination strategy does not necessarily transfer to other problems. Moreover, especially in the field of SAS, the selection of the most appropriate strategy depends on the current situation of the system. In this paper, we address the problem of self-aware optimization of adaptation planning strategies by designing a framework that includes situation detection, strategy selection, and parameter optimization of the selected strategies. We apply our approach to a platooning coordination case study and evaluate the performance of the proposed framework.
We study population protocols, a model of distributed computing appropriate for modeling well-mixed chemical reaction networks and other physical systems where agents exchange information in pairwise interactions but have no control over their schedule of interaction partners. The well-studied *majority* problem is to determine, in an initial population of $n$ agents each holding one of two opinions $A$ or $B$, whether there are more $A$, more $B$, or a tie. A *stable* protocol solves this problem with probability 1 by eventually entering a configuration in which all agents agree on a correct consensus decision of $\mathsf{A}$, $\mathsf{B}$, or $\mathsf{T}$, from which the consensus cannot change. We describe a protocol that solves this problem using $O(\log n)$ states ($\log \log n + O(1)$ bits of memory) and optimal expected time $O(\log n)$. The number of states $O(\log n)$ is known to be optimal for the class of polylogarithmic-time stable protocols that are "output dominant" and "monotone". These are two natural constraints satisfied by our protocol, making it simultaneously time- and state-optimal for that class. We introduce a key technique called a "fixed resolution clock" to achieve partial synchronization. Our protocol is *nonuniform*: the transition function has the value $\lceil \log n \rceil$ encoded in it. We show that the protocol can be modified to be uniform, while increasing the state complexity to $\Theta(\log n \log \log n)$.
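To make the model concrete (agents, a transition function on ordered pairs, and a uniformly random pairwise scheduler), the sketch below simulates a population protocol. It does not implement the paper's $O(\log n)$-state protocol: as an assumption for illustration it uses the classic 4-state exact-majority protocol (strong states `A`, `B`; weak states `a`, `b`), which solves majority only when there is no tie. The function names are hypothetical.

```python
import random

# Transition function of the classic 4-state exact-majority protocol
# (NOT the paper's protocol). Keys are ordered (initiator, responder) pairs;
# pairs not listed leave both agents unchanged.
RULES = {
    ("A", "B"): ("a", "b"), ("B", "A"): ("b", "a"),  # strong opinions cancel
    ("A", "b"): ("A", "a"), ("b", "A"): ("a", "A"),  # strong A converts weak b
    ("B", "a"): ("B", "b"), ("a", "B"): ("b", "B"),  # strong B converts weak a
}

def output(state):
    """An agent's output is its opinion letter, whether strong or weak."""
    return state.upper()

def simulate_majority(n_a, n_b, seed=0, max_interactions=10**6):
    """Run random pairwise interactions until every agent outputs the same opinion."""
    if n_a == n_b:
        raise ValueError("this 4-state sketch cannot report a tie")
    rng = random.Random(seed)
    agents = ["A"] * n_a + ["B"] * n_b
    for t in range(max_interactions):
        outputs = {output(s) for s in agents}
        if len(outputs) == 1:
            # No agent of the other opinion (strong or weak) remains,
            # so no rule can change any output: the consensus is stable.
            return outputs.pop(), t
        i, j = rng.sample(range(len(agents)), 2)  # scheduler picks an ordered pair
        agents[i], agents[j] = RULES.get((agents[i], agents[j]), (agents[i], agents[j]))
    raise RuntimeError("no consensus within max_interactions")
```

Because `A + B -> a + b` removes one strong agent of each opinion, the difference between strong `A` and strong `B` counts never changes, so the initial majority always keeps at least one strong agent that eventually converts every weak agent. With a tie, all strong agents cancel and the weak agents never converge, which is one reason tie-aware protocols such as the paper's need more states.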
The emergence of diverse network applications demands more flexible and responsive resource allocation in networks. Network slicing is a key enabling technology that provides each network service with a tailored set of network resources to satisfy specific service requirements. This paper focuses on the network slicing of access networks realized by Passive Optical Networks (PONs). We propose a learning-based Dynamic Bandwidth Allocation (DBA) algorithm for PON access networks that accounts for slice-awareness, demand-responsiveness, and allocation fairness. Based on online convex optimization, our algorithm learns the implicit traffic trend over time and determines the most robust window allocation that reduces the average latency. Our simulation results indicate that the proposed algorithm reduces the average latency by prioritizing delay-sensitive and heavily loaded Optical Network Units (ONUs) while guaranteeing a minimal window allocation to all ONUs.
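A minimal sketch of the general online-convex-optimization idea, not the paper's algorithm: projected online gradient descent over a per-ONU window allocation. As assumptions, each round's loss is a weighted squared mismatch between allocation and observed demand (larger weights standing in for delay-sensitive slices), and the feasible set is "the total window is fixed and every ONU receives at least `floor`". The loss, `weights`, `floor`, and all function names are hypothetical.

```python
def project_with_floor(v, total, floor):
    """Euclidean projection of v onto {w : sum(w) == total, w_i >= floor}.

    Shifts by `floor`, then applies the standard sort-based projection onto a
    scaled probability simplex (Duchi et al., 2008).
    """
    n = len(v)
    budget = total - n * floor
    if budget <= 0:
        raise ValueError("total must exceed n * floor")
    y = [x - floor for x in v]
    theta, running = 0.0, 0.0
    for k, u in enumerate(sorted(y, reverse=True), start=1):
        running += u
        t = (running - budget) / k
        if u - t > 0:
            theta = t  # the last index satisfying this condition fixes the threshold
    return [floor + max(x - theta, 0.0) for x in y]

def ogd_allocate(demand_stream, total, floor, weights, eta=0.5):
    """Projected online gradient descent on f_t(w) = 0.5 * sum_i c_i * (d_i - w_i)^2."""
    n = len(weights)
    w = [total / n] * n  # start from an equal split
    history = []
    for demand in demand_stream:
        history.append(list(w))  # allocation committed before this round's demand is seen
        grad = [-c * (d - wi) for c, d, wi in zip(weights, demand, w)]
        w = project_with_floor([wi - eta * g for wi, g in zip(w, grad)], total, floor)
    return w, history
```

The projection step is what enforces the guaranteed minimum: even if one ONU's demand dominates (say demands `[10, 0, 0, 0]` with `total=8`, `floor=0.5`), the allocation settles at `[6.5, 0.5, 0.5, 0.5]` rather than starving the idle ONUs.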
Many democratic political parties hold primary elections, a practice that reflects their democratic nature and promotes, among other things, the democratic value of inclusiveness. However, the methods currently used for holding such primary elections may not be the most suitable, especially if some form of proportional ranking is desired. In this paper, we compare different algorithmic methods for holding primaries (i.e., different aggregation methods for voters' ballots) by using real-world data to evaluate the degree of proportional ranking that each of them achieves. In particular, we compare six different algorithms by analyzing real-world data from a recent primary election conducted by the Israeli Democratit party. Technically, we analyze unique voter data and evaluate the achieved proportionality by means of cluster analysis, aiming to pinpoint the representation granted to different voter groups under each of the algorithmic methods considered. Our findings suggest that, compared with the most prominent primary algorithm in use (i.e., Approval), other methods such as Sequential Proportional Approval or Phragmen can bring about better proportional ranking and thus may be better suited for primary elections in practice.
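To illustrate how two of the compared aggregation rules can produce different rankings from the same approval ballots, the sketch below implements plain Approval ranking and Sequential Proportional Approval (SPAV) ranking. The toy profile, the alphabetical tie-breaking, and the function names are assumptions for illustration; they are not taken from the paper's data or code.

```python
from fractions import Fraction

def approval_ranking(ballots, candidates):
    """Rank candidates by total number of approvals (ties broken alphabetically)."""
    score = {c: sum(c in b for b in ballots) for c in candidates}
    return sorted(candidates, key=lambda c: (-score[c], c))

def spav_ranking(ballots, candidates):
    """Sequential Proportional Approval: repeatedly pick the candidate with the
    highest weighted approval, where a voter who already approves k ranked
    candidates contributes weight 1/(k+1)."""
    ranking, remaining = [], set(candidates)
    while remaining:
        chosen = set(ranking)

        def weighted(c):
            return sum(Fraction(1, 1 + len(b & chosen)) for b in ballots if c in b)

        best = min(remaining, key=lambda c: (-weighted(c), c))
        ranking.append(best)
        remaining.remove(best)
    return ranking

ballots = [{"a", "b", "c"}] * 6 + [{"d"}] * 4  # a 60% bloc and a 40% bloc
candidates = ["a", "b", "c", "d"]
```

On this profile, `approval_ranking` gives `["a", "b", "c", "d"]`, placing the 40% group's only candidate last, whereas `spav_ranking` gives `["a", "d", "b", "c"]`: once `a` is ranked, each member of the larger bloc counts only 1/2 toward `b` and `c` (total 3), so `d` (4) moves up to second place.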
This manuscript portrays optimization as a process. In many practical applications, the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and apply classical algorithmic theory and mathematical optimization. It is both necessary and beneficial to take a robust approach: apply an optimization method that learns as it goes, drawing on experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular successes in modeling and in systems that are now part of our daily lives.
Policy gradient (PG) methods are popular reinforcement learning (RL) methods in which a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and of agents' explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. For settings that use deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG method in MARL. On the Multi-Agent MuJoCo and StarCraft benchmarks, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin.
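As a hedged, single-agent illustration of the quantity a variance-minimizing baseline targets (not the paper's multi-agent OB or its surrogate), the sketch below computes exactly, for a softmax policy on a one-step bandit, the total variance of the score-function estimator $g = \nabla_\theta \log \pi(a)\,(R(a) - b)$, together with the classic minimum-variance constant baseline $b^* = \mathbb{E}[\lVert\nabla_\theta \log \pi(a)\rVert^2 R(a)] / \mathbb{E}[\lVert\nabla_\theta \log \pi(a)\rVert^2]$. The bandit, its rewards, and the function names are assumptions for illustration.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def score(pi, a):
    """Gradient of log softmax w.r.t. the logits: one_hot(a) - pi."""
    return [(1.0 if k == a else 0.0) - p for k, p in enumerate(pi)]

def total_variance(pi, rewards, b):
    """Trace of Cov[g] for g = score(a) * (R(a) - b), a ~ pi, computed exactly."""
    mean = [0.0] * len(pi)
    second_moment = 0.0
    for a, (p, r) in enumerate(zip(pi, rewards)):
        g = [s * (r - b) for s in score(pi, a)]
        mean = [m + p * gi for m, gi in zip(mean, g)]
        second_moment += p * sum(gi * gi for gi in g)
    return second_moment - sum(m * m for m in mean)

def optimal_baseline(pi, rewards):
    """Constant b minimizing E||g||^2 (and hence the total variance)."""
    norms = [sum(s * s for s in score(pi, a)) for a in range(len(pi))]
    num = sum(p * n * r for p, n, r in zip(pi, norms, rewards))
    den = sum(p * n for p, n in zip(pi, norms))
    return num / den

pi = softmax([0.5, -1.0, 2.0])
rewards = [1.0, 5.0, 3.0]
```

Since $\sum_a \pi(a)\nabla_\theta \log \pi(a) = 0$, the expected gradient does not depend on $b$; only the second moment does, and it is a quadratic in $b$ whose minimizer is $b^*$. This is why a baseline can reduce variance without biasing the gradient.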
Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL related to situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems related to multi-agent deep RL (MADRL) is presented, including non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, and multi-agent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, and their corresponding applications are explored. It is envisaged that this review provides insights into various MADRL methods and can lead to the future development of more robust and highly useful multi-agent learning methods for solving real-world problems.
Recent years have witnessed significant progress in deep Reinforcement Learning (RL). Empowered with large-scale neural networks, carefully designed architectures, novel training algorithms, and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen "robustly": commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. These observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.
In recent years, correlation filters have shown dominant results for visual object tracking. The type of features employed in this family of trackers significantly affects tracking performance. The ultimate goal is to utilize robust features that are invariant to any kind of appearance change of the object, while predicting the object location as accurately as when no appearance change occurs. With the emergence of deep-learning-based methods, the study of learning features for specific tasks has accelerated. For instance, discriminative visual tracking methods based on deep architectures have been studied with promising performance. Nevertheless, correlation filter based (CFB) trackers confine themselves to networks pre-trained for the object classification problem. To address this, this manuscript formulates the problem of learning deep fully convolutional features for CFB visual tracking. To learn the proposed model, a novel and efficient backpropagation algorithm is presented based on the loss function of the network. The proposed learning framework makes the network model flexible enough for a custom design. Moreover, it alleviates the dependency on networks trained for classification. Extensive performance analysis shows the efficacy of the proposed custom design in the CFB tracking framework. By fine-tuning the convolutional parts of a state-of-the-art network and integrating this model into a CFB tracker, the top-performing one in VOT2016, an 18% increase in expected average overlap is achieved and tracking failures are reduced by 25%, while the tracker maintains its superiority over state-of-the-art methods on the OTB-2013 and OTB-2015 tracking datasets.
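The correlation-filter core that such learned features feed into can be sketched in one dimension. This is a hedged illustration under simplifying assumptions: a single 1-D channel of raw values instead of deep convolutional features, a single training sample, a small regularizer `lam`, and a naive DFT. It follows the classic closed-form MOSSE-style filter rather than the paper's learned model, and all function names are hypothetical.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def train_filter(x, g, lam=1e-3):
    """Closed-form single-sample filter H = G * conj(X) / (|X|^2 + lam), per frequency."""
    X, G = dft(x), dft(g)
    return [Gk * Xk.conjugate() / (abs(Xk) ** 2 + lam) for Xk, Gk in zip(X, G)]

def response(H, z):
    """Correlation response map for a new patch z; its peak locates the target."""
    Z = dft(z)
    return [v.real for v in dft([Zk * Hk for Zk, Hk in zip(Z, H)], inverse=True)]

n = 32
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(n)]               # training patch (raw values)
g = [math.exp(-min(i, n - i) ** 2 / 2.0) for i in range(n)]  # desired output: peak at 0
H = train_filter(x, g)
z = [x[(t - 3) % n] for t in range(n)]                     # target moved 3 samples
```

Because the filter is applied by element-wise multiplication in the Fourier domain, the response to a circularly shifted patch is the shifted response, so the peak of `response(H, z)` moves from index 0 to index 3. Replacing the raw values with better (for example, learned) features changes only `x` and `z`, which is where the paper's contribution plugs in.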
Visual object tracking has been intensively studied and has seen great advances in either speed (e.g., with correlation filters) or accuracy (e.g., with deep features). Real-time and high-accuracy tracking algorithms, however, remain scarce. In this paper, we study the problem from a new perspective and present a novel parallel tracking and verifying (PTAV) framework, taking advantage of the ubiquity of multi-threading techniques and borrowing ideas from the success of parallel tracking and mapping in visual SLAM. The proposed PTAV framework is typically composed of two components, a (base) tracker T and a verifier V, working in parallel on two separate threads. The tracker T aims to provide super-real-time tracking inference and is expected to perform well most of the time; by contrast, the verifier V validates the tracking results and corrects T when needed. The key innovation is that V does not work on every frame but only upon requests from T; in turn, T may adjust the tracking according to the feedback from V. With such collaboration, PTAV enjoys both the high efficiency provided by T and the strong discriminative power provided by V. Meanwhile, to adapt V to object appearance changes over time, we maintain a dynamic target template pool for adaptive verification, resulting in further performance improvements. In our extensive experiments on popular benchmarks including OTB2015, TC128, UAV20L, and VOT2016, PTAV achieves the best tracking accuracy among all real-time trackers and in fact even outperforms many deep-learning-based algorithms. Moreover, as a general framework, PTAV is very flexible, with great potential for future improvement and generalization.
The field of Multi-Agent Systems (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact on industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled it to support MAS in complex, real-time, and uncertain environments. This survey aims to provide an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
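To pin down the basic model being surveyed, the sketch below encodes a tiny DCOP: variables with finite domains (each assumed to be owned by one agent) and binary cost functions, where the goal is an assignment of minimum total cost. It solves the instance by centralized exhaustive search purely for illustration; actual DCOP resolution algorithms (e.g., ADOPT, DPOP, Max-Sum) solve it distributively through message passing between agents. The function names and the graph-coloring instance are hypothetical.

```python
from itertools import product

def solve_dcop(domains, constraints):
    """Exhaustively find a minimum-cost assignment.

    domains: {variable: list of values}; each variable is assumed owned by one agent.
    constraints: {(x, y): cost_fn(value_x, value_y)} binary cost functions.
    """
    names = sorted(domains)
    best_cost, best_assignment = None, None
    for values in product(*(domains[v] for v in names)):
        assignment = dict(zip(names, values))
        cost = sum(f(assignment[x], assignment[y]) for (x, y), f in constraints.items())
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

def conflict(u, v):
    """Graph-coloring cost: penalize neighboring agents that pick the same color."""
    return 1 if u == v else 0
```

For three agents with two colors each, a triangle of `conflict` constraints has minimum cost 1 (two neighbors must share a color), whereas a path `x - y - z` can be colored with cost 0. Exhaustive search grows exponentially with the number of variables, which is why the distributed algorithms and model extensions discussed in the survey matter.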