Sprouts is a two-player pencil-and-paper game invented by John Conway and Michael Paterson in 1967. In the game, the players take turns in joining dots by curves according to simple rules, until one player cannot make a move. The game of Sprouts is very popular and simple-looking, so it may come as a surprise that there are essentially no AI Sprouts players available. This lack of computer opponents is caused by the fact that the game hides a surprisingly high combinatorial complexity and implementing it involves fascinating programming challenges. We overcome all the implementation barriers and create the first user-friendly Sprouts application with a strong artificial intelligence after more than 50 years of the existence of the game. In particular, we combine results from the theory of nimbers with new methods based on Delaunay triangulations and crossing-preserving force-directed algorithms to develop an AI Sprouts player which plays a perfect game on up to 11 spots.
Shortest path queries over graphs are usually considered as isolated tasks, where the goal is to return the shortest path for each individual query. In practice, however, such queries are typically part of a system (e.g., a road network) and their execution dynamically affects other queries and network parameters, such as the loads on edges, which in turn affects the shortest paths. We study the problem of collectively processing shortest path queries, where the objective is to optimize a collective objective, such as minimizing the overall cost. We define a temporal load-aware network that dynamically tracks expected loads while satisfying the desirable `first in, first out' property. We develop temporal load-aware extensions of widely used shortest path algorithms, and a scalable collective routing solution that seeks to reduce system-wide congestion through dynamic path reassignment. Experiments illustrate that our collective approach to this NP-hard problem achieves improvements in a variety of performance measures, such as, i) reducing average travel times by up to 63%, ii) producing fairer suggestions across queries, and iii) distributing load across up to 97% of a city's road network capacity. The proposed approach is generalizable, which allows it to be adapted for other concurrent query processing tasks over networks.
Individuality is essential in human society, which induces the division of labor and thus improves the efficiency and productivity. Similarly, it should also be the key to multi-agent cooperation. Inspired by that individuality is of being an individual separate from others, we propose a simple yet efficient method for the emergence of individuality (EOI) in multi-agent reinforcement learning (MARL). EOI learns a probabilistic classifier that predicts a probability distribution over agents given their observation and gives each agent an intrinsic reward of being correctly predicted by the classifier. The intrinsic reward encourages the agents to visit their own familiar observations, and learning the classifier by such observations makes the intrinsic reward signals stronger and the agents more identifiable. To further enhance the intrinsic reward and promote the emergence of individuality, two regularizers are proposed to increase the discriminability of the classifier. We implement EOI on top of popular MARL algorithms. Empirically, we show that EOI significantly outperforms existing methods in a variety of multi-agent cooperative scenarios.
In game-theoretic learning, several agents are simultaneously following their individual interests, so the environment is non-stationary from each player's perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may drive the system to an equilibrium, while others could produce cyclic, chaotic, or otherwise divergent trajectories. To account for this, we propose a range of no-regret policies based on optimistic mirror descent, with the following desirable properties: i) they do not require any prior tuning or knowledge of the game; ii) they all achieve O(\sqrt{T}) regret against arbitrary, adversarial opponents; and iii) they converge to the best response against convergent opponents. Also, if employed by all players, then iv) they guarantee O(1) social regret; while v) the induced sequence of play converges to Nash equilibrium with O(1) individual regret in all variationally stable games (a class of games that includes all monotone and convex-concave zero-sum games).
With ever growing scale of neural models, knowledge distillation (KD) attracts more attention as a prominent tool for neural model compression. However, there are counter intuitive observations in the literature showing some challenging limitations of KD. A case in point is that the best performing checkpoint of the teacher might not necessarily be the best teacher for training the student in KD. Therefore, one important question would be how to find the best checkpoint of the teacher for distillation? Searching through the checkpoints of the teacher would be a very tedious and computationally expensive process, which we refer to as the \textit{checkpoint-search problem}. Moreover, another observation is that larger teachers might not necessarily be better teachers in KD which is referred to as the \textit{capacity-gap} problem. To address these challenging problems, in this work, we introduce our progressive knowledge distillation (Pro-KD) technique which defines a smoother training path for the student by following the training footprints of the teacher instead of solely relying on distilling from a single mature fully-trained teacher. We demonstrate that our technique is quite effective in mitigating the capacity-gap problem and the checkpoint search problem. We evaluate our technique using a comprehensive set of experiments on different tasks such as image classification (CIFAR-10 and CIFAR-100), natural language understanding tasks of the GLUE benchmark, and question answering (SQuAD 1.1 and 2.0) using BERT-based models and consistently got superior results over state-of-the-art techniques.
In the first part of this paper, we present a unified framework for analyzing the algorithmic complexity of any optimization problem, whether it be continuous or discrete in nature. This helps to formalize notions like "input", "size" and "complexity" in the context of general mathematical optimization, avoiding context dependent definitions which is one of the sources of difference in the treatment of complexity within continuous and discrete optimization. In the second part of the paper, we employ the language developed in the first part to study information theoretic and algorithmic complexity of {\em mixed-integer convex optimization}, which contains as a special case continuous convex optimization on the one hand and pure integer optimization on the other. We strive for the maximum possible generality in our exposition. We hope that this paper contains material that both continuous optimizers and discrete optimizers find new and interesting, even though almost all of the material presented is common knowledge in one or the other community. We see the main merit of this paper as bringing together all of this information under one unifying umbrella with the hope that this will act as yet another catalyst for more interaction across the continuous-discrete divide. In fact, our motivation behind Part I of the paper is to provide a common language for both communities.
An evolutionary approach for computing the winning strategy for Nim-like games is proposed in this paper. The winning strategy is computed by using the Multi Expression Programming (MEP) technique - a fast and efficient variant of the Genetic Programming (GP). Each play strategy is represented by a mathematical expression that contains mathematical operators (such as +, -, *, mod, div, and , or, xor, not) and operands (encoding the current game state). Several numerical experiments for computing the winning strategy for the Nim game are performed. The computational effort needed for evolving a winning strategy is reported. The results show that the proposed evolutionary approach is very suitable for computing the winning strategy for Nim-like games.
In the simplest formulation of the knapsack problem, one seeks to maximize the total value of a collection of objects such that the total weight remains below a certain limit. In this work, we move from computer science to physics and formulate the knapsack problem as a statistical physics system and compute the corresponding partition function. We approximate the result in the large number limit and from this approximation develop a new algorithm for the problem. We compare the performance of this algorithm to that of other approximation algorithms, finding that the new algorithm is faster than most of these approaches while still retaining high accuracy. From its speed and accuracy relationship, we argue that the algorithm is a manifestation of a greedy algorithm. We conclude by discussing ways to extend the formalism to make its underlying heuristics more rigorous or to apply the approach to other combinatorial optimization problems. In all, this work exists at the intersection between computer science and statistical physics and represents a new analytical approach to solving the problems in the former using methods of the latter.
Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization. However, these algorithms may require careful hyperparameter tuning for each problem instance. We use a reinforcement learning agent in conjunction with a quantum-inspired algorithm to solve the Ising energy minimization problem, which is equivalent to the Maximum Cut problem. The agent controls the algorithm by tuning one of its parameters with the goal of improving recently seen solutions. We propose a new Rescaled Ranked Reward (R3) method that enables stable single-player version of self-play training that helps the agent to escape local optima. The training on any problem instance can be accelerated by applying transfer learning from an agent trained on randomly generated problems. Our approach allows sampling high-quality solutions to the Ising problem with high probability and outperforms both baseline heuristics and a black-box hyperparameter optimization approach.
This paper describes the development of the Microsoft XiaoIce system, the most popular social chatbot in the world. XiaoIce is uniquely designed as an AI companion with an emotional connection to satisfy the human need for communication, affection, and social belonging. We take into account both intelligent quotient (IQ) and emotional quotient (EQ) in system design, cast human-machine social chat as decision-making over Markov Decision Processes (MDPs), and optimize XiaoIce for long-term user engagement, measured in expected Conversation-turns Per Session (CPS). We detail the system architecture and key components including dialogue manager, core chat, skills, and an empathetic computing module. We show how XiaoIce dynamically recognizes human feelings and states, understands user intents, and responds to user needs throughout long conversations. Since the release in 2014, XiaoIce has communicated with over 660 million users and succeeded in establishing long-term relationships with many of them. Analysis of large-scale online logs shows that XiaoIce has achieved an average CPS of 23, which is significantly higher than that of other chatbots and even human conversations.
Random walks are at the heart of many existing network embedding methods. However, such algorithms have many limitations that arise from the use of random walks, e.g., the features resulting from these methods are unable to transfer to new nodes and graphs as they are tied to vertex identity. In this work, we introduce the Role2Vec framework which uses the flexible notion of attributed random walks, and serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many others that leverage random walks. Our proposed framework enables these methods to be more widely applicable for both transductive and inductive learning as well as for use on graphs with attributes (if available). This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is effective with an average AUC improvement of 16:55% while requiring on average 853x less space than existing methods on a variety of graphs.