Experiments in predator-prey systems show the emergence of long-term cycles. Deterministic models typically fail to capture these behaviors, which emerge from the microscopic interplay of individual-based dynamics and stochastic effects. However, simulating stochastic individual-based models can be extremely demanding, especially when the sample size is large. Hence, we propose an alternative simulation approach whose computational cost is lower than that of the classic stochastic algorithms. First, we describe how, starting from the individual-based description of predator-prey dynamics, the mean-field equations can be derived for both the spatially homogeneous and heterogeneous cases. Then, we show that the new approach preserves the order and converges to the mean-field solutions as the sample size increases. We show how to simulate the dynamics with the new approach, performing different numerical experiments in order to test its efficiency. Finally, we analyze the different nature of the oscillations in mean-field and stochastic simulations, underlining how the new algorithm can also be useful for studying collective behaviours at the population level.
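For context, the kind of classic stochastic algorithm the abstract contrasts with is an exact individual-based simulation in the style of Gillespie. The sketch below simulates a minimal predator-prey reaction system; the reaction set and rates are illustrative assumptions, not the model or the alternative algorithm proposed in the paper.

```python
import numpy as np

def gillespie_lotka_volterra(prey0, pred0, t_max, b=1.0, p=0.005, d=0.6, rng=None):
    """Classic Gillespie SSA for an individual-based predator-prey model.

    Illustrative reactions (not the paper's):
      prey birth:      X -> 2X       rate b * X
      predation:       X + Y -> 2Y   rate p * X * Y
      predator death:  Y -> 0        rate d * Y
    """
    rng = np.random.default_rng() if rng is None else rng
    t, x, y = 0.0, prey0, pred0
    times, prey, pred = [t], [x], [y]
    while t < t_max and (x > 0 or y > 0):
        rates = np.array([b * x, p * x * y, d * y])
        total = rates.sum()
        if total == 0.0:
            break
        t += rng.exponential(1.0 / total)        # waiting time to the next event
        event = rng.choice(3, p=rates / total)   # which reaction fires
        if event == 0:
            x += 1
        elif event == 1:
            x, y = x - 1, y + 1
        else:
            y -= 1
        times.append(t); prey.append(x); pred.append(y)
    return np.array(times), np.array(prey), np.array(pred)

# The corresponding mean-field (deterministic) equations for the same rates are
#   dx/dt = b*x - p*x*y,   dy/dt = p*x*y - d*y.
```

The per-event cost of such a simulation grows with the population size, which is the bottleneck the proposed approach aims to reduce.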
Model-Based Reinforcement Learning involves learning a \textit{dynamics model} from data and then using this model to optimise behaviour, most often with an online \textit{planner}. Much of the recent research along these lines presents a particular set of design choices, spanning problem definition, model learning and planning. Given the multiple simultaneous contributions, it is difficult to evaluate the effect of each. This paper sets out to disambiguate the role of different design choices for learning dynamics models, by comparing their performance to planning with a ground-truth model -- the simulator. First, we collect a rich dataset from the training sequence of a model-free agent on 5 domains of the DeepMind Control Suite. Second, we train feed-forward dynamics models in a supervised fashion, and evaluate planner performance while varying and analysing different model design choices, including ensembling, stochasticity, multi-step training and timestep size. Besides the quantitative analysis, we describe a set of qualitative findings, rules of thumb, and future research directions for planning with learned dynamics models. Videos of the results are available at //sites.google.com/view/learning-better-models.
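As a point of reference for the supervised model-learning step described above, the following is a minimal sketch of a feed-forward one-step dynamics model trained on (state, action, next state) transitions. The architecture, delta-state parameterisation and hyperparameters are illustrative assumptions, not the paper's configuration; an ensemble would simply be several independently initialised copies of this model.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One-step feed-forward dynamics model that predicts the state change s' - s."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # Predicting the delta and adding it back tends to be an easier regression target.
        return state + self.net(torch.cat([state, action], dim=-1))

def train_step(model, optimiser, states, actions, next_states):
    """One supervised update on a batch of (s, a, s') transitions."""
    loss = nn.functional.mse_loss(model(states, actions), next_states)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```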
Decentralized team problems where players have asymmetric information about the state of the underlying stochastic system have been actively studied, but \emph{games} between such teams are less understood. We consider a general model of zero-sum stochastic games between two competing teams. This model subsumes many previously considered team and zero-sum game models. For this general model, we provide bounds on the upper (min-max) and lower (max-min) values of the game. Furthermore, if the upper and lower values of the game are identical (i.e., if the game has a \emph{value}), our bounds coincide with the value of the game. Our bounds are obtained using two dynamic programs based on a sufficient statistic known as the common information belief (CIB). We also identify certain information structures in which only the minimizing team controls the evolution of the CIB. In these cases, we show that one of our CIB based dynamic programs can be used to find the min-max strategy (in addition to the min-max value). We propose an approximate dynamic programming approach for computing the values (and the strategy when applicable) and illustrate our results with the help of an example.
Dynamic graph embedding has gained great attention recently due to its capability of learning low-dimensional graph representations for complex temporal graphs with high accuracy. However, recent advances mostly focus on learning node embeddings as deterministic "vectors" for static graphs, disregarding the key temporal dynamics of the graph and the evolving uncertainties associated with node embeddings in the latent space. In this work, we propose an efficient stochastic dynamic graph embedding method (DynG2G) that applies an inductive feed-forward encoder trained with a node triplet-based contrastive loss. Every node at each timestamp is encoded as a time-dependent probabilistic multivariate Gaussian distribution in the latent space, so we can quantify the node embedding uncertainty on-the-fly. We adopt eight different benchmarks that represent diversity in size (from 96 to 87,626 nodes and from 13,398 to 4,870,863 edges) and diversity in dynamics. We demonstrate via extensive experiments on these eight dynamic graph benchmarks that DynG2G achieves new state-of-the-art performance in capturing the underlying temporal node embeddings. We also demonstrate that DynG2G can predict the evolving node embedding uncertainty, which plays a crucial role in quantifying the intrinsic dimensionality of the dynamical system over time. We obtain a universal relation between the optimal embedding dimension, $L_o$, and the effective dimensionality of uncertainty, $D_u$, and we infer that $L_o=D_u$ for all cases. This implies that the uncertainty quantification approach we employ in DynG2G correctly captures the intrinsic dimensionality of the dynamics of such evolving graphs despite the diverse nature and composition of the graphs at each timestamp. Moreover, this $L_o$-$D_u$ correlation provides a clear path to adaptively select the optimum embedding size at each timestamp by setting $L \ge D_u$.
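To make the probabilistic embedding concrete, below is a minimal sketch of a node-triplet contrastive loss over diagonal Gaussian embeddings in the style of Graph2Gauss-type encoders (nearby nodes are pulled together, distant nodes pushed apart via a KL-divergence energy). The exact loss form, energy and encoder used by DynG2G are not reproduced here; this is an assumption-laden illustration.

```python
import torch

def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """KL( N(mu_p, diag(sigma_p^2)) || N(mu_q, diag(sigma_q^2)) ), computed row-wise.
    sigma_* are standard deviations and must be positive (e.g. a softplus output)."""
    var_p, var_q = sigma_p ** 2, sigma_q ** 2
    return 0.5 * torch.sum(
        var_p / var_q + (mu_q - mu_p) ** 2 / var_q - 1.0 + torch.log(var_q / var_p),
        dim=-1,
    )

def triplet_energy_loss(mu, sigma, anchors, positives, negatives):
    """Square-exponential triplet loss on Gaussian node embeddings:
    low energy for (anchor, positive) pairs, high energy for (anchor, negative) pairs."""
    e_pos = kl_diag_gaussians(mu[positives], sigma[positives], mu[anchors], sigma[anchors])
    e_neg = kl_diag_gaussians(mu[negatives], sigma[negatives], mu[anchors], sigma[anchors])
    return (e_pos ** 2 + torch.exp(-e_neg)).mean()
```

The per-node sigma produced by such an encoder is what allows the embedding uncertainty, and hence $D_u$, to be tracked over time.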
Many frameworks exist to infer cause and effect relations in complex nonlinear systems, but a complete theory is lacking. A new framework is presented that is fully nonlinear, provides a complete information-theoretic disentanglement of causal processes, allows for nonlinear interactions between causes, identifies the causal strength of missing or unknown processes, and can analyze systems that cannot be represented on Directed Acyclic Graphs. The basic building blocks are information-theoretic measures such as (conditional) mutual information and a new concept called certainty that monotonically increases with the information available about the target process. The framework is presented in detail and compared with other existing frameworks, and the treatment of confounders is discussed. While there are systems with structures that the framework cannot disentangle, it is argued that any causal framework based on integrated quantities will miss potentially important information contained in the underlying probability density functions. The framework is tested on several highly simplified stochastic processes to demonstrate how blocking and gateways are handled, and on the chaotic Lorenz 1963 system. We show that the framework provides information on the local dynamics, but also reveals information on the larger-scale structure of the underlying attractor. Furthermore, by applying it to real observations related to the El Niño-Southern Oscillation system we demonstrate its power and advantage over other methodologies.
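Since (conditional) mutual information is the basic building block named above, here is a minimal plug-in estimator for discrete-valued samples; continuous variables would need binning or k-nearest-neighbour estimators, and the certainty measure introduced in the paper is not sketched here.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z)."""
    n = len(z)
    cmi = 0.0
    for zv, cnt in Counter(z).items():
        idx = [i for i in range(n) if z[i] == zv]
        cmi += (cnt / n) * mutual_information([x[i] for i in idx], [y[i] for i in idx])
    return cmi
```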
Community detection in graphs that are generated according to stochastic block models (SBMs) has received much attention lately. In this paper, we focus on the binary symmetric SBM -- in which a graph of $n$ vertices is randomly generated by first partitioning the vertices into two equal-sized communities and then connecting each pair of vertices with a probability that depends on their community memberships -- and study the associated exact community recovery problem. Although the maximum-likelihood formulation of the problem is non-convex and discrete, we propose to tackle it using a popular iterative method called projected power iterations. To ensure fast convergence of the method, we initialize it using a point generated by another iterative method called orthogonal iterations, which is a classic method for computing invariant subspaces of a symmetric matrix. We show that in the logarithmic sparsity regime of the problem, with high probability the proposed two-stage method can exactly recover the two communities down to the information-theoretic limit in $\mathcal{O}(n\log^2n/\log\log n)$ time, which is competitive with a host of existing state-of-the-art methods that have the same recovery performance. We also conduct numerical experiments on both synthetic and real data sets to demonstrate the efficacy of our proposed method and complement our theoretical development.
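A dense-matrix sketch of the two-stage idea (orthogonal iterations for a spectral initialisation, then projected power iterations over $\pm 1$ labellings) is given below. The centring of the adjacency matrix, the iteration counts and the use of dense linear algebra are illustrative simplifications; the paper's algorithm and its $\mathcal{O}(n\log^2n/\log\log n)$ guarantee rely on details not reproduced here.

```python
import numpy as np

def recover_communities(A, n_ortho_iters=50, n_power_iters=50, rng=None):
    """Two-stage sketch: spectral initialisation via orthogonal iterations,
    then projected power iterations that project onto {+1, -1}^n."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    M = A - A.mean()   # crude centring so the +/-1 community direction dominates

    # Stage 1: orthogonal iterations to approximate the dominant invariant subspace.
    Q = np.linalg.qr(rng.standard_normal((n, 2)))[0]
    for _ in range(n_ortho_iters):
        Q, _ = np.linalg.qr(M @ Q)
    x = np.sign(Q[:, 0])
    x[x == 0] = 1

    # Stage 2: projected power iterations on the centred adjacency matrix.
    for _ in range(n_power_iters):
        x_new = np.sign(M @ x)
        x_new[x_new == 0] = 1
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x   # +/-1 community labels (up to a global sign flip)
```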
The Conditional Preference Network (CP-net) graphically represents a user's qualitative and conditional preference statements under the ceteris paribus interpretation. The constrained CP-net extends the CP-net with a set of constraints. Existing algorithms for solving the constrained CP-net require the expensive dominance testing operation. We propose three approaches to tackle this challenge. In our first solution, we alter the constrained CP-net by eliciting additional relative importance statements between variables, in order to obtain a total order over the outcomes. We call this new model the constrained Relative Importance Network (constrained CPR-net). Consequently, we show that the constrained CPR-net has a single optimal outcome (assuming the constrained CPR-net is consistent) that we can obtain without dominance testing. In our second solution, we extend the Lexicographic Preference Tree (LP-tree) to handle a set of constraints. We then propose a recursive backtrack search algorithm, which we call Search-LP, to find the most preferred outcome. We prove that the first feasible outcome returned by Search-LP (without dominance testing) is preferred to any other feasible outcome. Finally, in our third solution, we preserve the semantics of the CP-net and propose a divide-and-conquer algorithm that compares outcomes according to dominance testing.
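To illustrate the flavour of a lexicographic backtrack search like Search-LP, here is a heavily simplified sketch: variables are instantiated in a fixed importance order and values are tried from most to least preferred, so the first constraint-satisfying complete assignment found is the most preferred feasible one. It assumes unconditional value orderings (a real LP-tree lets the value order depend on more important variables), and the function name and interface are hypothetical.

```python
def lexicographic_backtrack_search(variables, preferred_values, constraints, assignment=None):
    """Backtracking search: variables in importance order, values in preference order.

    variables        -- variable names, most important first
    preferred_values -- dict: variable -> values, most preferred first
    constraints      -- predicates over (possibly partial) assignments
    Returns the first feasible complete assignment found, or None.
    """
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment
    var = variables[len(assignment)]
    for value in preferred_values[var]:
        candidate = {**assignment, var: value}
        if all(c(candidate) for c in constraints):
            result = lexicographic_backtrack_search(variables, preferred_values,
                                                    constraints, candidate)
            if result is not None:
                return result
    return None

# Example: A is more important than B, A prefers 1, B prefers 0; the constraint forbids A == B.
forbid_equal = [lambda a: not ('A' in a and 'B' in a and a['A'] == a['B'])]
print(lexicographic_backtrack_search(['A', 'B'], {'A': [1, 0], 'B': [0, 1]}, forbid_equal))
# -> {'A': 1, 'B': 0}
```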
Cooperative multi-agent reinforcement learning is a decentralized paradigm in sequential decision making where agents distributed over a network iteratively collaborate with neighbors to maximize global (network-wide) notions of rewards. Exact computations typically involve a complexity that scales exponentially with the number of agents. To address this curse of dimensionality, we design a scalable algorithm based on the Natural Policy Gradient framework that uses local information and only requires agents to communicate with neighbors within a certain range. Under standard assumptions on the spatial decay of correlations for the transition dynamics of the underlying Markov process and the localized learning policy, we show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity, incurring a localization error that does not depend on the number of agents and converges to zero exponentially fast as a function of the range of communication.
We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning achieve better regret than the known randomized online coordinate descent algorithms, and the proposed algorithms for stochastic optimization exhibit convergence rates as good as those of the best known randomized coordinate descent algorithms. We also present simulation results to demonstrate the performance of the proposed algorithms.
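For readers unfamiliar with the baseline, the sketch below shows plain (unaccelerated) randomized coordinate descent on a least-squares objective; this is the per-coordinate update that accelerated variants build on by adding momentum/estimate sequences, and the paper's accelerated algorithms themselves are not reproduced here.

```python
import numpy as np

def randomized_coordinate_descent(A, b, n_iters=10_000, rng=None):
    """Plain randomized coordinate descent for f(x) = 0.5 * ||A x - b||^2.
    Each step updates one coordinate with its own step size 1/L_i, L_i = ||A[:, i]||^2."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)            # coordinate-wise Lipschitz constants
    x = np.zeros(n)
    residual = A @ x - b
    for _ in range(n_iters):
        i = rng.integers(n)
        if L[i] == 0.0:
            continue                    # skip all-zero columns
        grad_i = A[:, i] @ residual     # partial derivative of f with respect to x_i
        step = grad_i / L[i]
        x[i] -= step
        residual -= step * A[:, i]      # keep the residual consistent at O(m) cost
    return x
```

The appeal is that each iteration touches a single column of A, which is what keeps the per-iteration complexity low compared with full-gradient methods.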
Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents increases substantially, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or of neighboring agents. The interplay between the two entities is mutually reinforcing: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to a Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games demonstrate the learning effectiveness of our mean field approaches. In addition, we report the first result of solving the Ising model via model-free reinforcement learning methods.
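A minimal tabular sketch of a mean-field Q-learning update is shown below: each agent's Q-function is conditioned on its own action and on the mean action of its neighbours rather than on the full joint action. The discretisation of the mean action, the Boltzmann policy temperature and the class interface are illustrative choices, not the paper's implementation.

```python
import numpy as np
from collections import defaultdict

class MeanFieldQAgent:
    """Tabular mean-field Q-learning: Q(s, a, mean_action) instead of Q(s, joint action)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, temperature=1.0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.temperature = alpha, gamma, temperature
        self.q = defaultdict(lambda: np.zeros(n_actions))   # key: (state, discretised mean action)

    def _key(self, state, mean_action):
        # Discretise the neighbours' mean action so it can index a table.
        return (state, tuple(np.round(mean_action, 1)))

    def policy(self, state, mean_action):
        """Boltzmann policy over the agent's own actions, given the neighbours' mean action."""
        logits = self.q[self._key(state, mean_action)] / self.temperature
        p = np.exp(logits - logits.max())
        return p / p.sum()

    def update(self, state, action, reward, next_state, mean_action, next_mean_action):
        """One mean-field Q-learning backup using the expected next-state value
        under the Boltzmann policy."""
        next_q = self.q[self._key(next_state, next_mean_action)]
        v_next = self.policy(next_state, next_mean_action) @ next_q
        key = self._key(state, mean_action)
        self.q[key][action] += self.alpha * (reward + self.gamma * v_next - self.q[key][action])
```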
We study the problem of stock-related question answering (StockQA): automatically generating answers to stock-related questions, much as professional stock analysts provide action recommendations on stocks upon users' requests. StockQA is quite different from previous QA tasks since (1) the answers in StockQA are natural language sentences (rather than entities or values), and, due to the dynamic nature of StockQA, it is scarcely possible to obtain reasonable answers in an extractive way from the training data; and (2) StockQA requires properly analyzing the relationship between the keywords in a QA pair and the numerical features of a stock. We propose to address the problem with a memory-augmented encoder-decoder architecture, integrating different mechanisms for number understanding and generation, which is a critical component of StockQA. We build a large-scale Chinese dataset containing over 180K StockQA instances, on which various technique combinations are extensively studied and compared. Experimental results show that a hybrid word-character model with separate character components for number processing achieves the best performance.\footnote{The data is publicly available at \url{//ai.tencent.com/ailab/nlp/dataset/}.}