In this paper, online game is studied, where at each time, a group of players aim at selfishly minimizing their own time-varying cost function simultaneously subject to time-varying coupled constraints and local feasible set constraints. Only local cost functions and local constraints are available to individual players, who can share limited information with their neighbors through a fixed and connected graph. In addition, players have no prior knowledge of future cost functions and future local constraint functions. In this setting, a novel decentralized online learning algorithm is devised based on mirror descent and a primal-dual strategy. The proposed algorithm can achieve sublinearly bounded regrets and constraint violation by appropriately choosing decaying stepsizes. Furthermore, it is shown that the generated sequence of play by the designed algorithm can converge to the variational GNE of a strongly monotone game, to which the online game converges. Additionally, a payoff-based case, i.e., in a bandit feedback setting, is also considered and a new payoff-based learning policy is devised to generate sublinear regrets and constraint violation. Finally, the obtained theoretical results are corroborated by numerical simulations.
Two-player games are a fruitful way to represent and reason about several important synthesis tasks. These tasks include controller synthesis (where one asks for a controller for a given plant such that the controlled plant satisfies a given temporal specification), program repair (setting values of variables to avoid exceptions), and synchronization synthesis (adding lock/unlock statements in multi-threaded programs to satisfy safety assertions). In all these applications, a solution directly corresponds to a winning strategy for one of the players in the induced game. In turn, \emph{logically-specified} games offer a powerful way to model these tasks for large or infinite-state systems. Much of the techniques proposed for solving such games typically rely on abstraction-refinement or template-based solutions. In this paper, we show how to apply classical fixpoint algorithms, that have hitherto been used in explicit, finite-state, settings, to a symbolic logical setting. We implement our techniques in a tool called GenSys-LTL and show that they are not only effective in synthesizing valid controllers for a variety of challenging benchmarks from the literature, but often compute maximal winning regions and maximally-permissive controllers. We achieve \textbf{46.38X speed-up} over the state of the art and also scale well for non-trivial LTL specifications.
Gradient dominance property is a condition weaker than strong convexity, yet it sufficiently ensures global convergence for first-order methods even in non-convex optimization. This property finds application in various machine learning domains, including matrix decomposition, linear neural networks, and policy-based reinforcement learning (RL). In this paper, we study the stochastic homogeneous second-order descent method (SHSODM) for gradient-dominated optimization with $\alpha \in [1, 2]$ based on a recently proposed homogenization approach. Theoretically, we show that SHSODM achieves a sample complexity of $O(\epsilon^{-7/(2 \alpha) +1})$ for $\alpha \in [1, 3/2)$ and $\tilde{O}(\epsilon^{-2/\alpha})$ for $\alpha \in [3/2, 2]$. We further provide a SHSODM with a variance reduction technique enjoying an improved sample complexity of $O( \epsilon ^{-( 7-3\alpha ) /( 2\alpha )})$ for $\alpha \in [1,3/2)$. Our results match the state-of-the-art sample complexity bounds for stochastic gradient-dominated optimization without \emph{cubic regularization}. Since the homogenization approach only relies on solving extremal eigenvector problems instead of Newton-type systems, our methods gain the advantage of cheaper iterations and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the efficiency of SHSODM compared to other off-the-shelf methods.
Interacting systems of events may exhibit cascading behavior where events tend to be temporally clustered. While the cascades themselves may be obvious from the data, it is important to understand which states of the system trigger them. For this purpose, we propose a modeling framework based on continuous-time Bayesian networks (CTBNs) to analyze cascading behavior in complex systems. This framework allows us to describe how events propagate through the system and to identify likely sentry states, that is, system states that may lead to imminent cascading behavior. Moreover, CTBNs have a simple graphical representation and provide interpretable outputs, both of which are important when communicating with domain experts. We also develop new methods for knowledge extraction from CTBNs and we apply the proposed methodology to a data set of alarms in a large industrial system.
While advancing rapidly, Artificial Intelligence still falls short of human intelligence in several key aspects due to inherent limitations in current AI technologies and our understanding of cognition. Humans have an innate ability to understand context, nuances, and subtle cues in communication, which allows us to comprehend jokes, sarcasm, and metaphors. Machines struggle to interpret such contextual information accurately. Humans possess a vast repository of common-sense knowledge that helps us make logical inferences and predictions about the world. Machines lack this innate understanding and often struggle with making sense of situations that humans find trivial. In this article, we review the prospective Machine Intelligence candidates, a review from Prof. Yann LeCun, and other work that can help close this gap between human and machine intelligence. Specifically, we talk about what's lacking with the current AI techniques such as supervised learning, reinforcement learning, self-supervised learning, etc. Then we show how Hierarchical planning-based approaches can help us close that gap and deep-dive into energy-based, latent-variable methods and Joint embedding predictive architecture methods.
Reciprocity, or the tendency of individuals to mirror behavior, is a key measure that describes information exchange in a social network. Users in social networks tend to engage in different levels of reciprocal behavior. Differences in such behavior may indicate the existence of communities that reciprocate links at varying rates. In this paper, we develop methodology to model the diverse reciprocal behavior in growing social networks. In particular, we present a preferential attachment model with heterogeneous reciprocity that imitates the attraction users have for popular users, plus the heterogeneous nature by which they reciprocate links. We compare Bayesian and frequentist model fitting techniques for large networks, as well as computationally efficient variational alternatives. Cases where the number of communities are known and unknown are both considered. We apply the presented methods to the analysis of a Facebook wallpost network where users have non-uniform reciprocal behavior patterns. The fitted model captures the heavy-tailed nature of the empirical degree distributions in the Facebook data and identifies multiple groups of users that differ in their tendency to reply to and receive responses to wallposts.
Learning in games considers how multiple agents maximize their own rewards through repeated games. Memory, an ability that an agent changes his/her action depending on the history of actions in previous games, is often introduced into learning to explore more clever strategies and discuss the decision-making of real agents like humans. However, such games with memory are hard to analyze because they exhibit complex phenomena like chaotic dynamics or divergence from Nash equilibrium. In particular, how asymmetry in memory capacities between agents affects learning in games is still unclear. In response, this study formulates a gradient ascent algorithm in games with asymmetry memory capacities. To obtain theoretical insights into learning dynamics, we first consider a simple case of zero-sum games. We observe complex behavior, where learning dynamics draw a heteroclinic connection from unstable fixed points to stable ones. Despite this complexity, we analyze learning dynamics and prove local convergence to these stable fixed points, i.e., the Nash equilibria. We identify the mechanism driving this convergence: an agent with a longer memory learns to exploit the other, which in turn endows the other's utility function with strict concavity. We further numerically observe such convergence in various initial strategies, action numbers, and memory lengths. This study reveals a novel phenomenon due to memory asymmetry, providing fundamental strides in learning in games and new insights into computing equilibria.
The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $\epsilon$-optimal Nash Equilibrium (NE) with the sample complexity of $O(H^3SAB/\epsilon^2)$, which is optimal in the dependence of the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms. The main improvement of the dependency on $H$ arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency.
Anomaly segmentation plays a crucial role in identifying anomalous objects within images, which facilitates the detection of road anomalies for autonomous driving. Although existing methods have shown impressive results in anomaly segmentation using synthetic training data, the domain discrepancies between synthetic training data and real test data are often neglected. To address this issue, the Multi-Granularity Cross-Domain Alignment (MGCDA) framework is proposed for anomaly segmentation in complex driving environments. It uniquely combines a new Multi-source Domain Adversarial Training (MDAT) module and a novel Cross-domain Anomaly-aware Contrastive Learning (CACL) method to boost the generality of the model, seamlessly integrating multi-domain data at both scene and sample levels. Multi-source domain adversarial loss and a dynamic label smoothing strategy are integrated into the MDAT module to facilitate the acquisition of domain-invariant features at the scene level, through adversarial training across multiple stages. CACL aligns sample-level representations with contrastive loss on cross-domain data, which utilizes an anomaly-aware sampling strategy to efficiently sample hard samples and anchors. The proposed framework has decent properties of parameter-free during the inference stage and is compatible with other anomaly segmentation networks. Experimental conducted on Fishyscapes and RoadAnomaly datasets demonstrate that the proposed framework achieves state-of-the-art performance.
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
Learning with limited data is a key challenge for visual recognition. Few-shot learning methods address this challenge by learning an instance embedding function from seen classes and apply the function to instances from unseen classes with limited labels. This style of transfer learning is task-agnostic: the embedding function is not learned optimally discriminative with respect to the unseen classes, where discerning among them is the target task. In this paper, we propose a novel approach to adapt the embedding model to the target classification task, yielding embeddings that are task-specific and are discriminative. To this end, we employ a type of self-attention mechanism called Transformer to transform the embeddings from task-agnostic to task-specific by focusing on relating instances from the test instances to the training instances in both seen and unseen classes. Our approach also extends to both transductive and generalized few-shot classification, two important settings that have essential use cases. We verify the effectiveness of our model on two standard benchmark few-shot classification datasets --- MiniImageNet and CUB, where our approach demonstrates state-of-the-art empirical performance.