We study stochastic zero-sum games on graphs, which are prevalent tools to model decision-making in presence of an antagonistic opponent in a random environment. In this setting, an important question is the one of strategy complexity: what kinds of strategies are sufficient or required to play optimally (e.g., randomization or memory requirements)? Our contributions further the understanding of arena-independent finite-memory (AIFM) determinacy, i.e., the study of objectives for which memory is needed, but in a way that only depends on limited parameters of the game graphs. First, we show that objectives for which pure AIFM strategies suffice to play optimally also admit pure AIFM subgame perfect strategies. Second, we show that we can reduce the study of objectives for which pure AIFM strategies suffice in two-player stochastic games to the easier study of one-player stochastic games (i.e., Markov decision processes). Third, we characterize the sufficiency of AIFM strategies through two intuitive properties of objectives. This work extends a line of research started on deterministic games to stochastic ones.
We study the convergence of best-response dynamics in Tullock contests with convex cost functions (these games always have a unique pure-strategy Nash equilibrium). We show that best-response dynamics rapidly converges to the equilibrium for homogeneous agents. For two homogeneous agents, we show convergence to an $\epsilon$-approximate equilibrium in $\Theta(\log\log(1/\epsilon))$ steps. For $n \ge 3$ agents, the dynamics is not unique because at each step $n-1 \ge 2$ agents can make non-trivial moves. We consider the model proposed by \cite{ghosh2023best}, where the agent making the move is randomly selected at each time step. We show convergence to an $\epsilon$-approximate equilibrium in $O(\beta \log(n/(\epsilon\delta)))$ steps with probability $1-\delta$, where $\beta$ is a parameter of the agent selection process, e.g., $\beta = n^2 \log(n)$ if agents are selected uniformly at random at each time step. We complement this result with a lower bound of $\Omega(n + \log(1/\epsilon)/\log(n))$ applicable for any agent selection process.
We show that a maximum likelihood approach for parameter estimation in agent-based models (ABMs) of opinion dynamics outperforms the typical simulation-based approach. Simulation-based approaches simulate the model repeatedly in search of a set of parameters that generates data similar enough to the observed one. In contrast, likelihood-based approaches derive a likelihood function that connects the unknown parameters to the observed data in a statistically principled way. We compare these two approaches on the well-known bounded-confidence model of opinion dynamics. We do so on three realistic scenarios of increasing complexity depending on data availability: (i) fully observed opinions and interactions, (ii) partially observed interactions, (iii) observed interactions with noisy proxies of the opinions. We highlight how identifying observed and latent variables is fundamental for connecting the model to the data. To realize the likelihood-based approach, we first cast the model into a probabilistic generative guise that supports a proper data likelihood. Then, we describe the three scenarios via probabilistic graphical models and show the nuances that go into translating the model. Finally, we implement the resulting probabilistic models in an automatic differentiation framework (PyTorch). This step enables easy and efficient maximum likelihood estimation via gradient descent. Our experimental results show that the maximum likelihood estimates are up to 4x more accurate and require up to 200x less computational time.
We study the memory requirements of Nash equilibria in turn-based multiplayer games on possibly infinite graphs with reachability, shortest path and B\"uchi objectives. We present constructions for finite-memory Nash equilibria in these games that apply to arbitrary game graphs, bypassing the finite-arena requirement that is central in existing approaches. We show that, for these three types of games, from any Nash equilibrium, we can derive another Nash equilibrium where all strategies are finite-memory such that the same players accomplish their objective, without increasing their cost for shortest path games. Furthermore, we provide memory bounds that are independent of the size of the game graph for reachability and shortest path games. These bounds depend only on the number of players. To the best of our knowledge, we provide the first results pertaining to finite-memory constrained Nash equilibria in infinite arenas and the first arena-independent memory bounds for Nash equilibria.
While discounted payoff games and classic games that reduce to them, like parity and mean-payoff games, are symmetric, their solutions are not. We have taken a fresh view on the constraints that optimal solutions need to satisfy, and devised a novel way to converge to them, which is entirely symmetric. It also challenges the gospel that methods for solving payoff games are either based on strategy improvement or on value iteration.
As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, challenges surface in the realm of human-AI interactions. One challenge arises from the suboptimal AI policies due to the inadequate consideration of humans disregarding AI recommendations, as well as the need for AI to provide advice selectively when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows/rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from making advice. We provide learning algorithms that learn the optimal advice policy and make advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and there is observation that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions on future directions. Our aim is to provide a detailed and up-to-date review of this topic and stimulate meaningful discussion and future work.
We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to space-time. Our objective encourages temporally-persistent features in the same video, and in spite of its simplicity, it works surprisingly well across: (i) different unsupervised frameworks, (ii) pre-training datasets, (iii) downstream datasets, and (iv) backbone architectures. We draw a series of intriguing observations from this study, e.g., we discover that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds. In addition to state-of-the-art results in multiple benchmarks, we report a few promising cases in which unsupervised pre-training can outperform its supervised counterpart. Code is made available at //github.com/facebookresearch/SlowFast
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
Learning with limited data is a key challenge for visual recognition. Few-shot learning methods address this challenge by learning an instance embedding function from seen classes and apply the function to instances from unseen classes with limited labels. This style of transfer learning is task-agnostic: the embedding function is not learned optimally discriminative with respect to the unseen classes, where discerning among them is the target task. In this paper, we propose a novel approach to adapt the embedding model to the target classification task, yielding embeddings that are task-specific and are discriminative. To this end, we employ a type of self-attention mechanism called Transformer to transform the embeddings from task-agnostic to task-specific by focusing on relating instances from the test instances to the training instances in both seen and unseen classes. Our approach also extends to both transductive and generalized few-shot classification, two important settings that have essential use cases. We verify the effectiveness of our model on two standard benchmark few-shot classification datasets --- MiniImageNet and CUB, where our approach demonstrates state-of-the-art empirical performance.
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptable to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.