Altruistic cooperation is costly yet socially desirable. As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL). Indirect reciprocity, where agents consider their interaction partner's reputation, has been shown to stabilise cooperation in homogeneous, idealised populations. However, more realistic settings are comprised of heterogeneous agents with different characteristics and group-based social identities. We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information. We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affects the analytically identified equilibria. We observe that a defecting majority leads the minority group to defect, but not the inverse. Moreover, changing the norms that judge in and out-group interactions can steer a system towards either fair or unfair cooperation. This is made clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs with a narrower set of norms. Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackle both dilemmas of cooperation and of fairness.
We consider a decentralized optimization problem for networks affected by communication delays. Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems. To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs. This motivates investigating decentralized optimization solutions on directed graphs. Existing solutions assume nodes know their out-degrees, resulting in limited applicability. To overcome this limitation, we introduce a novel gossip-based algorithm, called DT-GO, that does not need to know the out-degrees. The algorithm is applicable in general directed networks, for example networks with delays or limited acknowledgment capabilities. We derive convergence rates for both convex and non-convex objectives, showing that our algorithm achieves the same complexity order as centralized Stochastic Gradient Descent. In other words, the effects of the graph topology and delays are confined to higher-order terms. Additionally, we extend our analysis to accommodate time-varying network topologies. Numerical simulations are provided to support our theoretical findings.
When selecting multiple candidates based on approval preferences of agents, the proportional representation of agents' opinions is an important and well-studied desideratum. Existing criteria for evaluating the representativeness of outcomes focus on groups of agents and demand that sufficiently large and cohesive groups are ''represented'' in the sense that candidates approved by some group members are selected. Crucially, these criteria say nothing about the representation of individual agents, even if these agents are members of groups that deserve representation. In this paper, we formalize the concept of individual representation (IR) and explore to which extent, and under which circumstances, it can be achieved. We show that checking whether an IR outcome exists is computationally intractable, and we verify that all common approval-based voting rules may fail to provide IR even in cases where this is possible. We then focus on domain restrictions and establish an interesting contrast between ''voter interval'' and ''candidate interval'' preferences. This contrast can also be observed in our experimental results, where we analyze the attainability of IR for realistic preference profiles.
Understanding the dose-response relation between a continuous treatment and the outcome for an individual can greatly drive decision-making, particularly in areas like personalized drug dosing and personalized healthcare interventions. Point estimates are often insufficient in these high-risk environments, highlighting the need for uncertainty quantification to support informed decisions. Conformal prediction, a distribution-free and model-agnostic method for uncertainty quantification, has seen limited application in continuous treatments or dose-response models. To address this gap, we propose a novel methodology that frames the causal dose-response problem as a covariate shift, leveraging weighted conformal prediction. By incorporating propensity estimation, conformal predictive systems, and likelihood ratios, we present a practical solution for generating prediction intervals for dose-response models. Additionally, our method approximates local coverage for every treatment value by applying kernel functions as weights in weighted conformal prediction. Finally, we use a new synthetic benchmark dataset to demonstrate the significance of covariate shift assumptions in achieving robust prediction intervals for dose-response models.
Addressing multiagent decision problems in AI, especially those involving collaborative or competitive agents acting concurrently in a partially observable and stochastic environment, remains a formidable challenge. While Interactive Dynamic Influence Diagrams~(I-DIDs) have offered a promising decision framework for such problems, they encounter limitations when the subject agent encounters unknown behaviors exhibited by other agents that are not explicitly modeled within the I-DID. This can lead to sub-optimal responses from the subject agent. In this paper, we propose a novel data-driven approach that utilizes an encoder-decoder architecture, particularly a variational autoencoder, to enhance I-DID solutions. By integrating a perplexity-based tree loss function into the optimization algorithm of the variational autoencoder, coupled with the advantages of Zig-Zag One-Hot encoding and decoding, we generate potential behaviors of other agents within the I-DID that are more likely to contain their true behaviors, even from limited interactions. This new approach enables the subject agent to respond more appropriately to unknown behaviors, thus improving its decision quality. We empirically demonstrate the effectiveness of the proposed approach in two well-established problem domains, highlighting its potential for handling multi-agent decision problems with unknown behaviors. This work is the first time of using neural networks based approaches to deal with the I-DID challenge in agent planning and learning problems.
In diverse professional environments, ranging from academic conferences to corporate earnings calls, the ability to anticipate audience questions stands paramount. Traditional methods, which rely on manual assessment of an audience's background, interests, and subject knowledge, often fall short - particularly when facing large or heterogeneous groups, leading to imprecision and inefficiency. While NLP has made strides in text-based question generation, its primary focus remains on academic settings, leaving the intricate challenges of professional domains, especially earnings call conferences, underserved. Addressing this gap, our paper pioneers the multi-question generation (MQG) task specifically designed for earnings call contexts. Our methodology involves an exhaustive collection of earnings call transcripts and a novel annotation technique to classify potential questions. Furthermore, we introduce a retriever-enhanced strategy to extract relevant information. With a core aim of generating a spectrum of potential questions that analysts might pose, we derive these directly from earnings call content. Empirical evaluations underscore our approach's edge, revealing notable excellence in the accuracy, consistency, and perplexity of the questions generated.
Autonomous agents have long been a prominent research topic in the academic community. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from the human learning processes, and thus makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating autonomous agents based on LLMs. To harness the full potential of LLMs, researchers have devised diverse agent architectures tailored to different applications. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of autonomous agents from a holistic perspective. More specifically, our focus lies in the construction of LLM-based agents, for which we propose a unified framework that encompasses a majority of the previous work. Additionally, we provide a summary of the various applications of LLM-based AI agents in the domains of social science, natural science, and engineering. Lastly, we discuss the commonly employed evaluation strategies for LLM-based AI agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository for the related references at //github.com/Paitesanshi/LLM-Agent-Survey.
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
Hyperproperties are commonly used in computer security to define information-flow policies and other requirements that reason about the relationship between multiple computations. In this paper, we study a novel class of hyperproperties where the individual computation paths are chosen by the strategic choices of a coalition of agents in a multi-agent system. We introduce HyperATL*, an extension of computation tree logic with path variables and strategy quantifiers. Our logic can express strategic hyperproperties, such as that the scheduler in a concurrent system has a strategy to avoid information leakage. HyperATL* is particularly useful to specify asynchronous hyperproperties, i.e., hyperproperties where the speed of the execution on the different computation paths depends on the choices of the scheduler. Unlike other recent logics for the specification of asynchronous hyperproperties, our logic is the first to admit decidable model checking for the full logic. We present a model checking algorithm for HyperATL* based on alternating automata, and show that our algorithm is asymptotically optimal by providing a matching lower bound. We have implemented a prototype model checker for a fragment of HyperATL*, able to check various security properties on small programs.
A community reveals the features and connections of its members that are different from those in other communities in a network. Detecting communities is of great significance in network analysis. Despite the classical spectral clustering and statistical inference methods, we notice a significant development of deep learning techniques for community detection in recent years with their advantages in handling high dimensional network data. Hence, a comprehensive overview of community detection's latest progress through deep learning is timely to both academics and practitioners. This survey devises and proposes a new taxonomy covering different categories of the state-of-the-art methods, including deep learning-based models upon deep neural networks, deep nonnegative matrix factorization and deep sparse filtering. The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks and autoencoders. The survey also summarizes the popular benchmark data sets, model evaluation metrics, and open-source implementations to address experimentation settings. We then discuss the practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing deep learning field.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.