We study a problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms to maximize their payoff. We consider Bayesian agents who are unaware of ex-post realization of their own arms' mean rewards, which is the first to study Bayesian extension of Shin et al. (2022). This extension presents significant challenges in analyzing equilibrium, in contrast to the fully-informed setting by Shin et al. (2022) under which the problem simply reduces to a case where each agent only has a single arm. With Bayesian agents, even in a single-agent setting, analyzing the replication-proofness of an algorithm becomes complicated. Remarkably, we first show that the algorithm proposed by Shin et al. (2022), defined H-UCB, is no longer replication-proof for any exploration parameters. Then, we provide sufficient and necessary conditions for an algorithm to be replication-proof in the single-agent setting. These results centers around several analytical results in comparing the expected regret of multiple bandit instances, which might be of independent interest. We further prove that exploration-then-commit (ETC) algorithm satisfies these properties, whereas UCB does not, which in fact leads to the failure of being replication-proof. We expand this result to multi-agent setting, and provide a replication-proof algorithm for any problem instance. The proof mainly relies on the single-agent result, as well as some structural properties of ETC and the novel introduction of a restarting round, which largely simplifies the analysis while maintaining the regret unchanged (up to polylogarithmic factor). We finalize our result by proving its sublinear regret upper bound, which matches that of H-UCB.
We introduce a multivariate local-linear estimator for multivariate regression discontinuity designs in which treatment is assigned by crossing a boundary in the space of running variables. The dominant approach uses the Euclidean distance from a boundary point as the scalar running variable; hence, multivariate designs are handled as uni-variate designs. However, the distance running variable is incompatible with the assumption for asymptotic validity. We handle multivariate designs as multivariate. In this study, we develop a novel asymptotic normality for multivariate local-polynomial estimators. Our estimator is asymptotically valid and can capture heterogeneous treatment effects over the boundary. We demonstrate the effectiveness of our estimator through numerical simulations. Our empirical illustration of a Colombian scholarship study reveals a richer heterogeneity (including its absence) of the treatment effect that is hidden in the original estimates.
When multiple agents interact in a common environment, each agent's actions impact others' future decisions, and noncooperative dynamic games naturally capture this coupling. In interactive motion planning, however, agents typically do not have access to a complete model of the game, e.g., due to unknown objectives of other players. Therefore, we consider the inverse game problem, in which some properties of the game are unknown a priori and must be inferred from observations. Existing maximum likelihood estimation (MLE) approaches to solve inverse games provide only point estimates of unknown parameters without quantifying uncertainty, and perform poorly when many parameter values explain the observed behavior. To address these limitations, we take a Bayesian perspective and construct posterior distributions of game parameters. To render inference tractable, we employ a variational autoencoder (VAE) with an embedded differentiable game solver. This structured VAE can be trained from an unlabeled dataset of observed interactions, naturally handles continuous, multi-modal distributions, and supports efficient sampling from the inferred posteriors without computing game solutions at runtime. Extensive evaluations in simulated driving scenarios demonstrate that the proposed approach successfully learns the prior and posterior objective distributions, provides more accurate objective estimates than MLE baselines, and facilitates safer and more efficient game-theoretic motion planning.
We study a Bayesian persuasion problem where a sender wants to persuade a receiver to take a binary action, such as purchasing a product. The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities. Motivated by customer surveys, user studies, and recent advances in generative AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior. After a fixed number of queries, the sender commits to a messaging policy and the receiver takes the action that maximizes her expected utility given the message she receives. We characterize the sender's optimal messaging policy given any distribution over receiver types. We then design a polynomial-time querying algorithm that optimizes the sender's expected utility in this Bayesian persuasion game. We also consider approximate oracles, more general query structures, and costly queries.
In strategic classification, agents modify their features, at a cost, to ideally obtain a positive classification from the learner's classifier. The typical response of the learner is to carefully modify their classifier to be robust to such strategic behavior. When reasoning about agent manipulations, most papers that study strategic classification rely on the following strong assumption: agents fully know the exact parameters of the deployed classifier by the learner. This often is an unrealistic assumption when using complex or proprietary machine learning techniques in real-world prediction tasks. We initiate the study of partial information release by the learner in strategic classification. We move away from the traditional assumption that agents have full knowledge of the classifier. Instead, we consider agents that have a common distributional prior on which classifier the learner is using. The learner in our model can reveal truthful, yet not necessarily complete, information about the deployed classifier to the agents. The learner's goal is to release just enough information about the classifier to maximize accuracy. We show how such partial information release can, counter-intuitively, benefit the learner's accuracy, despite increasing agents' abilities to manipulate. We show that while it is intractable to compute the best response of an agent in the general case, there exist oracle-efficient algorithms that can solve the best response of the agents when the learner's hypothesis class is the class of linear classifiers, or when the agents' cost function satisfies a natural notion of submodularity as we define. We then turn our attention to the learner's optimization problem and provide both positive and negative results on the algorithmic problem of how much information the learner should release about the classifier to maximize their expected accuracy.
Existing work in fairness audits assumes that agents operate independently. In this paper, we consider the case of multiple agents auditing the same platform for different tasks. Agents have two levers: their collaboration strategy, with or without coordination beforehand, and their sampling method. We theoretically study their interplay when agents operate independently or collaborate. We prove that, surprisingly, coordination can sometimes be detrimental to audit accuracy, whereas uncoordinated collaboration generally yields good results. Experimentation on real-world datasets confirms this observation, as the audit accuracy of uncoordinated collaboration matches that of collaborative optimal sampling.
Iterative voting is a natural model of repeated strategic decision-making in social choice when agents have the opportunity to update their votes prior to finalizing the group decision. Prior work has analyzed the efficacy of iterative plurality on the welfare of the chosen outcome at equilibrium, relative to the truthful vote profile, via an adaptation of the price of anarchy. However, prior analyses have only studied the worst-case and average-case performances when agents' preferences are distributed by the impartial culture. This work extends average-case analyses to a wider class of distributions and distinguishes when iterative plurality improves or degrades asymptotic welfare.
Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e. covariates. However, current methods often struggle with selecting tuning parameters and analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which leverage the network connections and covariates with a unique weight to each node based on the node's degree. Spectral clustering on network-adjusted covariates yields an exact recovery of community labels under certain conditions, which is tuning-free and computationally efficient. We present novel theoretical results about the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of mis-specification and sparse communities with bounded degrees. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, and it shows our method is optimal up to a constant factor. Our method outperforms existing approaches in simulations and a LastFM app user network, and provides interpretable community structures in a statistics publication citation network where $30\%$ of nodes are isolated.
We present a distributed quasi-Newton (DQN) method, which enables a group of agents to compute an optimal solution of a separable multi-agent optimization problem locally using an approximation of the curvature of the aggregate objective function. Each agent computes a descent direction from its local estimate of the aggregate Hessian, obtained from quasi-Newton approximation schemes using the gradient of its local objective function. Moreover, we introduce a distributed quasi-Newton method for equality-constrained optimization (EC-DQN), where each agent takes Karush-Kuhn-Tucker-like update steps to compute an optimal solution. In our algorithms, each agent communicates with its one-hop neighbors over a peer-to-peer communication network to compute a common solution. We prove convergence of our algorithms to a stationary point of the optimization problem. In addition, we demonstrate the competitive empirical convergence of our algorithm in both well-conditioned and ill-conditioned optimization problems, in terms of the computation time and communication cost incurred by each agent for convergence, compared to existing distributed first-order and second-order methods. Particularly, in ill-conditioned problems, our algorithms achieve a faster computation time for convergence, while requiring a lower communication cost, across a range of communication networks with different degrees of connectedness, by leveraging information on the curvature of the problem.
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
Benefit from the quick development of deep learning techniques, salient object detection has achieved remarkable progresses recently. However, there still exists following two major challenges that hinder its application in embedded devices, low resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keep accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, and with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).