We study random-turn resource-allocation games. In the Trail of Lost Pennies, a counter moves on $\mathbb{Z}$. At each turn, Maxine stakes $a \in [0,\infty)$ and Mina stakes $b \in [0,\infty)$. The counter $X$ then moves adjacently, to the right with probability $\tfrac{a}{a+b}$. If $X_i \to -\infty$ in this infinite-turn game, Mina receives one unit and Maxine receives zero; if $X_i \to \infty$, then these receipts are zero and $x$, where $x > 0$ is a parameter of the game. Thus the net receipt of a given player is $-A+B$, where $A$ is the sum of her stakes and $B$ is her terminal receipt. The game was inspired by unbiased tug-of-war in~[PSSW] from 2009, but in fact it closely resembles the original version of tug-of-war, introduced in~[HarrisVickers87] in the economics literature in 1987. We show that the game has surprising features. For a natural class of strategies, Nash equilibria exist precisely when $x$ lies in $[\lambda,\lambda^{-1}]$, for a certain $\lambda \in (0,1)$. We indicate that $\lambda$ is remarkably close to one, proving that $\lambda \leq 0.999904$ and presenting clear numerical evidence that $\lambda \geq 1 - 10^{-4}$. For each $x \in [\lambda,\lambda^{-1}]$, we find countably many Nash equilibria. Each is roughly characterized by an integral {\em battlefield} index: when the counter is nearby, both players stake intensely, with rapid but asymmetric decay in stakes as it moves away. Our results advance the premises of~[HarrisVickers87,Konrad12] concerning fund management and the incentive-outcome relation, which plausibly hold for many player-funded stake-governed games. Alongside a companion treatment [HP22] of games with allocated budgets, we thus offer a detailed mathematical treatment of an illustrative class of tug-of-war games. We also review the separate developments of tug-of-war in economics and mathematics, in the hope that mathematicians will direct further attention to tug-of-war in its original resource-allocation guise.
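As an illustration of the game mechanics only, here is a minimal simulation sketch. The geometric stake rules and the truncation of the infinite-turn game at $\pm N$ are assumptions made for the example, not the equilibrium strategies of the paper.

```python
import random

# Minimal sketch of one play of the Trail of Lost Pennies, under
# hypothetical stake rules: each player's stake decays geometrically
# with distance from a chosen "battlefield" site (here, the origin).
# The infinite-turn game is truncated at +/-N purely for simulation.

def play(x=1.0, N=20, decay_maxine=0.5, decay_mina=0.6, max_turns=10**4):
    pos = 0
    spent_maxine = spent_mina = 0.0
    for _ in range(max_turns):
        a = decay_maxine ** abs(pos)   # Maxine's stake at this site
        b = decay_mina ** abs(pos)     # Mina's stake at this site
        spent_maxine += a
        spent_mina += b
        pos += 1 if random.random() < a / (a + b) else -1
        if pos >= N:    # proxy for X_i -> +infinity: Maxine receives x
            return x - spent_maxine, 0.0 - spent_mina
        if pos <= -N:   # proxy for X_i -> -infinity: Mina receives 1
            return 0.0 - spent_maxine, 1.0 - spent_mina
    return -spent_maxine, -spent_mina  # undecided within the horizon

# Each player's return is her net receipt -A+B: terminal receipt minus total stakes.
print(play())
```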
In the prophet inequality problem, a gambler faces a sequence of items arriving online with values drawn independently from known distributions. On seeing an item, the gambler must choose whether to accept its value as her reward and quit the game, or reject it and continue. The gambler's aim is to maximize her expected reward relative to the expected maximum of the values of all items. Since the seventies, a tight bound of 1/2 has been known for this competitive ratio in the setting where the items arrive in an adversarial order (Krengel and Sucheston, 1977, 1978). However, the optimal ratio remains unknown in the order selection setting, where the gambler selects the arrival order, as well as in prophet secretary, where the items arrive in a random order. Moreover, it is not even known whether a separation exists between the two settings. In this paper, we show that the power of order selection allows the gambler to guarantee a strictly better competitive ratio than if the items arrive randomly. For the order selection setting, we identify an instance for which Peng and Tang's (FOCS'22) state-of-the-art algorithm performs no better than their claimed competitive ratio of (approximately) 0.7251, thus illustrating the need for an improved approach. We therefore extend their design into a more general algorithm design framework, which we use to show that their ratio can be beaten, by designing a 0.7258-competitive algorithm. For the random order setting, we improve upon Correa, Saona and Ziliotto's (SODA'19) 0.732-hardness result, showing a hardness of 0.7254 for general algorithms, even in the setting where the gambler knows the arrival order beforehand, thus establishing a separation between the order selection and random order settings.
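For concreteness, here is a minimal sketch of the classical single-threshold rule for the adversarial-order setting: accept the first value of at least half the expected maximum, a rule known to guarantee the tight ratio of 1/2. The three distributions are arbitrary illustrative choices, and the expected maximum is estimated by Monte Carlo.

```python
import random
import statistics

# Sketch of the classical single-threshold prophet-inequality rule:
# accept the first item worth at least E[max]/2, else end with nothing.

def expected_max(dists, trials=20000):
    return statistics.fmean(max(d() for d in dists) for _ in range(trials))

def gambler(dists, tau):
    vals = [d() for d in dists]                 # values revealed online
    return next((v for v in vals if v >= tau), 0.0)

dists = [lambda: random.uniform(0, 1),
         lambda: random.expovariate(1.0),
         lambda: random.betavariate(2, 5)]
tau = expected_max(dists) / 2
reward = statistics.fmean(gambler(dists, tau) for _ in range(20000))
# The empirical ratio should land comfortably above 1/2 on this instance.
print(f"gambler: {reward:.3f}  vs  prophet: {expected_max(dists):.3f}")
```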
We present FIT: a transformer-based architecture with efficient self-attention and adaptive computation. Unlike the original Transformer, which operates on a single sequence of data tokens, we divide the data tokens into groups, with each group being a shorter sequence of tokens. We employ two types of transformer layers: local layers operate on data tokens within each group, while global layers operate on a smaller set of introduced latent tokens. These layers, comprising the same set of self-attention and feed-forward layers as standard transformers, are interleaved, and cross-attention is used to facilitate information exchange between data and latent tokens within the same group. The attention complexity is $O(n^2)$ locally within each group of size $n$, but can reach $O(L^{4/3})$ globally for a sequence of length $L$. The efficiency can be further enhanced by relying more on global layers, which perform adaptive computation using a smaller set of latent tokens. FIT is a versatile architecture that can function as an encoder, diffusion decoder, or autoregressive decoder. We provide initial evidence demonstrating its effectiveness in high-resolution image understanding and generation tasks. Notably, FIT exhibits potential in performing end-to-end training on gigabit-scale data, such as 6400$\times$6400 images, even without specific optimizations or model parallelism.
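A minimal PyTorch-style sketch of one such interleaved block follows. The class name FITBlock, the layer sizes, and the omission of feed-forward sublayers and normalization are simplifications for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Sketch of one FIT-style block: data tokens are split into groups, a local
# layer attends within each group, latent tokens read from their group via
# cross-attention, a global layer attends over all latents, and the latents
# write back to the data tokens via cross-attention.

class FITBlock(nn.Module):
    def __init__(self, dim=64, heads=4, group=16, latents_per_group=4):
        super().__init__()
        self.group, self.m = group, latents_per_group
        self.latent = nn.Parameter(torch.randn(latents_per_group, dim))
        self.local = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.read = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.glob = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.write = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, L, dim), L % group == 0
        B, L, D = x.shape
        g = x.view(B * L // self.group, self.group, D)
        g = g + self.local(g, g, g)[0]         # local self-attention per group
        z = self.latent.expand(g.size(0), -1, -1)
        z = z + self.read(z, g, g)[0]          # latents read their group
        z = z.reshape(B, -1, D)
        z = z + self.glob(z, z, z)[0]          # global self-attention over latents
        z = z.reshape(B * L // self.group, self.m, D)
        g = g + self.write(g, z, z)[0]         # latents write back to data tokens
        return g.reshape(B, L, D)

x = torch.randn(2, 64, 64)
print(FITBlock()(x).shape)                     # torch.Size([2, 64, 64])
```

The cost structure is visible in the shapes: self-attention is quadratic only within groups of size 16, while the global layer is quadratic in the much smaller latent sequence.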
We consider a facility location game in which $n$ agents reside at known locations on a path, and $k$ heterogeneous facilities are to be constructed on the path. Each agent is adversely affected by some subset of the facilities, and is unaffected by the others. We design two classes of mechanisms for choosing the facility locations given the reported agent preferences: utilitarian mechanisms that strive to maximize social welfare (i.e., to be efficient), and egalitarian mechanisms that strive to maximize the minimum welfare. For the utilitarian objective, we present a weakly group-strategyproof efficient mechanism for up to three facilities, we give strongly group-strategyproof mechanisms that achieve approximation ratios of $5/3$ and $2$ for $k=1$ and $k > 1$, respectively, and we prove that no strongly group-strategyproof mechanism achieves an approximation ratio less than $5/3$ for the case of a single facility. For the egalitarian objective, we present a strategyproof egalitarian mechanism for arbitrary $k$, and we prove that no weakly group-strategyproof mechanism achieves an approximation ratio of $o(\sqrt{n})$ for two facilities. We extend our egalitarian results to the case where the agents are located on a cycle, and we extend our first egalitarian result to the case where the agents are located in the unit square.
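As a concrete illustration of a group-strategyproof rule in this obnoxious-facility setting, consider the classical endpoint-voting rule for a single facility on the path $[0,1]$. This is a standard rule from the obnoxious facility location literature, commonly cited as group-strategyproof with approximation ratio 3; it is not the paper's $5/3$ mechanism.

```python
# Endpoint-voting sketch for one obnoxious facility on [0,1]: every agent
# wants the facility far away, so she votes for the endpoint farther from
# her reported location, and the majority endpoint is chosen. Misreporting
# can only move the facility closer to (or no farther from) the agent.

def endpoint_majority(locations):
    votes_for_one = sum(1 for x in locations if x <= 0.5)  # left half prefers endpoint 1
    return 1.0 if votes_for_one >= len(locations) - votes_for_one else 0.0

print(endpoint_majority([0.1, 0.2, 0.9]))  # -> 1.0 (two of three agents sit in the left half)
```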
Artificial Intelligence (AI) is one of the most transformative technologies of the 21st century. The extent and scope of future AI capabilities remain a key uncertainty, with widespread disagreement on timelines and potential impacts. As nations and technology companies race toward greater complexity and autonomy in AI systems, there are concerns over the extent of integration and oversight of opaque AI decision processes. This is especially true in the subfield of machine learning (ML), where systems learn to optimize objectives without human assistance. Objectives can be imperfectly specified or executed in an unexpected or potentially harmful way. This becomes more concerning as systems increase in power and autonomy, where an abrupt capability jump could result in unexpected shifts in power dynamics or even catastrophic failures. This study presents a hierarchical complex systems framework to model AI risk and provide a template for alternative futures analysis. Survey data were collected from domain experts in the public and private sectors to classify AI impact and likelihood. The results show increased uncertainty over the powerful AI agent scenario, confidence in multiagent environments, and increased concern over AI alignment failures and influence-seeking behavior.
Suppose that we have $n$ agents and $n$ items which lie in a shared metric space. We would like to match the agents to items such that the total distance from agents to their matched items is as small as possible. However, instead of having direct access to distances in the metric, we only have each agent's ranking of the items in order of distance. Given this limited information, what is the minimum possible worst-case approximation ratio (known as the distortion) that a matching mechanism can guarantee? Previous work by Caragiannis et al. proved that the (deterministic) Serial Dictatorship mechanism has distortion at most $2^n - 1$. We improve this by providing a simple deterministic mechanism that has distortion $O(n^2)$. We also provide the first nontrivial lower bound on this problem, showing that any matching mechanism (deterministic or randomized) must have worst-case distortion $\Omega(\log n)$. In addition to these new bounds, we show that a large class of truthful mechanisms derived from Deferred Acceptance all have worst-case distortion at least $2^n - 1$, and we find an intriguing connection between thin matchings (analogous to the well-known thin trees conjecture) and the distortion gap between deterministic and randomized mechanisms.
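For reference, here is a minimal sketch of the Serial Dictatorship mechanism discussed above: agents take turns in a fixed order, each claiming her top remaining item; distances are never consulted, only the ordinal rankings.

```python
# Serial Dictatorship: agent 0 takes her favorite item, agent 1 takes her
# favorite among the rest, and so on. Preferences are ranked lists of item
# indices, each ordered by (unobserved) distance in the metric.

def serial_dictatorship(prefs):
    available = set(range(len(prefs)))
    matching = {}
    for agent, ranking in enumerate(prefs):
        item = next(i for i in ranking if i in available)
        matching[agent] = item
        available.remove(item)
    return matching

# Three agents ranking three items; agent 0 picks first, and so on.
print(serial_dictatorship([[0, 1, 2], [0, 2, 1], [1, 0, 2]]))
# -> {0: 0, 1: 2, 2: 1}
```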
We study the hidden-action principal-agent problem in an online setting. In each round, the principal posts a contract that specifies the payment to the agent based on each outcome. The agent then makes a strategic choice of action that maximizes her own utility, but the action is not directly observable by the principal. The principal observes the outcome and receives utility from the agent's choice of action. Based on past observations, the principal dynamically adjusts the contracts with the goal of maximizing her utility. We introduce an online learning algorithm and provide an upper bound on its Stackelberg regret. We show that when the contract space is $[0,1]^m$, the Stackelberg regret is upper bounded by $\widetilde O(\sqrt{m} \cdot T^{1-1/(2m+1)})$, and lower bounded by $\Omega(T^{1-1/(m+2)})$, where $\widetilde O$ omits logarithmic factors. This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design. Moreover, when contracts are restricted to some subset $\mathcal{F} \subset [0,1]^m$, we define an intrinsic dimension of $\mathcal{F}$ that depends on the covering number of the spherical code in the space and bound the regret in terms of this intrinsic dimension. When $\mathcal{F}$ is the family of linear contracts, we show that the Stackelberg regret grows exactly as $\Theta(T^{2/3})$. The contract design problem is challenging because the utility function is discontinuous. Bounding the discretization error in this setting has been an open problem. In this paper, we identify a limited set of directions in which the utility function is continuous, allowing us to design a new discretization method and bound its error. This approach enables the first upper bound with no restrictions on the contract and action space.
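To make the interaction concrete, here is a sketch of a single round under made-up numbers (the outcome distributions, action costs, and rewards below are invented for the example): the principal posts $t \in [0,1]^m$, the agent best-responds out of the principal's view, and only the resulting outcome distribution shapes the principal's utility.

```python
import numpy as np

# One round of the hidden-action model, with illustrative data: m outcomes,
# three actions with outcome distributions F and private costs c.

m = 3
F = np.array([[0.6, 0.3, 0.1],      # outcome distribution of action 0
              [0.2, 0.5, 0.3],      # ... of action 1
              [0.1, 0.2, 0.7]])     # ... of action 2
c = np.array([0.0, 0.1, 0.3])       # action costs (the agent's private technology)
reward = np.array([0.0, 0.5, 1.0])  # principal's reward per outcome

def round_utilities(t):
    agent_util = F @ t - c                # expected payment minus cost, per action
    a = int(np.argmax(agent_util))        # agent's (unobserved) best response
    principal_util = F[a] @ (reward - t)  # expected reward minus expected payment
    return a, principal_util

print(round_utilities(np.array([0.0, 0.2, 0.6])))  # -> (1, 0.27)
```

The discontinuity mentioned in the abstract is visible here: an arbitrarily small change in $t$ can flip the argmax to a different action, abruptly changing the principal's utility.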
Recent work has considered whether large language models (LLMs) can function as planners: given a task, generate a plan. We investigate whether LLMs can serve as generalized planners: given a domain and training tasks, generate a program that efficiently produces plans for other tasks in the domain. In particular, we consider PDDL domains and use GPT-4 to synthesize Python programs. We also consider (1) Chain-of-Thought (CoT) summarization, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program; and (2) automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner. We also conclude that automated debugging is very important, that CoT summarization has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalization.
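The pipeline can be summarized with the following hedged sketch; llm, synthesize_prompt, and validate are hypothetical stand-ins, not the paper's actual prompts or interfaces.

```python
# Sketch of the generate-and-debug loop described above: CoT summarization,
# program synthesis, then repeated validation on the training tasks with
# feedback-driven re-prompting (feedback may be, e.g., a Python exception,
# a timeout, an invalid plan step, or an unmet goal).

MAX_DEBUG_ROUNDS = 4

def generalized_planner(domain_pddl, train_tasks):
    # (1) Chain-of-Thought summarization before any code is written.
    strategy = llm(f"Summarize this domain and propose a strategy:\n{domain_pddl}")
    program = llm(synthesize_prompt(domain_pddl, train_tasks, strategy))
    for _ in range(MAX_DEBUG_ROUNDS):
        # (2) Automated debugging: validate on training tasks, re-prompt on errors.
        feedback = validate(program, train_tasks)
        if feedback is None:
            return program          # all training tasks solved
        program = llm(f"Fix this program.\nProgram:\n{program}\nFeedback:\n{feedback}")
    return program
```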
In this paper, we analyze the supercloseness property of a two-dimensional singularly perturbed convection-diffusion problem with exponential boundary layers. The local discontinuous Galerkin (LDG) method with piecewise tensor-product polynomials of degree $k$ is applied on a Bakhvalov-type mesh. By developing special two-dimensional local Gauss-Radau projections and establishing a novel interpolation, we obtain supercloseness of optimal order $k+1$ on the Bakhvalov-type mesh. It is crucial to highlight that this supercloseness result is independent of the singular perturbation parameter.
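For concreteness, a standard two-dimensional model problem of this kind (the data here are generic, not taken from the paper) is
\[
-\varepsilon \Delta u + \mathbf{b}\cdot\nabla u + c\,u = f \ \ \text{in } \Omega=(0,1)^2, \qquad u = 0 \ \ \text{on } \partial\Omega,
\]
with $0 < \varepsilon \ll 1$ and $\mathbf{b}$ bounded below componentwise by positive constants. Its solution typically exhibits exponential boundary layers, of the form $\exp(-\beta(1-x)/\varepsilon)$, near the outflow edges $x=1$ and $y=1$, and a Bakhvalov-type mesh is graded precisely to resolve these layers.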
In this paper, we propose the Ordered Median Tree Location Problem (OMT). The OMT is a single-allocation facility location problem in which $p$ facilities must be placed on a network and connected to one another by a non-directed tree. The objective is to minimize the sum of the ordered weighted averaged allocation costs plus the sum of the costs of connecting the facilities in the tree. We present different MILP formulations for the OMT based on properties of the minimum spanning tree problem and of ordered median optimization. Since ordered median hub location problems are rather difficult to solve, we improve the OMT solution performance by introducing covering variables in a valid reformulation and by developing two pre-processing phases that reduce the size of these formulations. In addition, we propose a Benders decomposition algorithm for the OMT. We establish an empirical comparison between these new formulations, and we provide enhancements that, together with a proper formulation, allow us to solve medium-sized instances on general random graphs.
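In schematic form (the notation here is ours, for illustration only), the OMT objective combines an ordered-median allocation term with a tree-connection term:
\[
\min \ \sum_{i=1}^{n} \lambda_i\, d_{(i)} \;+\; \sum_{e \in T} c_e,
\]
where $d_{(1)} \leq \cdots \leq d_{(n)}$ are the agents' allocation costs sorted in non-decreasing order, $\lambda$ is the ordered-median weight vector, $T$ is the non-directed tree interconnecting the $p$ open facilities, and $c_e$ are its edge costs.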
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect-information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree, on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, often with hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems, as they can in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
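In heavily simplified form, the "regularise, solve, iterate" pattern behind R-NaD can be sketched on a zero-sum matrix game. This illustrates the idea only: DeepNash's actual implementation is a deep reinforcement-learning system with neural policies, not the explicit tabular updates below.

```python
import numpy as np

# Toy sketch of the R-NaD pattern on rock-paper-scissors: each player's
# payoff is penalised toward a regularisation policy, the inner dynamics
# are run to (near) their fixed point, and that fixed point becomes the
# next regularisation policy. The regularisation damps the cycling that
# plain learning dynamics exhibit around the Nash equilibrium.

A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])     # row player's payoffs

def rnad_step(p, q, p_reg, q_reg, eta=0.2, lr=0.1, inner=2000):
    for _ in range(inner):
        # transformed payoffs: own policy penalised toward the reg. policy
        up = A @ q - eta * (np.log(p) - np.log(p_reg))
        uq = -A.T @ p - eta * (np.log(q) - np.log(q_reg))
        p = p * np.exp(lr * (up - p @ up)); p /= p.sum()   # multiplicative-
        q = q * np.exp(lr * (uq - q @ uq)); q /= q.sum()   # weights updates
    return p, q

p_reg = q_reg = np.ones(3) / 3
p = np.array([0.8, 0.1, 0.1]); q = np.array([0.1, 0.8, 0.1])
for _ in range(5):                    # outer loop: refresh the reg. policies
    p, q = rnad_step(p, q, p_reg, q_reg)
    p_reg, q_reg = p.copy(), q.copy()
print(p.round(3), q.round(3))         # expected to drift toward (1/3, 1/3, 1/3)
```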