亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best existing last-iterate convergence for OMWU depends on some game-dependent constant that could be arbitrarily large. This begs the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue: for any arbitrarily small $\delta>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/\delta$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.

相關內容

FAST:Conference on File and Storage Technologies。 Explanation:文件和存儲技術會議。 Publisher:USENIX。 SIT:

Announcement games, where information is disseminated by announcers and challenged by validators, are prevalent in real-world scenarios. Validators take effort to verify the validity of the announcements, gaining rewards for successfully challenging invalid ones, while receiving nothing for valid ones. Optimistic Rollup, a Layer 2 blockchain scaling solution, exemplifies such games, offering significant improvements in transaction throughput and cost efficiency. We present a game-theoretic model of announcement games to analyze the potential behaviors of announcers and validators. We identify all Nash equilibria and study the corresponding system losses for different Nash equilibria. Additionally, we analyze the impact of various system parameters on system loss under the Nash equilibrium. Finally, we provide suggestions for mechanism optimization to reduce system losses.

Vision-Language Models (VLMs) demonstrate remarkable zero-shot generalization to unseen tasks, but fall short of the performance of supervised methods in generalizing to downstream tasks with limited data. Prompt learning is emerging as a parameter-efficient method for adapting VLMs, but state-of-the-art approaches require annotated samples. In this paper we propose a novel approach to prompt learning based on unsupervised knowledge distillation from more powerful models. Our approach, which we call Knowledge Distillation Prompt Learning (KDPL), can be integrated into existing prompt learning techniques and eliminates the need for labeled examples during adaptation. Our experiments on more than ten standard benchmark datasets demonstrate that KDPL is very effective at improving generalization of learned prompts for zero-shot domain generalization, zero-shot cross-dataset generalization, and zero-shot base-to-novel class generalization problems. KDPL requires no ground-truth labels for adaptation, and moreover we show that even in the absence of any knowledge of training class names it can be used to effectively transfer knowledge. The code is publicly available at //github.com/miccunifi/KDPL.

World-building, the process of developing both the narrative and physical world of a game, plays a vital role in the game's experience. Critically-acclaimed independent and AAA video games are praised for strong world-building, with game maps that masterfully intertwine with and elevate the narrative, captivating players and leaving a lasting impression. However, designing game maps that support a desired narrative is challenging, as it requires satisfying complex constraints from various considerations. Most existing map generation methods focus on considerations about gameplay mechanics or map topography, while the need to support the story is typically neglected. As a result, extensive manual adjustment is still required to design a game world that facilitates particular stories. In this work, we approach this problem by introducing an extra layer of plot facility layout design that is independent of the underlying map generation method in a world-building pipeline. Concretely, we define (plot) facility layout tasks as the tasks of assigning concrete locations on a game map to abstract locations mentioned in a given story (plot facilities), following spatial constraints derived from the story. We present two methods for solving these tasks automatically: an evolutionary computation based approach through Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and a Reinforcement Learning (RL) based approach. We develop a method of generating datasets of facility layout tasks, create a gym-like environment for experimenting with and evaluating different methods, and further analyze the two methods with comprehensive experiments, aiming to provide insights for solving facility layout tasks. We will release the code and a dataset containing 10, 000 tasks of different scales.

We study best-response type learning dynamics for two player zero-sum matrix games. We consider two settings that are distinguished by the type of information that each player has about the game and their opponent's strategy. The first setting is the full information case, in which each player knows their own and the opponent's payoff matrices and observes the opponent's mixed strategy. The second setting is the minimal information case, where players do not observe the opponent's strategy and are not aware of either of the payoff matrices (instead they only observe their realized payoffs). For this setting, also known as the radically uncoupled case in the learning in games literature, we study a two-timescale learning dynamics that combine smoothed best-response type updates for strategy estimates with a TD-learning update to estimate a local payoff function. For these dynamics, without additional exploration, we provide polynomial-time finite-sample guarantees for convergence to an $\epsilon$-Nash equilibrium.

Repeated measurements of player experience are crucial in games user research, assessing how different designs evolve over time. However, this necessitates lightweight measurement instruments that are fit for the purpose. In this study, we conduct an examination of the test-retest reliability of the \emph{miniPXI} -- a short variant of the \emph{Player Experience Inventory} (\emph{PXI}), an established measure for measuring player experience. We analyzed test-retest reliability by leveraging four games involving 100 participants, comparing it with four established multi-item measures and single-item indicators such as the Net Promoter Score (\emph{NPS}) and overall enjoyment. The findings show mixed outcomes; the \emph{miniPXI} demonstrated varying levels of test-retest reliability. Some constructs showed good to moderate reliability, while others were less consistent. On the other hand, multi-item measures exhibited moderate to good test-retest reliability, demonstrating their effectiveness in measuring player experiences over time. Additionally, the employed single-item indicators (\emph{NPS} and overall enjoyment) demonstrated good reliability. The results of our study highlight the complexity of player experience evaluations over time, utilizing single and multiple items per construct measures. We conclude that single-item measures may not be appropriate for long-term investigations of more complex PX dimensions and provide practical considerations for the applicability of such measures in repeated measurements.

We consider two-player games with imperfect information and the synthesis of a randomized strategy for one player that ensures the objective is satisfied almost-surely (i.e., with probability 1), regardless of the strategy of the other player. Imperfect information is modeled by an indistinguishability relation describing the pairs of histories that the first player cannot distinguish, a generalization of the traditional model with partial observations. The game is regular if it admits a regular function whose kernel commutes with the indistinguishability relation. The synthesis of pure strategies that ensure all possible outcomes satisfy the objective is possible in regular games, by a generic reduction that holds for all objectives. While the solution for pure strategies extends to randomized strategies in the traditional model with partial observations (which is always regular), we show that a similar reduction does not exist in the more general model. Despite that, we show that in regular games with Buechi objectives the synthesis problem is decidable for randomized strategies that ensure the outcome satisfies the objective almost-surely.

The end of Moore's Law and Dennard Scaling has combined with advances in agile hardware design to foster a golden age of domain-specific acceleration. However, this new frontier of computing opportunities is not without pitfalls. As computer architects approach unfamiliar domains, we have seen common themes emerge in the challenges that can hinder progress in the development of useful acceleration. In this work, we present the Magnificent Seven Challenges in domain-specific accelerator design that can guide adventurous architects to contribute meaningfully to novel application domains. Although these challenges appear across domains ranging from ML to genomics, we examine them through the lens of autonomous systems as a motivating example in this work. To that end, we identify opportunities for the path forward in a successful domain-specific accelerator design from these challenges.

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.

北京阿比特科技有限公司