18GAY国产小鲜肉可播放_日本一区二区三区免费在线观看_欧洲一区二区三区四区视频_国产精品国产精品国产三级普_久久精品国产99久久无码_性激烈的欧美三级视频免_久久久精品不卡一区二区

This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings. In the former, players observe their exact gradient vectors of the utility functions. In the latter, they only observe the noisy gradient vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may not converge to a Nash equilibrium with noisy feedback. On the contrary, M2WU exhibits the last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We then prove that it converges to an exact Nash equilibrium by iteratively adapting the mutation term. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.

相關內容

納什均衡

關注 1

Analysis · Learning · 噪聲 · 控制器 · 聯邦學習 ·

2023 年 7 月 14 日

Improved Convergence Analysis and SNR Control Strategies for Federated Learning in the Presence of Noise

Antesh Upadhyay,Abolfazl Hashemi

We propose an improved convergence analysis technique that characterizes the distributed learning paradigm of federated learning (FL) with imperfect/noisy uplink and downlink communications. Such imperfect communication scenarios arise in the practical deployment of FL in emerging communication systems and protocols. The analysis developed in this paper demonstrates, for the first time, that there is an asymmetry in the detrimental effects of uplink and downlink communications in FL. In particular, the adverse effect of the downlink noise is more severe on the convergence of FL algorithms. Using this insight, we propose improved Signal-to-Noise (SNR) control strategies that, discarding the negligible higher-order terms, lead to a similar convergence rate for FL as in the case of a perfect, noise-free communication channel while incurring significantly less power resources compared to existing solutions. In particular, we establish that to maintain the $O(\frac{1}{\sqrt{K}})$ rate of convergence like in the case of noise-free FL, we need to scale down the uplink and downlink noise by $\Omega({\sqrt{k}})$ and $\Omega({k})$ respectively, where $k$ denotes the communication round, $k=1,\dots, K$. Our theoretical result is further characterized by two major benefits: firstly, it does not assume the somewhat unrealistic assumption of bounded client dissimilarity, and secondly, it only requires smooth non-convex loss functions, a function class better suited for modern machine learning and deep learning models. We also perform extensive empirical analysis to verify the validity of our theoretical findings.

情景 · 近似 · Analysis · Agent · 可辨認的 ·

2023 年 7 月 14 日

Equilibrium Analysis of Customer Attraction Games

Xiaotie Deng,Ningyuan Li,Weian Li,Qi Qi

We introduce a game model called "customer attraction game" to demonstrate the competition among online content providers. In this model, customers exhibit interest in various topics. Each content provider selects one topic and benefits from the attracted customers. We investigate both symmetric and asymmetric settings involving agents and customers. In the symmetric setting, the existence of pure Nash equilibrium (PNE) is guaranteed, but finding a PNE is PLS-complete. To address this, we propose a fully polynomial time approximation scheme to identify an approximate PNE. Moreover, the tight Price of Anarchy (PoA) is established. In the asymmetric setting, we show the nonexistence of PNE in certain instances and establish that determining its existence is NP-hard. Nevertheless, we prove the existence of an approximate PNE. Additionally, when agents select topics sequentially, we demonstrate that finding a subgame-perfect equilibrium is PSPACE-hard. Furthermore, we present the sequential PoA for the two-agent setting.

賭博機/老虎機 · 約束 · 優化器 · 在線 · CASE ·

2023 年 7 月 13 日

Online Convex Optimization with Stochastic Constraints: Zero Constraint Violation and Bandit Feedback

Yeongjong Kim,Dabeen Lee

from arxiv, We found a paper that has already obtained the results of the submission

This paper studies online convex optimization with stochastic constraints. We propose a variant of the drift-plus-penalty algorithm that guarantees $O(\sqrt{T})$ expected regret and zero constraint violation, after a fixed number of iterations, which improves the vanilla drift-plus-penalty method with $O(\sqrt{T})$ constraint violation. Our algorithm is oblivious to the length of the time horizon $T$, in contrast to the vanilla drift-plus-penalty method. This is based on our novel drift lemma that provides time-varying bounds on the virtual queue drift and, as a result, leads to time-varying bounds on the expected virtual queue length. Moreover, we extend our framework to stochastic-constrained online convex optimization under two-point bandit feedback. We show that by adapting our algorithmic framework to the bandit feedback setting, we may still achieve $O(\sqrt{T})$ expected regret and zero constraint violation, improving upon the previous work for the case of identical constraint functions. Numerical results demonstrate our theoretical results.

過擬合 · 情景 · 期望錯誤 · 可辨認的 · 交叉驗證 ·

2023 年 7 月 13 日

Short Boolean Formulas as Explanations in Practice

Reijo Jaakkola,Tomi Janhunen,Antti Kuusisto,Masood Feyzbakhsh Rankooh,Miikka Vilander

from arxiv, The original submitted version of a JELIA 2023 paper

We investigate explainability via short Boolean formulas in the data model based on unary relations. As an explanation of length k, we take a Boolean formula of length k that minimizes the error with respect to the target attribute to be explained. We first provide novel quantitative bounds for the expected error in this scenario. We then also demonstrate how the setting works in practice by studying three concrete data sets. In each case, we calculate explanation formulas of different lengths using an encoding in Answer Set Programming. The most accurate formulas we obtain achieve errors similar to other methods on the same data sets. However, due to overfitting, these formulas are not necessarily ideal explanations, so we use cross validation to identify a suitable length for explanations. By limiting to shorter formulas, we obtain explanations that avoid overfitting but are still reasonably accurate and also, importantly, human interpretable.

Prophet · 優化器 · 估計/估計量 · 線性的 · 閾值 ·

2023 年 7 月 12 日

Reward Selection with Noisy Observations

Kamyar Azizzadenesheli,Trung Dang,Aranyak Mehta,Alexandros Psomas,Qian Zhang

We study a fundamental problem in optimization under uncertainty. There are $n$ boxes; each box $i$ contains a hidden reward $x_i$. Rewards are drawn i.i.d. from an unknown distribution $\mathcal{D}$. For each box $i$, we see $y_i$, an unbiased estimate of its reward, which is drawn from a Normal distribution with known standard deviation $\sigma_i$ (and an unknown mean $x_i$). Our task is to select a single box, with the goal of maximizing our reward. This problem captures a wide range of applications, e.g. ad auctions, where the hidden reward is the click-through rate of an ad. Previous work in this model [BKMR12] proves that the naive policy, which selects the box with the largest estimate $y_i$, is suboptimal, and suggests a linear policy, which selects the box $i$ with the largest $y_i - c \cdot \sigma_i$, for some $c > 0$. However, no formal guarantees are given about the performance of either policy (e.g., whether their expected reward is within some factor of the optimal policy's reward). In this work, we prove that both the naive policy and the linear policy are arbitrarily bad compared to the optimal policy, even when $\mathcal{D}$ is well-behaved, e.g. has monotone hazard rate (MHR), and even under a "small tail" condition, which requires that not too many boxes have arbitrarily large noise. On the flip side, we propose a simple threshold policy that gives a constant approximation to the reward of a prophet (who knows the realized values $x_1, \dots, x_n$) under the same "small tail" condition. We prove that when this condition is not satisfied, even an optimal clairvoyant policy (that knows $\mathcal{D}$) cannot get a constant approximation to the prophet, even for MHR distributions, implying that our threshold policy is optimal against the prophet benchmark, up to constants.

隨機梯度下降 · 向量化 · Learning · Performer · 可理解性 ·

2023 年 7 月 11 日

Implicit regularisation in stochastic gradient descent: from single-objective to two-player games

Mihaela Rosca,Marc Peter Deisenroth

Recent years have seen many insights on deep learning optimisation being brought forward by finding implicit regularisation effects of commonly used gradient-based optimisers. Understanding implicit regularisation can not only shed light on optimisation dynamics, but it can also be used to improve performance and stability across problem domains, from supervised learning to two-player games such as Generative Adversarial Networks. An avenue for finding such implicit regularisation effects has been quantifying the discretisation errors of discrete optimisers via continuous-time flows constructed by backward error analysis (BEA). The current usage of BEA is not without limitations, since not all the vector fields of continuous-time flows obtained using BEA can be written as a gradient, hindering the construction of modified losses revealing implicit regularisers. In this work, we provide a novel approach to use BEA, and show how our approach can be used to construct continuous-time flows with vector fields that can be written as gradients. We then use this to find previously unknown implicit regularisation effects, such as those induced by multiple stochastic gradient descent steps while accounting for the exact data batches used in the updates, and in generally differentiable two-player games.

查準率/準確率 · 層 · 模型評估 · Networking · Neural Networks ·

2023 年 7 月 11 日

Mixed-Precision Quantization with Cross-Layer Dependencies

Zihao Deng,Xin Wang,Sayeh Sharify,Michael Orshansky

Quantization is commonly used to compress and accelerate deep neural networks. Quantization assigning the same bit-width to all layers leads to large accuracy degradation at low precision and is wasteful at high precision settings. Mixed-precision quantization (MPQ) assigns varied bit-widths to layers to optimize the accuracy-efficiency trade-off. Existing methods simplify the MPQ problem by assuming that quantization errors at different layers act independently. We show that this assumption does not reflect the true behavior of quantized deep neural networks. We propose the first MPQ algorithm that captures the cross-layer dependency of quantization error. Our algorithm (CLADO) enables a fast approximation of pairwise cross-layer error terms by solving linear equations that require only forward evaluations of the network on a small amount of data. Decisions on layerwise bit-width assignments are then determined by optimizing a new MPQ formulation dependent on these cross-layer quantization errors via the Integer Quadratic Program (IQP), which can be solved within seconds. We conduct experiments on multiple networks on the Imagenet dataset and demonstrate an improvement, in top-1 classification accuracy, of up to 27% over uniform precision quantization, and up to 15% over existing MPQ methods.

Learning · Neural Networks · Networking · 可約的 · Networks ·

2022 年 9 月 1 日

Learning with Differentiable Algorithms

Felix Petersen

from arxiv, PhD thesis (summa cum laude), University of Konstanz, 162 pages

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.

Obvious · 知識 (knowledge) · MoDELS · 博弈論 · BASIC ·

2022 年 7 月 16 日

A Survey of Decision Making in Adversarial Games

Xiuxian Li,Min Meng,Yiguang Hong,Jie Chen

Game theory has by now found numerous applications in various fields, including economics, industry, jurisprudence, and artificial intelligence, where each player only cares about its own interest in a noncooperative or cooperative manner, but without obvious malice to other players. However, in many practical applications, such as poker, chess, evader pursuing, drug interdiction, coast guard, cyber-security, and national defense, players often have apparently adversarial stances, that is, selfish actions of each player inevitably or intentionally inflict loss or wreak havoc on other players. Along this line, this paper provides a systematic survey on three main game models widely employed in adversarial games, i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games, zero-sum differential games, from an array of perspectives, including basic knowledge of game models, (approximate) equilibrium concepts, problem classifications, research frontiers, (approximate) optimal strategy seeking techniques, prevailing algorithms, and practical applications. Finally, promising future research directions are also discussed for relevant adversarial games.

Learning · 不完美信息 · Agent · 強化學習 · Self-Play ·

2022 年 6 月 30 日

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Julien Perolat,Bart de Vylder,Daniel Hennes,Eugene Tarassov,Florian Strub,Vincent de Boer,Paul Muller,Jerome T. Connor,Neil Burch,Thomas Anthony,Stephen McAleer,Romuald Elie,Sarah H. Cen,Zhe Wang,Audrunas Gruslys,Aleksandra Malysheva,Mina Khan,Sherjil Ozair,Finbarr Timbers,Toby Pohlen,Tom Eccles,Mark Rowland,Marc Lanctot,Jean-Baptiste Lespiau,Bilal Piot,Shayegan Omidshafiei,Edward Lockhart,Laurent Sifre,Nathalie Beauguerlange,Remi Munos,David Silver,Satinder Singh,Demis Hassabis,Karl Tuyls

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.