
Recent advances at the intersection of dense large graph limits and mean field games have begun to enable the scalable analysis of a broad class of dynamical sequential games with large numbers of agents. So far, results have been largely limited to graphon mean field systems with continuous-time diffusive or jump dynamics, typically without control and with little focus on computational methods. We propose a novel discrete-time formulation for graphon mean field games as the limit of non-linear dense graph Markov games with weak interaction. On the theoretical side, we rigorously establish existence of the graphon mean field solution and its approximation properties in sufficiently large systems. On the practical side, we provide general learning schemes for graphon mean field equilibria by either introducing agent equivalence classes or reformulating the graphon mean field system as a classical mean field system. By repeatedly finding a regularized optimal control solution and its generated mean field, we obtain plausible approximate Nash equilibria in large dense graph games that are otherwise computationally infeasible. Empirically, we demonstrate on a number of examples that, for our computed equilibria, the finite-agent behavior comes increasingly close to the mean field behavior as the graph or system size grows, verifying our theory. More generally, we successfully apply policy gradient reinforcement learning in conjunction with sequential Monte Carlo methods.
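To make the iterative scheme concrete, below is a minimal sketch of the fixed point it alternates: a regularized (entropy-smoothed) optimal-control solve given the current mean field, followed by the mean field generated by the resulting policy. The finite state/action space, random transition tensor, and reward function are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Sketch of the fixed-point scheme: alternate a regularized (softmax)
# optimal-control solve with the mean field it generates. All quantities
# (P, reward, sizes) are illustrative assumptions.

n_states, n_actions, gamma, temp = 5, 3, 0.9, 0.5
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> dist over s'

def reward(s, a, mu):  # hypothetical mean-field-dependent reward
    return -abs(s / n_states - mu @ np.arange(n_states) / n_states) - 0.01 * a

mu = np.full(n_states, 1.0 / n_states)                # initial mean field
for _ in range(30):                                   # outer fixed-point iterations
    V = np.zeros(n_states)
    for _ in range(200):                              # soft value iteration given mu
        Q = np.array([[reward(s, a, mu) + gamma * P[s, a] @ V
                       for a in range(n_actions)] for s in range(n_states)])
        V = temp * np.logaddexp.reduce(Q / temp, axis=1)
    policy = np.exp((Q - V[:, None]) / temp)          # regularized best response
    mu = np.einsum('s,sa,sat->t', mu, policy, P)      # mean field it generates
```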

Related content

We consider generalized Nash equilibrium problems (GNEPs) with non-convex strategy spaces and non-convex cost functions. This general class of games includes the important case of games with mixed-integer variables, for which only a few results are known in the literature. We present a new approach to characterizing equilibria via a convexification technique using the Nikaido-Isoda function. For any given instance of the GNEP, we construct a set of convexified instances and show that a feasible strategy profile is an equilibrium for the original instance if and only if it is an equilibrium for any convexified instance and the convexified cost functions coincide with the initial ones. We develop this approach along three dimensions. First, we show that for quasi-linear models, where there is a convexified instance in which, for fixed strategies of the opponent players, the cost function of every player is linear and the respective strategy space is polyhedral, the convexification reduces the GNEP to a standard (non-linear) optimization problem. Second, we derive two complete characterizations of those GNEPs for which the convexification leads to a jointly constrained or a jointly convex GNEP, respectively. These characterizations require new concepts related to the interplay of the convex hull operator applied to restricted subsets of feasible strategies, and may be of independent interest. Finally, we demonstrate the applicability of our results with a numerical study on the computation of equilibria for a class of integral network flow GNEPs.
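For reference, the convexification argument above builds on the classical Nikaido-Isoda function; the lines below sketch its standard definition and equilibrium characterization in our own notation ($c_i$ the cost functions, $X(x)$ the feasible unilateral deviations at profile $x$), not the paper's convexified variant.

```latex
% Standard Nikaido-Isoda function for an N-player game with cost
% functions c_i; x = (x_i, x_{-i}) is a strategy profile and y ranges
% over unilateral deviations. Notation here is illustrative.
\[
  \Psi(x, y) \;=\; \sum_{i=1}^{N} \bigl[ c_i(x_i, x_{-i}) - c_i(y_i, x_{-i}) \bigr]
\]
% A feasible profile x is an equilibrium iff no player can strictly
% reduce her cost by a feasible unilateral deviation, i.e.
\[
  \sup_{y \in X(x)} \Psi(x, y) \;\le\; 0 .
\]
```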

Multifidelity approximation is an important technique in scientific computation and simulation. In this paper, we introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates of the parameters of interest. Under a linear model assumption, we formulate multifidelity approximation as a modified stochastic bandit and analyze the loss for a class of policies that uniformly explore each model before exploiting. Utilizing the estimated conditional mean-squared error, we propose a consistent algorithm, adaptive Explore-Then-Commit (AETC), and establish a corresponding trajectory-wise optimality result. We then extend these results to the case of vector-valued responses, where we demonstrate that the algorithm remains efficient without requiring the estimation of high-dimensional parameters. The main advantage of our approach is that it requires neither a hierarchical model structure nor \textit{a priori} knowledge of statistical information (e.g., correlations) about or between models. Instead, the AETC algorithm requires only knowledge of which model is a trusted high-fidelity model, along with (relative) computational cost estimates for querying each model. Numerical experiments are provided to support our theoretical findings.
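The following is a hedged sketch of what such an explore-then-commit loop might look like: query every model on the same inputs during exploration, regress the trusted high-fidelity output on each low-fidelity output, estimate the exploitation loss from the fitted conditional mean-squared error and the per-query costs, then commit the remaining budget to the best surrogate. The simulators, costs, and loss proxy below are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

# Toy explore-then-commit loop in the spirit of AETC; every named
# quantity (query_joint, costs, the loss proxy) is an assumption.

rng = np.random.default_rng(1)
def query_joint(m, n):                  # paired (hi-fi, model-m) samples
    z = rng.normal(size=n)
    return z, z + 0.2 * m * rng.normal(size=n)

costs = {0: 1.0, 1: 0.1, 2: 0.01}       # relative cost per query (model 0 trusted)
budget, n_explore = 1000.0, 50
best, best_loss = None, np.inf
for m in (1, 2):
    hi, lo = query_joint(m, n_explore)
    X = np.column_stack([np.ones(n_explore), lo])
    beta, *_ = np.linalg.lstsq(X, hi, rcond=None)   # linear surrogate fit
    mse = np.mean((hi - X @ beta) ** 2)             # conditional MSE estimate
    n_exploit = (budget - n_explore * (costs[0] + costs[m])) / costs[m]
    loss = mse + np.var(hi) / max(n_exploit, 1.0)   # bias + Monte Carlo variance proxy
    if loss < best_loss:
        best, best_loss = m, loss
print(f"commit remaining budget to model {best}")
```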

Bayesian bandit algorithms with approximate inference are widely used in practice and often perform well, yet few studies address a fundamental understanding of their performance. In this paper, we propose a Bayesian bandit algorithm, which we call Generalized Bayesian Upper Confidence Bound (GBUCB), for bandit problems in the presence of approximate inference. Our theoretical analysis demonstrates that, in the Bernoulli multi-armed bandit, GBUCB can achieve $O(\sqrt{T}(\log T)^c)$ frequentist regret if the inference error measured by the symmetrized Kullback-Leibler divergence is controllable. This analysis relies on a novel sensitivity analysis of quantile shifts with respect to inference errors. To the best of our knowledge, our work provides the first theoretical regret bound sharper than $o(T)$ in the setting of approximate inference. Our experimental evaluations on multiple approximate inference settings corroborate our theory, showing that GBUCB is consistently superior to BUCB and Thompson sampling.
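As an illustration of the quantile-based family the paper works in, here is a sketch of a Bayesian UCB rule for Bernoulli bandits: play the arm with the largest posterior quantile, with the quantile level chosen high enough that a bounded approximate-inference error (not modeled here) could be absorbed. The exact Beta posteriors and the schedule below follow the classic BUCB recipe, not GBUCB's actual rule.

```python
import numpy as np
from scipy.stats import beta as beta_dist

# Quantile-based Bayesian UCB sketch (BUCB-style schedule, exact
# Beta posteriors); all constants are illustrative.

n_arms, T, c = 3, 2000, 2.0
true_p = np.array([0.3, 0.5, 0.6])
succ = np.ones(n_arms)                           # Beta(1, 1) priors
fail = np.ones(n_arms)
rng = np.random.default_rng(2)
for t in range(1, T + 1):
    level = 1.0 - 1.0 / (t * np.log(T) ** c)     # quantile schedule
    a = int(np.argmax(beta_dist.ppf(level, succ, fail)))
    r = rng.random() < true_p[a]                 # Bernoulli reward
    succ[a] += r
    fail[a] += 1 - r
print("pulls per arm:", succ + fail - 2)
```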

The idea of using polynomial methods to improve simple smoother iterations within a multigrid method for a symmetric positive definite (SPD) system is revisited. When the single-step smoother itself corresponds to an SPD operator, there is in particular a very simple iteration, a close cousin of the Chebyshev semi-iterative method, based on the Chebyshev polynomials of the fourth instead of the first kind, that optimizes a two-level bound going back to Hackbusch. A full V-cycle bound for general polynomial smoothers is derived using the V-cycle theory of McCormick. The fourth-kind Chebyshev iteration is quasi-optimal for the V-cycle bound. The optimal polynomials for the V-cycle bound can be found numerically, yielding a bound on the error contraction factor about 18% lower than that of the fourth-kind Chebyshev iteration, asymptotically as the number of smoothing steps goes to infinity. Implementation of the optimized iteration is discussed, and the performance of the polynomial smoothers is illustrated on a simple numerical example.
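As a concrete illustration, here is a minimal sketch of a fourth-kind Chebyshev smoother with Jacobi as the underlying single-step smoother. The coefficient recurrence follows the form commonly cited for this iteration, and the spectral bound `rho` for the 1D Poisson example is an assumption; this is a sketch, not the paper's optimized variant.

```python
import numpy as np

def cheb4_smooth(A, b, x, m, rho):
    """m steps of a fourth-kind Chebyshev smoother; rho bounds the
    spectrum of the Jacobi-preconditioned operator D^{-1} A."""
    Dinv = 1.0 / np.diag(A)                 # Jacobi preconditioner M^{-1}
    r = b - A @ x
    d = np.zeros_like(x)
    for k in range(1, m + 1):
        d = ((2 * k - 3) / (2 * k + 1)) * d \
            + ((8 * k - 4) / ((2 * k + 1) * rho)) * (Dinv * r)
        x = x + d
        r = r - A @ d
    return x

# 1D Poisson example: eigenvalues of D^{-1} A lie in (0, 2) for this stencil.
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = cheb4_smooth(A, np.ones(n), np.zeros(n), m=4, rho=2.0)
```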

We establish that the subgame perfect equilibrium (SPE) threshold problem for mean-payoff games is NP-complete. While the SPE threshold problem was recently shown to be decidable (in doubly exponential time) and NP-hard, its exact worst-case complexity was left open.

Achieving robust performance is crucial when applying deep reinforcement learning (RL) in safety-critical systems. Some state-of-the-art approaches address the problem with adversarial agents, but these agents often require expert supervision to fine-tune and to prevent the adversary from becoming too challenging for the trainee agent. Other approaches automatically adjust environment setups during training, but they have been limited to simple environments where low-dimensional encodings can be used. Inspired by these approaches, we propose genetic curriculum, an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum to help the agent learn to solve those scenarios and acquire more robust behaviors. As a non-parametric optimizer, our approach uses a raw, non-fixed encoding of scenarios, reducing the need for expert supervision and allowing our algorithm to adapt to the changing performance of the agent. Our empirical studies show improvement in robustness over existing state-of-the-art algorithms, providing training curricula that result in agents being 2 to 8 times less likely to fail without sacrificing cumulative reward. We include an ablation study and share insights on why our algorithm outperforms prior approaches.
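A toy, runnable sketch of the genetic-curriculum loop is given below: keep the raw scenario encodings the agent currently fails, breed new ones by crossover and mutation, and train on the resulting curriculum. The scalar agent skill, the failure test, and the encoding are all stand-ins for a real RL agent and simulator.

```python
import random

# Genetic loop over raw scenario encodings; everything here is a
# stand-in for an RL agent, simulator, and training procedure.

random.seed(0)
def fails(agent_skill, scenario):        # harder scenarios have larger genes
    return sum(scenario) / len(scenario) > agent_skill

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(s, rate=0.05):
    return [min(1.0, max(0.0, g + random.gauss(0.0, rate))) for g in s]

agent_skill = 0.3
population = [[random.random() for _ in range(8)] for _ in range(32)]
for generation in range(20):
    failures = [s for s in population if fails(agent_skill, s)]
    if not failures:
        break                             # agent is robust on this population
    agent_skill += 0.02 * len(failures) / len(population)  # "training" on curriculum
    population = [mutate(crossover(random.choice(failures), random.choice(failures)))
                  for _ in range(32)]
print(f"skill {agent_skill:.2f} after {generation + 1} generations")
```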

To rapidly learn a new task, it is often essential for agents to explore efficiently -- especially when performance matters from the first timestep. One way to learn such behaviour is via meta-learning. Many existing methods, however, rely on dense rewards for meta-training and can fail catastrophically if the rewards are sparse. Without a suitable reward signal, the need for exploration during meta-training is exacerbated. To address this, we propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space (where hyper-states represent the environment state together with the agent's task belief). We show empirically that HyperX meta-learns better task exploration and adapts more successfully to new tasks than existing methods.
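To illustrate the idea of rewarding novelty in hyper-state space, here is a minimal sketch of an exploration bonus over the concatenation of environment state and task-belief vector, implemented with discretized visit counts. HyperX's actual bonuses are built differently (e.g. on prediction errors); this only conveys the shape of the augmentation.

```python
import numpy as np

# Count-based novelty bonus over approximate hyper-states
# (state ++ belief); the discretization and scale are assumptions.

counts = {}
def hyper_state_bonus(state, belief, scale=0.1, bins=10):
    key = tuple(np.floor(np.concatenate([state, belief]) * bins).astype(int))
    counts[key] = counts.get(key, 0) + 1
    return scale / np.sqrt(counts[key])          # decays with visitation

# During meta-training the environment reward would be augmented as
# r = r_env + hyper_state_bonus(s, belief):
rng = np.random.default_rng(4)
print(hyper_state_bonus(rng.random(3), rng.random(2)))
```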

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect-information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.
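The pipeline idea can be conveyed with a runnable toy: a growing hierarchy of policy "slots" trained in parallel, each against the policies at lower levels; the lowest active slot is frozen once it plateaus and a fresh slot joins at the top. Policies here are scalars and training is a toy contraction, standing in for RL workers and best-response training, not the paper's implementation.

```python
# Toy pipeline over policy "slots"; strengths and the plateau test
# are stand-ins for RL policies and best-response training.

class Slot:
    def __init__(self, level):
        self.level, self.strength, self.fixed = level, 0.0, False

slots = [Slot(0), Slot(1)]
for step in range(200):
    for slot in [s for s in slots if not s.fixed]:       # parallel workers
        lower = [s.strength for s in slots if s.level < slot.level]
        target = 1.0 + max(lower, default=0.0)           # toy best response
        gain = 0.25 * (target - slot.strength)
        slot.strength += gain
        lowest_active = min(s.level for s in slots if not s.fixed)
        if slot.level == lowest_active and gain < 1e-3:  # plateaued: freeze
            slot.fixed = True
            slots.append(Slot(slots[-1].level + 1))      # new top of pipeline
print([round(s.strength, 2) for s in slots])
```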

Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. As the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, in which the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or of neighboring agents. The interplay between the two entities is mutually reinforcing: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to a Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games demonstrate the learning effectiveness of our mean field approaches. In addition, we report the first result solving the Ising model via model-free reinforcement learning methods.
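To fix ideas, here is a minimal sketch of a mean field Q-learning update, in which each agent's Q-function takes a (discretized) mean action of its neighbors as an extra input and policies are Boltzmann in Q; the shapes, binning, and environment interface are illustrative assumptions.

```python
import numpy as np

# Mean field Q-update sketch: Q is indexed by (state, action, binned
# mean neighbor action); sizes and temperatures are illustrative.

n_states, n_actions = 4, 3
alpha, gamma, temp = 0.1, 0.95, 1.0
Q = np.zeros((n_states, n_actions, n_actions))   # Q(s, a, mean-action bin)

def boltzmann(s, abar):
    p = np.exp(Q[s, :, abar] / temp)
    return p / p.sum()

def mfq_update(s, a, abar, r, s2, abar2):
    v = boltzmann(s2, abar2) @ Q[s2, :, abar2]   # mean field value of next state
    Q[s, a, abar] += alpha * (r + gamma * v - Q[s, a, abar])

mfq_update(s=0, a=1, abar=2, r=1.0, s2=3, abar2=1)
```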

Generative adversarial networks (GANs) have evolved into one of the most successful unsupervised techniques for generating realistic images. Even though it has recently been shown that GAN training converges, GAN models often end up in local Nash equilibria that are associated with mode collapse or otherwise fail to model the target distribution. We introduce Coulomb GANs, which pose the GAN learning problem as a potential field of charged particles, in which generated samples are attracted to training set samples but repel each other. The discriminator learns a potential field while the generator decreases the energy by moving its samples along the vector (force) field determined by the gradient of the potential field. Through decreasing the energy, the GAN model learns to generate samples according to the whole target distribution rather than covering only some of its modes. We prove that Coulomb GANs possess only one Nash equilibrium, which is optimal in the sense that the model distribution equals the target distribution. We show the efficacy of Coulomb GANs on a variety of image datasets. On LSUN and CelebA, Coulomb GANs set a new state of the art and produce a previously unseen variety of different samples.
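The potential-field picture can be made concrete with a short sketch: the potential at a point is the attraction exerted by real samples minus the repulsion from generated ones, computed with a Plummer-style kernel. The kernel exponent and smoothing constant below are illustrative assumptions; in the actual method the discriminator learns this field rather than evaluating it directly.

```python
import numpy as np

# Potential-field sketch with a Plummer-style kernel; eps and d
# are illustrative, not the paper's settings.

def plummer_kernel(x, y, eps=1.0, d=3):
    dist2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return 1.0 / (dist2 + eps ** 2) ** ((d - 2) / 2)

def potential(points, real, fake):
    attract = plummer_kernel(points, real).mean(axis=1)  # pull toward real data
    repel = plummer_kernel(points, fake).mean(axis=1)    # push apart generated
    return attract - repel

rng = np.random.default_rng(3)
real = rng.normal(0.0, 1.0, size=(100, 2))
fake = rng.normal(2.0, 1.0, size=(100, 2))
phi = potential(fake, real, fake)    # generated samples drift up this field
print(phi.mean())
```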
