This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), where multiple agents interact in the same environment and each aims to maximize its individual return. Scaling up the number of agents is challenging because of the non-stationarity that many agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with a larger number of states. Current methods rely on smoothing techniques such as averaging the Q-values or the updates on the mean-field distribution. This work presents a different approach to stabilize learning, based on proximal updates on the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
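The abstract does not spell out the MF-PPO objective, so the following is only a minimal sketch of what a proximal policy update usually means: the standard PPO clipped surrogate, computed for a single agent from log-probabilities and advantage estimates. The function name `clipped_surrogate`, its arguments, and the default `clip_eps=0.2` are assumptions made for illustration and do not come from the paper or from OpenSpiel.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO-style clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy. advantages: advantage estimates.
    All names here are hypothetical, chosen for this illustration.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # A ratio of 2 with a positive advantage is capped at 1 + clip_eps.
    print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))
```

Clipping removes the incentive to move the policy far from the one that collected the data, which is one way such an update can stabilize learning without averaging Q-values or distribution updates.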

Related content

Game theory

MoDELS · Critic · Attention · Performer · Monte Carlo

May 24, 2023

Attention to Mean-Fields for Particle Cloud Generation

Benno Käch, Isabell Melzer-Pellmann

The generation of collider data using machine learning has emerged as a prominent research topic in particle physics due to the increasing computational challenges associated with traditional Monte Carlo simulation methods, particularly for future colliders with higher luminosity. Although generating particle clouds is analogous to generating point clouds, accurately modelling the complex correlations between the particles presents a considerable challenge. Additionally, variable particle cloud sizes further exacerbate these difficulties, necessitating more sophisticated models. In this work, we propose a novel model that utilizes an attention-based aggregation mechanism to address these challenges. The model is trained in an adversarial training paradigm, ensuring that both the generator and critic exhibit permutation equivariance/invariance with respect to their input. A novel feature matching loss in the critic is introduced to stabilize the training. The proposed model performs competitively with the state of the art whilst having significantly fewer parameters.

Approximation · Learning · Dynamical systems · INTERACT · Representation

May 24, 2023

Policy Learning based on Deep Koopman Representation

Wenjian Hao, Paulo C. Heredia, Bowen Huang, Zehui Lu, Zihao Liang, Shaoshuai Mou

This paper proposes a policy learning algorithm based on Koopman operator theory and the policy gradient approach, which seeks to approximate an unknown dynamical system and search for an optimal policy simultaneously, using the observations gathered through interaction with the environment. The proposed algorithm has two innovations: first, it introduces the so-called deep Koopman representation into the policy gradient to achieve a linear approximation of the unknown dynamical system, with the purpose of improving data efficiency; second, the accumulated errors for long-term tasks induced by approximating system dynamics are avoided by applying Bellman's principle of optimality. Furthermore, a theoretical analysis is provided to prove the asymptotic convergence of the proposed algorithm and characterize the corresponding sample complexity. These conclusions are also supported by simulations on several challenging benchmark environments.

Streaming · Facebook AI Research · Maximization · Constraints · Generalization theory

May 24, 2023

Fairness in Streaming Submodular Maximization over a Matroid Constraint

Marwa El Halabi, Federico Fusco, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

from arxiv, Accepted to ICML 23

Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in developing fair machine learning algorithms. Recently, such algorithms have been developed for monotone submodular maximization under a cardinality constraint. In this paper, we study the natural generalization of this problem to a matroid constraint. We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality and fairness. We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks.
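As a rough illustration of the streaming setting described above, the sketch below makes a single pass over candidate sets and keeps one whenever its marginal coverage gain reaches a fixed threshold, up to a cardinality budget. It is a simplified threshold rule for maximum coverage only; it handles neither fairness nor matroid constraints and is not the algorithm from the paper. The names `threshold_stream_cover`, `k`, and `tau` are hypothetical.

```python
def threshold_stream_cover(stream, k, tau):
    """One-pass threshold rule for maximum coverage with at most k sets.

    stream: iterable of (name, items) pairs, seen once and in order.
    k: cardinality budget. tau: minimum marginal gain needed to keep a set.
    Illustration only: no fairness or matroid constraint is enforced.
    """
    chosen, covered = [], set()
    for name, items in stream:
        if len(chosen) >= k:
            break  # budget exhausted, ignore the rest of the stream
        # Marginal gain: how many new items this set would cover.
        gain = len(set(items) - covered)
        if gain >= tau:
            chosen.append(name)
            covered |= set(items)
    return chosen, covered


if __name__ == "__main__":
    stream = [("a", {1, 2}), ("b", {2}), ("c", {3, 4, 5}), ("d", {6, 7})]
    print(threshold_stream_cover(stream, k=2, tau=2))
```

Coverage is monotone submodular, so marginal gains can only shrink as more sets are kept; that is why a single threshold test per arriving element is a sensible building block for streaming methods.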

Latent variables · Free energy · Maximum likelihood · Maximum likelihood estimation · Vector space

May 24, 2023

CoinEM: Tuning-Free Particle-Based Variational Inference for Latent Variable Models

Louis Sharrock, Daniel Dodd, Christopher Nemeth

We introduce two new particle-based algorithms for learning latent variable models via marginal maximum likelihood estimation, including one which is entirely tuning-free. Our methods are based on the perspective of marginal maximum likelihood estimation as an optimization problem: namely, as the minimization of a free energy functional. One way to solve this problem is to consider the discretization of a gradient flow associated with the free energy. We study one such approach, which resembles an extension of the popular Stein variational gradient descent algorithm. In particular, we establish a descent lemma for this algorithm, which guarantees that the free energy decreases at each iteration. This method, and any other obtained as the discretization of the gradient flow, will necessarily depend on a learning rate which must be carefully tuned by the practitioner in order to ensure convergence at a suitable rate. With this in mind, we also propose another algorithm for optimizing the free energy which is entirely learning rate free, based on coin betting techniques from convex optimization. We validate the performance of our algorithms across a broad range of numerical experiments, including several high-dimensional settings. Our results are competitive with existing particle-based methods, without the need for any hyperparameter tuning.

Optimizer · Feasible · Constraints · Learning · Heuristic algorithm

May 23, 2023

Constrained Proximal Policy Optimization

Chengbin Xuan, Feng Zhang, Faliang Yin, Hak-Keung Lam

The problem of constrained reinforcement learning (CRL) holds significant importance as it provides a framework for addressing critical safety satisfaction concerns in the field of reinforcement learning (RL). However, with the introduction of constraint satisfaction, the current CRL methods necessitate the utilization of second-order optimization or primal-dual frameworks with additional Lagrangian multipliers, resulting in increased complexity and inefficiency during implementation. To address these issues, we propose a novel first-order feasible method named Constrained Proximal Policy Optimization (CPPO). By treating the CRL problem as a probabilistic inference problem, our approach integrates the Expectation-Maximization framework to solve it through two steps: 1) calculating the optimal policy distribution within the feasible region (E-step), and 2) conducting a first-order update to adjust the current policy towards the optimal policy obtained in the E-step (M-step). We establish the relationship between the probability ratios and KL divergence to convert the E-step into a convex optimization problem. Furthermore, we develop an iterative heuristic algorithm from a geometric perspective to solve this problem. Additionally, we introduce a conservative update mechanism to overcome the constraint violation issue that occurs in the existing feasible region method. Empirical evaluations conducted in complex and uncertain environments validate the effectiveness of our proposed method, as it performs at least as well as other baselines.

Layer · TIP · Model evaluation · Integration · CASE

May 22, 2023

On the layer crossing problem for a semi-infinite hydraulic fracture

A. V. Valov, E. V. Dontsov

This paper analyses the problem of a semi-infinite fluid-driven fracture propagating through multiple stress layers in a permeable elastic medium. Such a problem represents the tip region of a planar hydraulic fracture. When the hydraulic fracture crosses a stress layer, the use of a standard tip asymptotic solution may lead to a considerable reduction of accuracy, even for the simplest case of a height-contained fracture. In this study, we propose three approaches to incorporate the effect of stress layers into the tip asymptote: non-singular integral formulation, toughness-corrected asymptote, and an ordinary differential equation approximation of the non-singular integral formulation mentioned above. As illustrated in the paper, these approaches for stress-corrected asymptotes differ in computational complexity, the complexity of implementation, and the accuracy of the approximation. In addition, the size of the validity region of the stress-corrected asymptote is evaluated, and it is shown to be greatly reduced relative to the case without layers. In order to address the issue, the stress relaxation factor is introduced. This, in turn, allows for enhancing the accuracy of the layer-crossing computation on a relatively coarse mesh to utilize the stress-corrected asymptote in hydraulic fracturing simulators for the purpose of front tracking.

Learning · Performer · Federated learning · Stochastic gradient descent · Integration

May 22, 2023

Explicit Personalization and Local Training: Double Communication Acceleration in Federated Learning

Kai Yi, Laurent Condat, Peter Richtárik

Federated Learning is an evolving machine learning paradigm, in which multiple clients perform computations based on their individual private data, interspersed by communication with a remote server. A common strategy to curtail communication costs is Local Training, which consists in performing multiple local stochastic gradient descent steps between successive communication rounds. However, the conventional approach to local training overlooks the practical necessity for client-specific personalization, a technique to tailor local models to individual needs. We introduce Scafflix, a novel algorithm that efficiently integrates explicit personalization with local training. This innovative approach benefits from these two techniques, thereby achieving doubly accelerated communication, as we demonstrate both in theory and practice.
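The Local Training strategy mentioned above can be sketched on a toy problem: each client takes several gradient steps on its own objective, and the server then averages the resulting models once per communication round. This is plain local training with exact gradients on scalar quadratics, not Scafflix, and it includes no personalization. The function name `local_training`, the client objectives `0.5 * (x - a_i) ** 2`, and all parameter values are assumptions made for illustration.

```python
def local_training(targets, rounds=50, local_steps=5, lr=0.1, x0=0.0):
    """Toy local training with server averaging.

    Client i holds the objective 0.5 * (x - targets[i]) ** 2 (an assumption
    for this sketch). Each round, every client starts from the shared model,
    takes `local_steps` gradient steps, and the server averages the results.
    """
    x = x0
    for _ in range(rounds):
        local_models = []
        for a in targets:
            x_i = x  # the client starts from the current global model
            for _ in range(local_steps):
                x_i -= lr * (x_i - a)  # gradient of 0.5 * (x_i - a) ** 2
            local_models.append(x_i)
        # One communication per round: the server averages the client models.
        x = sum(local_models) / len(local_models)
    return x


if __name__ == "__main__":
    print(local_training([1.0, 3.0, 8.0]))
```

With identical curvature on every client, the averaged model converges to the mean of the client optima (4.0 for the targets above), while communicating only once per `local_steps` gradient steps.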

MFC · Generalized function · Statistical efficiency · Statistic · Approximation

May 18, 2023

On the Statistical Efficiency of Mean Field Reinforcement Learning with General Function Approximation

Jiawei Huang, Batuhan Yardim, Niao He

from arxiv, 47 Pages

In this paper, we study the statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general function approximation. We introduce a new concept called Mean-Field Model-Based Eluder Dimension (MBED), which subsumes a rich family of Mean-Field RL problems. Additionally, we propose algorithms based on Optimistic Maximal Likelihood Estimation, which can return an $\epsilon$-optimal policy for MFC or an $\epsilon$-Nash Equilibrium policy for MFG, with sample complexity polynomial w.r.t. relevant parameters and independent of the number of states, actions and agents. Notably, our results only require a mild assumption of Lipschitz continuity on transition dynamics and avoid the strong structural assumptions of previous work. Finally, in the tabular setting, given access to a generative model, we establish an exponential lower bound for the MFC setting, while providing a novel sample-efficient model elimination algorithm to approximate equilibrium in the MFG setting. Our results reveal a fundamental separation between RL for single-agent, MFC, and MFG from the sample efficiency perspective.

Agent · INTERACT · Game theory · BASIC · Example

May 18, 2023

Game Theory with Simulation of Other Players

Vojtech Kovarik, Caspar Oesterheld, Vincent Conitzer

Game-theoretic interactions with AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to simulate an AI agent (for example because its source code is known), which allows others to accurately predict the agent's actions. This could lower the bar for trust and cooperation. In this paper, we formalize games in which one player can simulate another at a cost. We first derive some basic properties of such games and then prove a number of results for them, including: (1) introducing simulation into generic-payoff normal-form games makes them easier to solve; (2) if the only obstacle to cooperation is a lack of trust in the possibly-simulated agent, simulation enables equilibria that improve the outcome for both agents; however, (3) there are settings where introducing simulation results in strictly worse outcomes for both players.

INTERACT · Learned · Latent · Identifiable · Robot

November 12, 2020

Learning Latent Representations to Influence Multi-Agent Interaction

Annie Xie, Dylan P. Losey, Ryan Tolsma, Chelsea Finn, Dorsa Sadigh

from arxiv, Conference on Robot Learning (CoRL) 2020. Supplementary website at //sites.google.com/view/latent-strategies/

Seamlessly interacting with humans or robots is hard because these agents are non-stationary. They update their policy in response to the ego agent's behavior, and the ego agent must anticipate these changes to co-adapt. Inspired by humans, we recognize that robots do not need to explicitly model every low-level action another agent will make; instead, we can capture the latent strategy of other agents through high-level representations. We propose a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy. The ego agent then leverages these latent dynamics to influence the other agent, purposely guiding them towards policies suitable for co-adaptation. Across several simulated domains and a real-world air hockey game, our approach outperforms the alternatives and learns to influence the other agent.