
This paper studies finite-horizon Markov games in which the agents' dynamics are decoupled but the rewards may be coupled across agents. The policy class is restricted to local policies, in which each agent makes decisions using only its local state. We first introduce the notion of smooth Markov games, which extends the smoothness argument for normal-form games to this setting, and we use the smoothness property to bound the price of anarchy of the Markov game. For a specific class called Markov potential games, we also develop a distributed learning algorithm, multi-agent soft policy iteration (MA-SPI), which provably converges to a Nash equilibrium, and we give its sample complexity. Finally, we validate our results on a dynamic covering game.
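For reference, here is the normal-form smoothness argument that the abstract says is being extended. This is the standard payoff-maximization version, written under the assumption that welfare is the sum of utilities, $W(s)=\sum_i u_i(s)$, with $W(s)>0$; the Markov-game definitions in the paper may differ.

```latex
% (\lambda,\mu)-smoothness: for all strategy profiles s and s^*,
\sum_i u_i(s_i^*, s_{-i}) \;\ge\; \lambda\, W(s^*) - \mu\, W(s).
% If s is a pure Nash equilibrium, no player gains by deviating to s_i^*, so
W(s) = \sum_i u_i(s) \;\ge\; \sum_i u_i(s_i^*, s_{-i}) \;\ge\; \lambda\, W(s^*) - \mu\, W(s).
% Rearranging, with s^* a welfare-maximizing profile:
\frac{W(s^*)}{W(s)} \;\le\; \frac{1+\mu}{\lambda}.
```

The ratio $(1+\mu)/\lambda$ is therefore an upper bound on the price of anarchy over pure Nash equilibria in the static case.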

Related content

Markov


Policy iteration · Markov · Learning · Linear · Reinforcement learning ·

May 25, 2023

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

Anna Winnicki, R. Srikant

From arXiv, 15 pages

Many model-based reinforcement learning (RL) algorithms can be viewed as iterating two phases: a learning phase, in which the model is approximately learned, and a planning phase, in which the learned model is used to derive a policy. For standard MDPs, the planning problem can be solved using either value iteration or policy iteration. For zero-sum Markov games, however, there is no efficient policy iteration algorithm; e.g., it has been shown in Hansen et al. (2013) that one has to solve $\Omega(1/(1-\alpha))$ MDPs, where $\alpha$ is the discount factor, to implement the only known convergent version of policy iteration. Another algorithm for zero-sum Markov games, called naive policy iteration, is easy to implement but is provably convergent only under very restrictive assumptions. Prior attempts to fix the naive policy iteration algorithm have several limitations. Here, we show that a simple variant of naive policy iteration for games converges, and does so exponentially fast. The only addition we propose to naive policy iteration is the use of lookahead in the policy improvement phase. This is appealing because lookahead is already commonly used in RL for games. We further show that lookahead can be implemented efficiently in linear Markov games, the counterpart of linear MDPs, which have recently received much attention. Finally, we consider a multi-agent reinforcement learning algorithm that uses our algorithm in its planning phase, and we provide sample and time complexity bounds for it.
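A minimal sketch of naive policy iteration with lookahead, assuming a finite, discounted, turn-based zero-sum game in which each state is controlled by a single player. Under that assumption the per-state matrix game reduces to a max or a min, so no linear program is needed. Each round evaluates the current joint policy exactly, applies $H-1$ extra minimax Bellman backups (the lookahead), and lets both players act greedily. The simultaneous-move case and the lookahead depth required by the paper's analysis are not reproduced; the names `minimax_backup`, `lookahead_policy_iteration`, `H`, and the random test game are illustrative.

```python
import numpy as np


def minimax_backup(V, P, R, owner, gamma):
    """One minimax Bellman backup for a turn-based zero-sum game.

    P: (S, A, S) transitions, R: (S, A) rewards to the max player,
    owner[s] == 1 if the max player moves at s, 0 if the min player does.
    """
    Q = R + gamma * (P @ V)                       # Q[s, a]
    V_new = np.where(owner == 1, Q.max(axis=1), Q.min(axis=1))
    return V_new, Q


def evaluate(policy, P, R, gamma):
    """Exact value of a deterministic joint policy: solve (I - gamma P_pi) V = R_pi."""
    S = len(policy)
    idx = np.arange(S)
    P_pi, R_pi = P[idx, policy], R[idx, policy]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)


def lookahead_policy_iteration(P, R, owner, gamma, H=3, max_iters=200):
    S = R.shape[0]
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iters):
        V = evaluate(policy, P, R, gamma)         # naive PI: evaluate the joint policy
        for _ in range(H - 1):                    # H - 1 extra backups = lookahead
            V, _ = minimax_backup(V, P, R, owner, gamma)
        _, Q = minimax_backup(V, P, R, owner, gamma)
        # Both players act greedily with respect to the lookahead values.
        new_policy = np.where(owner == 1, Q.argmax(axis=1), Q.argmin(axis=1))
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, evaluate(policy, P, R, gamma)


# Hypothetical random turn-based game.
rng = np.random.default_rng(0)
S, A = 6, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))
owner = rng.integers(0, 2, size=S)
policy, V = lookahead_policy_iteration(P, R, owner, gamma=0.5, H=3)
```

Setting `H=1` reduces the loop to plain naive policy iteration. In this toy setting the returned values can be checked against minimax value iteration.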

Noise · Controller · Processing (programming language) · Estimation/estimator · Learning ·

May 25, 2023

Gaussian Processes with State-Dependent Noise for Stochastic Control

Marcel Menner, Karl Berntorp

This paper considers a stochastic control framework in which the residual model uncertainty of the dynamical system is learned using a Gaussian process (GP). In the proposed formulation, the residual model uncertainty consists of a nonlinear function and state-dependent noise. The formulation uses a posterior GP to approximate the residual model uncertainty and a prior GP to account for the state-dependent noise. The two GPs are interdependent and are therefore learned jointly using an iterative algorithm, whose theoretical properties are established. Advantages of the state-dependent formulation include (i) faster convergence of the GP estimate to the unknown function, since the GP learns which data samples are more trustworthy, and (ii) an accurate estimate of the state-dependent noise, which can be useful, e.g., for a controller or decision-maker to determine the uncertainty of an action. Simulation studies highlight these two advantages.
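The joint posterior-GP/prior-GP scheme is the paper's own and is not reproduced here. Instead, the sketch below uses a simpler alternating heuristic to illustrate what learning a function together with state-dependent noise involves. It fits a GP mean using the current per-sample noise variances, then smooths the log squared residuals with a second GP to update those variances. The squared-exponential kernel, length scale, iteration count, and the synthetic data are all assumptions.

```python
import numpy as np


def rbf(a, b, ell=0.5, sf=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)


def gp_mean(X, y, Xs, noise_var, ell=0.5, sf=1.0):
    """GP posterior mean at Xs with per-sample (heteroscedastic) noise variances."""
    K = rbf(X, X, ell, sf) + np.diag(noise_var)
    return rbf(Xs, X, ell, sf) @ np.linalg.solve(K, y)


def fit_state_dependent_noise(X, y, n_iter=5, ell=0.5):
    """Alternate between a GP for the function and a GP for the log noise variance."""
    noise = np.full(len(X), 0.1 * np.var(y))           # homoscedastic starting guess
    for _ in range(n_iter):
        f_hat = gp_mean(X, y, X, noise, ell)            # function estimate given noise
        z = np.log((y - f_hat) ** 2 + 1e-8)             # log squared residuals
        z_bar = z.mean()
        log_var = z_bar + gp_mean(X, z - z_bar, X, np.ones(len(X)), ell)
        noise = np.exp(log_var)                         # smoothed noise-variance estimate
    return f_hat, noise


# Hypothetical data: noise standard deviation jumps from 0.05 to 0.5 at x = 0.
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 200)
true_std = np.where(X < 0, 0.05, 0.5)
y = np.sin(X) + true_std * rng.standard_normal(X.size)
f_hat, noise = fit_state_dependent_noise(X, y)
```

The log transform keeps the variance estimate positive. It is biased by a constant factor, which does not change which regions the model treats as less trustworthy.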

state-of-the-art · Notability · SimPLe · Divergence · prototype ·

May 25, 2023

Solving Infinite-State Games via Acceleration

Philippe Heim, Rayna Dimitrova

Two-player graph games have found numerous applications, most notably in the synthesis of reactive systems from temporal specifications, where they are successfully used to generate finite-state systems. Because of its practical relevance, reactive synthesis of infinite-state systems, and hence techniques for solving infinite-state games, have attracted attention in recent years. We propose novel semi-algorithms for solving infinite-state games with $\omega$-regular winning conditions. The novelty lies in the use of an acceleration technique that helps avoid divergence. The key idea is to enhance the game-solving algorithm with the ability to combine simple inductive arguments in order to summarize unbounded iterations of strategic decisions in the game. This enables our method to solve games on which state-of-the-art techniques diverge, as we demonstrate in the evaluation of a prototype implementation.
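The fixpoint iteration that such game solvers build on is easiest to see on a finite graph with a reachability objective, the simplest $\omega$-regular condition. The sketch below computes the attractor, the set of vertices from which player 0 can force a visit to a target set. It assumes every vertex has at least one outgoing edge. The graph is hypothetical, and no acceleration is performed.

```python
def attractor(vertices, edges, owner, target):
    """Vertices from which player 0 can force a visit to `target`.

    owner[v] == 0: player 0 picks the successor at v; owner[v] == 1: the opponent does.
    Assumes every vertex has at least one outgoing edge.
    """
    succ = {v: set() for v in vertices}
    for u, v in edges:
        succ[u].add(v)
    attr = set(target)
    changed = True
    while changed:  # least fixed point of the controllable-predecessor operator
        changed = False
        for v in vertices:
            if v in attr:
                continue
            # Player 0 needs one successor in attr; the opponent must have all in attr.
            can_force = bool(succ[v] & attr) if owner[v] == 0 else succ[v] <= attr
            if can_force:
                attr.add(v)
                changed = True
    return attr


vertices = [0, 1, 2, 3, 4]
owner = {0: 1, 1: 0, 2: 1, 3: 0, 4: 0}
edges = [(0, 1), (1, 3), (1, 4), (2, 1), (2, 4), (3, 3), (4, 4)]
print(sorted(attractor(vertices, edges, owner, {3})))  # [0, 1, 3]
```

On a finite graph this loop stops after at most one round per vertex. Over an infinite state space the same iteration can run forever. For example, if player 0 can decrement an unbounded counter and the target is "counter = 0", round $k$ only adds counter value $k$. This is the kind of divergence the acceleration technique is meant to summarize.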

Processing (programming language) · Markov · Episode · Tractable · Policy iteration ·

May 25, 2023

Markov Decision Process with an External Temporal Process

Ranga Shaarad Ayyagari, Ambedkar Dukkipati

Most reinforcement learning algorithms treat the context in which they operate as a stationary, isolated, and undisturbed environment. In the real world, however, the environment is constantly changing due to a variety of external influences. To address this problem, we study Markov decision processes (MDPs) under the influence of an external temporal process. We formalize this notion and discuss conditions under which the problem becomes tractable and admits suitable solutions. We propose a policy iteration algorithm to solve this problem and theoretically analyze its performance.
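The abstract does not spell out the model. One simple way to make the idea concrete is to assume the external process is an observed, finite Markov chain that evolves independently of the agent's actions and switches the agent's transition and reward model. Under that assumption, it can be folded into the state and handled with standard policy iteration on the product MDP. This is an illustration only, not the paper's formalization or algorithm; all names and the random instance are hypothetical.

```python
import numpy as np


def product_mdp(P, R, Q):
    """Fold an exogenous Markov chain into the state.

    P: (E, S, A, S) agent transitions under external mode e,
    R: (E, S, A) rewards under mode e,
    Q: (E, E) transition matrix of the external process (action-independent).
    Returns P_aug: (S*E, A, S*E) and R_aug: (S*E, A), indexing (s, e) as s*E + e.
    """
    E, S, A, _ = P.shape
    P_aug = np.einsum("esat,ef->seatf", P, Q).reshape(S * E, A, S * E)
    R_aug = R.transpose(1, 0, 2).reshape(S * E, A)
    return P_aug, R_aug


def policy_iteration(P, R, gamma, max_iters=100):
    n = R.shape[0]
    idx = np.arange(n)
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma P_pi) V = R_pi exactly.
        V = np.linalg.solve(np.eye(n) - gamma * P[idx, policy], R[idx, policy])
        # Policy improvement: act greedily with respect to Q^pi.
        new_policy = (R + gamma * (P @ V)).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V


rng = np.random.default_rng(1)
E, S, A = 2, 3, 2
P = rng.random((E, S, A, S))
P /= P.sum(axis=-1, keepdims=True)
R = rng.random((E, S, A))
Q = np.array([[0.9, 0.1], [0.2, 0.8]])     # hypothetical external "regime" chain
P_aug, R_aug = product_mdp(P, R, Q)
policy, V = policy_iteration(P_aug, R_aug, gamma=0.9)
```

The product construction grows the state space by a factor of $E$. That cost is one reason the paper's tractability conditions matter when the external process is large or not directly observed.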

Markovian · Noise · Optimizer · Mixing time · Extensibility ·

May 25, 2023

First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities

Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, Eric Moulines

From arXiv, 47 pages, 3 algorithms, 2 tables

This paper studies stochastic optimization problems that involve Markovian noise. We present a unified approach to the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities. Our approach covers both non-convex and strongly convex minimization problems. To achieve an optimal (linear) dependence on the mixing time of the underlying noise sequence, we use a randomized batching scheme based on the multilevel Monte Carlo method. Moreover, our technique allows us to eliminate limiting assumptions of previous research on Markovian noise, such as the need for a bounded domain and uniformly bounded stochastic gradients. Our extension to variational inequalities under Markovian noise is new. Additionally, we provide lower bounds that match the oracle complexity of our method for strongly convex optimization problems.
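A minimal sketch of a multilevel Monte Carlo randomized-batch gradient estimator of the kind the abstract refers to. It assumes the common form $g = g^0 + 2^J(g^J - g^{J-1})$ with $J \sim \mathrm{Geom}(1/2)$, falling back to $g^0$ when $2^J > M$. Here $g^j$ averages the first $2^j$ stochastic gradients taken consecutively along a single Markov trajectory. Its expectation over $J$ equals the $M$-sample batch average, while the expected number of samples per step is only about $\log_2 M$. The two-state noise chain, quadratic objective, step size, and function names are hypothetical.

```python
import numpy as np


def markov_noise(rng, n, state, stay=0.9):
    """n consecutive samples from a two-state chain on {-1, +1}; returns samples, last state."""
    out = np.empty(n)
    for k in range(n):
        if rng.random() > stay:          # switch sign with probability 1 - stay
            state = -state
        out[k] = state
    return out, state


def mlmc_combine(grads, J, M):
    """Randomized-batch MLMC estimate from consecutive gradient samples `grads`."""
    g0 = grads[0]
    if 2**J > M:                         # truncation: fall back to a single sample
        return g0
    return g0 + 2**J * (grads[: 2**J].mean() - grads[: 2 ** (J - 1)].mean())


def sgd_markov(steps=2000, M=64, lr=0.01, seed=0):
    """SGD on f(x) = E[0.5 (x - xi)^2], with xi drawn along one Markov trajectory."""
    rng = np.random.default_rng(seed)
    x, state = 1.0, 1.0
    for _ in range(steps):
        J = int(rng.geometric(0.5))      # P(J = j) = 2^-j for j >= 1
        n = 2**J if 2**J <= M else 1
        xi, state = markov_noise(rng, n, state)
        x -= lr * mlmc_combine(x - xi, J, M)   # gradient of 0.5 (x - xi)^2 is x - xi
    return x


x_final = sgd_markov()
```

Because the levels telescope, summing $P(J=j)$ times the estimate over all $j$ recovers the plain average of the first $M$ samples. The paper pairs this kind of estimator with accelerated and variational-inequality methods, which are not shown here.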

CC · Game theory · Lipschitz continuity · Continuity · Lipschitz ·

May 25, 2023

The Computational Complexity of Multi-player Concave Games and Kakutani Fixed Points

Christos H. Papadimitriou, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Manolis Zampetakis

Kakutani's fixed point theorem is a fundamental theorem in topology with numerous applications in game theory and economics. Computational formulations of Kakutani's theorem exist only in special cases and are too restrictive to be useful in reductions. In this paper, we provide a general computational formulation of Kakutani's fixed point theorem and prove that it is PPAD-complete. As an application of our theorem, we characterize the computational complexity of the following fundamental problems: (1) Concave games. Introduced in the celebrated works of Debreu and Rosen in the 1950s and 1960s, concave $n$-person games have found many important applications in economics and game theory. We characterize the computational complexity of finding an equilibrium in such games: we show that a general formulation of this problem belongs to PPAD, and that finding an equilibrium is PPAD-hard even for a rather restricted class of such games, namely those with strongly concave utilities that can be expressed as multivariate polynomials of constant degree with axis-aligned box constraints. (2) Walrasian equilibrium. Using Kakutani's fixed point theorem, as in the work of Arrow and Debreu, we resolve an open problem related to Walras's theorem on the existence of price equilibria in general economies. There are many results on the PPAD-hardness of Walrasian equilibria, but membership in PPAD was previously known only for piecewise-linear utilities. We show that the problem with general convex utilities is in PPAD. Along the way, we provide a Lipschitz-continuous version of Berge's maximum theorem that may be of independent interest.
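To make the object concrete: in a concave game, each player maximizes a payoff that is concave in its own variable over a convex set. Under the extra assumption that the game's pseudo-gradient is strongly monotone (a form of Rosen's diagonal strict concavity), simple projected gradient play converges to the unique equilibrium. The paper's hardness result concerns games in which only each player's own utility is strongly concave, which does not by itself give this joint condition. The sketch below, with hypothetical quadratic payoffs on a box and a hypothetical step size, illustrates only the easy case.

```python
import numpy as np


def projected_gradient_play(grad_fns, x0, lo=0.0, hi=1.0, eta=0.1, iters=2000):
    """Simultaneous projected gradient ascent; player i moves only its own coordinate."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = np.array([grad(x) for grad in grad_fns])  # each player's own-payoff gradient
        x = np.clip(x + eta * g, lo, hi)              # projection onto the box [lo, hi]^n
    return x


# Hypothetical 2-player game on [0, 1]^2:
#   u1(x) = -(x0 - 0.5 x1 - 0.2)^2,   u2(x) = -(x1 - 0.5 x0 - 0.1)^2
grads = [
    lambda x: -2.0 * (x[0] - 0.5 * x[1] - 0.2),   # d u1 / d x0
    lambda x: -2.0 * (x[1] - 0.5 * x[0] - 0.1),   # d u2 / d x1
]
eq = projected_gradient_play(grads, x0=[0.0, 1.0])
```

Solving the two best-response conditions $x_0 = 0.5x_1 + 0.2$ and $x_1 = 0.5x_0 + 0.1$ gives the equilibrium $(1/3, 4/15)$, which lies inside the box.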

Markov · Approximation · Analysis · INTERACT · CASES ·

May 24, 2023

Markov $\alpha$-Potential Games: Equilibrium Approximation and Regret Analysis

Xin Guo, Xinyu Li, Chinmay Maheshwari, Shankar Sastry, Manxi Wu

From arXiv, 26 pages, 3 figures

This paper proposes a new framework for studying multi-agent interaction in Markov games: Markov $\alpha$-potential games. Markov potential games are special cases of Markov $\alpha$-potential games, as are two important and practically significant classes of games: Markov congestion games and perturbed Markov team games. We provide $\alpha$-potential functions for both classes and characterize the gap $\alpha$ in terms of the game parameters. Two algorithms, the projected gradient-ascent algorithm and the sequential maximum improvement smoothed best response dynamics, are introduced for approximating the stationary Nash equilibrium in Markov $\alpha$-potential games. The Nash regret of each algorithm is shown to scale sublinearly in the time horizon. Our analysis and numerical experiments demonstrate that simple algorithms are capable of finding approximate equilibria in Markov $\alpha$-potential games.
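The abstract does not define the gap $\alpha$. The sketch below assumes it bounds how far any agent's change in payoff under a unilateral deviation can differ from the corresponding change in the potential $\Phi$. It computes the smallest such $\alpha$ for a static normal-form game. In the Markov setting the same comparison would be made between value functions under policy deviations; this static analogue, the function `alpha_gap`, and the example payoffs are hypothetical.

```python
import numpy as np


def alpha_gap(utilities, phi):
    """Smallest alpha for which phi is an alpha-potential of a normal-form game.

    utilities[i] and phi are arrays indexed by the joint action (a_1, ..., a_n).
    For every unilateral deviation of player i we need
        |(u_i(a) - u_i(a')) - (phi(a) - phi(a'))| <= alpha,
    i.e. the spread of d_i = u_i - phi along axis i, for every fixed a_{-i}.
    """
    alpha = 0.0
    for i, u in enumerate(utilities):
        d = u - phi
        alpha = max(alpha, float((d.max(axis=i) - d.min(axis=i)).max()))
    return alpha


phi = np.array([[1.0, 0.0], [0.0, 2.0]])
team = [phi.copy(), phi.copy()]                      # exact potential (team) game
perturbed = [phi + np.array([[0.1, 0.0], [0.0, 0.0]]), phi.copy()]
print(alpha_gap(team, phi), alpha_gap(perturbed, phi))
```

A team game, where every player's payoff equals $\Phi$, has gap zero. Perturbing one player's payoff at a single joint action by $0.1$ yields a gap of $0.1$.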

Optimizer · Cocoa · Bandit · TEAM · Constrained optimization ·

May 24, 2023

Concurrent Constrained Optimization of Unknown Rewards for Multi-Robot Task Allocation

Sukriti Singh, Anusha Srikanthan, Vivek Mallampati, Harish Ravichandar

From arXiv, 9 pages, 5 figures, to be published in RSS 2023

Task allocation can enable effective coordination of multi-robot teams to accomplish tasks that are intractable for individual robots. However, existing approaches to task allocation often assume that task requirements or reward functions are known and explicitly specified by the user. In this work, we consider the challenge of forming effective coalitions for a given heterogeneous multi-robot team when task reward functions are unknown. To this end, we first formulate a new class of problems, dubbed COncurrent Constrained Online optimization of Allocation (COCOA). The COCOA problem requires online optimization of coalitions such that the unknown rewards of all tasks are simultaneously maximized using a given multi-robot team with constrained resources. To address the COCOA problem, we introduce an online optimization algorithm named Concurrent Multi-Task Adaptive Bandits (CMTAB), which builds upon continuum-armed bandit algorithms. Experiments involving detailed numerical simulations and a simulated emergency response task show that CMTAB can effectively trade off exploration and exploitation to simultaneously and efficiently optimize the unknown task rewards while respecting the team's resource constraints.
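CMTAB itself is not specified in the abstract. As a hedged illustration of the kind of building block it extends, the sketch below treats a single unknown task reward over a one-dimensional continuous allocation level. It discretizes that level onto a grid and runs the standard UCB1 index policy. The reward function, grid, noise level, and horizon are hypothetical, and the multi-task and resource-constraint parts of COCOA are omitted.

```python
import math

import numpy as np


def ucb1_on_grid(reward_fn, grid, horizon, rng):
    """UCB1 over a fixed discretization of a 1-D continuous action space."""
    K = len(grid)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(horizon):
        if t < K:
            k = t                                        # play each arm once
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * math.log(t) / counts)  # optimism under uncertainty
            k = int(np.argmax(means + bonus))
        r = reward_fn(grid[k], rng)
        counts[k] += 1
        sums[k] += r
    return counts, sums / counts


def noisy_reward(x, rng):
    """Hypothetical unknown reward, peaked at x = 0.75, observed with Gaussian noise."""
    return 1.0 - 4.0 * (x - 0.75) ** 2 + 0.1 * rng.standard_normal()


grid = np.linspace(0.0, 1.0, 5)
counts, means = ucb1_on_grid(noisy_reward, grid, horizon=5000, rng=np.random.default_rng(0))
best = grid[np.argmax(counts)]
```

A fixed grid is the simplest continuum-armed strategy. Its resolution trades approximation error against the number of arms to explore, which is one reason adaptive continuum-armed methods are preferred in practice.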

MoDELS · Noise · Performer · state-of-the-art · Specialization ·

May 24, 2023

Removing Structured Noise with Diffusion Models

Tristan S. W. Stevens, Hans van Gorp, Faik C. Meral, Jun Seob Shin, Jason Yu, Jean-Luc Robert, Ruud J. G. van Sloun

From arXiv, 24 pages, 22 figures, preprint

Solving ill-posed inverse problems requires careful formulation of prior beliefs over the signals of interest and an accurate description of how they manifest in noisy measurements. Handcrafted signal priors based on, e.g., sparsity are increasingly being replaced by data-driven deep generative models, and several groups have recently shown that state-of-the-art score-based diffusion models yield particularly strong performance and flexibility. In this paper, we show that the powerful paradigm of posterior sampling with diffusion models can be extended to include rich, structured noise models. To that end, we propose a joint conditional reverse diffusion process with learned scores for the noise and signal-generating distributions. We demonstrate strong performance gains across various inverse problems with structured noise, outperforming competitive baselines that use normalizing flows and adversarial networks. This opens up new opportunities and practical applications of diffusion modeling for inverse problems with non-Gaussian measurement models.
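As a toy illustration of sampling the signal and the structured noise jointly, the sketch below makes several simplifying assumptions. It uses scalar Gaussian priors for the signal $x$ and the noise $n$, with a nonzero noise mean standing in for "structure". The measurement $y = x + n$ is enforced through a Gaussian likelihood of width `lam`. It then runs unadjusted Langevin dynamics on $(x, n)$ using the analytic scores. In the paper both scores come from learned diffusion models and the sampler is a conditional reverse diffusion process with annealing. Every parameter value here is hypothetical.

```python
import numpy as np


def joint_langevin(y, mu_x=0.0, sx=1.0, mu_n=1.0, sn=0.5, lam=0.1,
                   step=4e-3, n_steps=3000, n_chains=4000, seed=0):
    """Unadjusted Langevin sampling of p(x, n | y) with scalar Gaussian priors.

    log p(x, n | y) = log N(x; mu_x, sx^2) + log N(n; mu_n, sn^2)
                      - (y - x - n)^2 / (2 lam^2) + const
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    n = np.zeros(n_chains)
    for _ in range(n_steps):
        resid = (y - x - n) / lam**2            # gradient of the data-consistency term
        score_x = -(x - mu_x) / sx**2 + resid   # signal prior score + likelihood term
        score_n = -(n - mu_n) / sn**2 + resid   # noise prior score + likelihood term
        x = x + step * score_x + np.sqrt(2 * step) * rng.standard_normal(n_chains)
        n = n + step * score_n + np.sqrt(2 * step) * rng.standard_normal(n_chains)
    return x, n


x_samples, n_samples = joint_langevin(y=2.0)
```

Because both priors are Gaussian here, the posterior mean of $x$ has the closed form $\mu_x + \sigma_x^2 (y - \mu_x - \mu_n)/(\sigma_x^2 + \sigma_n^2 + \lambda^2)$, which gives a direct check on the sampler. Ignoring the noise model's nonzero mean would bias the signal estimate by roughly $\mu_n$ times the same shrinkage factor.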

Better · Conformer · Critic · Statistic · ADE ·

May 24, 2023

Physics Constrained Motion Prediction with Uncertainty Quantification

Renukanandan Tumu, Lars Lindemann, Truong Nghiem, Rahul Mangharam

From arXiv, accepted at IV 2023

Predicting the motion of dynamic agents is a critical task for guaranteeing the safety of autonomous systems. A particular challenge is that motion prediction algorithms should obey dynamics constraints and quantify prediction uncertainty as a measure of confidence. We present a physics-constrained approach to motion prediction that uses a surrogate dynamical model to ensure that predicted trajectories are dynamically feasible. We propose a two-step integration consisting of intent prediction and trajectory prediction subject to dynamics constraints. We also construct prediction regions that quantify uncertainty and are tailored for autonomous driving by using conformal prediction, a popular statistical tool. In experiments on an autonomous racing dataset, Physics Constrained Motion Prediction achieves 41% better average displacement error (ADE), 56% better final displacement error (FDE), and 19% better IoU than a baseline.
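For readers unfamiliar with the metrics and the uncertainty tool mentioned above, the sketch below does two things. It computes ADE and FDE for a single 2-D trajectory, and it computes a split conformal prediction radius from calibration scores. Using displacement error as the nonconformity score, and the function names, are assumptions for illustration. This is not the paper's construction of prediction regions.

```python
import math

import numpy as np


def ade_fde(pred, truth):
    """Average and final displacement error for trajectories of shape (T, 2)."""
    dist = np.linalg.norm(pred - truth, axis=-1)  # Euclidean error at each time step
    return float(dist.mean()), float(dist[-1])


def conformal_radius(cal_scores, alpha):
    """Split conformal quantile: a fresh exchangeable score is <= radius w.p. >= 1 - alpha."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))          # finite-sample corrected rank
    if k > n:
        return math.inf                           # too few calibration points
    return float(np.sort(cal_scores)[k - 1])


truth = np.zeros((4, 2))
pred = truth + np.array([3.0, 4.0])               # constant offset of length 5
print(ade_fde(pred, truth))                                  # (5.0, 5.0)
print(conformal_radius(np.arange(1.0, 10.0), alpha=0.2))     # 8.0
```

The $(n+1)$ correction is what gives the finite-sample coverage guarantee. It is also why very small calibration sets can only return an unbounded region.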