We consider, in discrete time, a general class of sequential stochastic dynamic games with asymmetric information that have the following features. The underlying system has Markovian dynamics controlled by the agents' joint actions. Each agent's instantaneous utility depends on the current system state and the agents' joint actions. At each time instant, each agent makes a private noisy observation of the current system state and of the agents' actions at the previous time instant. In addition, at each time instant all agents have a common noisy observation of the current system state and of their actions at the previous time instant. Each agent's actions are part of his private information. The objective is to determine Bayesian Nash Equilibrium (BNE) strategy profiles that are based on a compressed version of the agents' information and can be sequentially computed; such BNE strategy profiles may not always exist. We present an approach/methodology that achieves this objective, along with an instance of a game in which BNE strategy profiles with these characteristics exist. We show that the methodology also works in the case where the agents have no common observations.
Information design in an incomplete-information game involves a designer whose goal is to influence players' actions through signals generated from a designed probability distribution so that its objective function is optimized. If the players have quadratic payoffs that depend on the players' actions and an unknown payoff-relevant state, and they receive signals about the state that are Gaussian conditional on the state realization, then the information design problem under quadratic design objectives is a semidefinite program (SDP). We consider a setting in which the designer has only partial knowledge of the agents' utilities. We address the uncertainty about players' preferences by formulating a robust information design problem. Specifically, we consider ellipsoid perturbations of the payoff matrices in linear-quadratic-Gaussian (LQG) games and show that this leads to a tractable robust SDP formulation. Using the robust SDP formulation, we obtain analytical conditions for the optimality of no information and of full information disclosure. The robust convex program is also extended to interval and general convex-cone uncertainty sets on the payoff matrices. Numerical studies are carried out to identify the relation between the perturbation levels and the optimal information structures.
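As a rough numerical illustration of how an information design objective with a positive-semidefinite design variable can be posed and robustified, the sketch below sets up a toy problem in cvxpy. The matrices `Sigma` and `V0`, the ellipsoid radius `rho`, and the scenario-sampling robustification are all hypothetical placeholders; the paper's exact tractable robust SDP reformulation is not reproduced here.

```python
# Hedged sketch: a toy "robust" information-design SDP in cvxpy.
# All matrices are placeholders, and robustness is handled by a crude
# scenario approximation (worst case over sampled perturbations of the
# objective matrix), not the paper's exact ellipsoid reformulation.
import numpy as np
import cvxpy as cp

n = 3                                    # dimension of the payoff-relevant state
rng = np.random.default_rng(0)

Sigma = np.eye(n)                        # prior covariance of the state (placeholder)
V0 = rng.standard_normal((n, n))
V0 = (V0 + V0.T) / 2                     # nominal quadratic design objective (placeholder)

# Sampled symmetric perturbations inside a Frobenius-norm ball of radius rho.
rho, n_scen = 0.2, 50
perturbations = []
for _ in range(n_scen):
    D = rng.standard_normal((n, n))
    D = (D + D.T) / 2
    perturbations.append(rho * D / np.linalg.norm(D, "fro"))

X = cp.Variable((n, n), symmetric=True)  # covariance of the induced Gaussian signal
t = cp.Variable()                        # epigraph variable for the worst-case objective

constraints = [X >> 0, Sigma - X >> 0]   # disclose at most the full state information
constraints += [t <= cp.trace((V0 + D) @ X) for D in perturbations]

prob = cp.Problem(cp.Maximize(t), constraints)
prob.solve(solver=cp.SCS)
print("worst-case (sampled) objective:", t.value)
```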
For general-sum, n-player strategic games with transferable utility, the Harsanyi-Shapley (HS) value provides a computable method both to 1) quantify the strategic value of a player and 2) make cooperation rational through side payments. We give a simple formula for computing the HS value in normal-form games. Next, we provide two methods for generalizing the HS value to stochastic (or Markov) games, and show that one of them can be computed using generalized Q-learning algorithms. Finally, an empirical validation is performed on stochastic grid games with three or more players. Source code is provided to compute HS values in both the normal-form and the stochastic-game setting.
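The paper's HS formula is not reproduced here, but for orientation, the cooperative half of a Harsanyi-Shapley-style computation reduces to a standard Shapley value over some coalition value function. A minimal sketch, with a made-up three-player value function `worth`:

```python
# Hedged sketch: textbook Shapley value for a given coalition value function v.
# This only illustrates the cooperative-game half of an HS-style computation;
# the threat-point / side-payment construction of the paper is not reproduced,
# and the value function below is an invented example.
from itertools import permutations

def shapley_values(players, v):
    """Average marginal contribution of each player over all join orders."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: phi[p] / len(orders) for p in players}

# Toy 3-player coalition values (superadditive, purely illustrative).
worth = {frozenset(): 0, frozenset({1}): 1, frozenset({2}): 1, frozenset({3}): 1,
         frozenset({1, 2}): 3, frozenset({1, 3}): 3, frozenset({2, 3}): 2,
         frozenset({1, 2, 3}): 6}
print(shapley_values([1, 2, 3], lambda S: worth[frozenset(S)]))
```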
This paper considers an infinitely repeated three-player Bayesian game with lack of information on two sides, in which an informed player plays two zero-sum games simultaneously at each stage against two uninformed players. This is a generalization of the Aumann et al. [1] two-player zero-sum one-sided incomplete-information model. Under a correlated prior, the informed player faces the problem of how to optimally disclose information to the two uninformed players in order to maximize his long-term average payoffs. Our objective is to understand the adverse effects of "information spillover" from one game to the other on the equilibrium payoff set of the informed player. We provide conditions under which the informed player can fully overcome such adverse effects, and we characterize the equilibrium payoffs. In a second result, we show that the effects of information spillover on the equilibrium payoff set of the informed player can be severe.
We study a system in which two-state Markov sources send status updates to a common receiver over a slotted ALOHA random access channel. We characterize the performance of the system in terms of state estimation entropy (SEE), which measures the uncertainty at the receiver about the sources' state. Two channel access strategies are considered: a reactive policy that depends on the source behavior, and a random one that is independent of it. We prove that the considered policies can be studied using two different hidden Markov models (HMMs) and show through density evolution (DE) analysis that the reactive strategy outperforms the random one in terms of SEE, while the opposite is true for age of information (AoI). Furthermore, we characterize the probability of error in the state estimation at the receiver, considering a maximum a posteriori (MAP) estimator and a low-complexity (decode-and-hold) estimator. Our study provides useful insights into the design trade-offs that emerge when different performance metrics, such as SEE, AoI, or state estimation error probability, are adopted. Moreover, we show that the source statistics significantly impact the system performance.
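As a toy analogue of the setting, the following sketch tracks a single two-state Markov source through a channel that occasionally delivers the current state, and computes the time-averaged belief entropy (a stand-in for SEE) together with the MAP error rate. The transition matrix, delivery probability, and horizon are hypothetical placeholders, not the paper's slotted-ALOHA or density-evolution model.

```python
# Hedged sketch: belief entropy and MAP error for one two-state Markov source
# observed through a channel that either delivers the current state or nothing.
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.95, 0.05],     # P[i, j] = Pr(next state j | current state i)
              [0.10, 0.90]])
p_obs = 0.3                     # probability a slot delivers the source's state
T = 100_000

def entropy(b):
    b = np.clip(b, 1e-12, 1.0)
    return -(b * np.log2(b)).sum()

state, belief = 0, np.array([0.5, 0.5])
see, errors = 0.0, 0
for _ in range(T):
    state = rng.choice(2, p=P[state])          # source evolves
    belief = belief @ P                        # filter: predict
    if rng.random() < p_obs:                   # filter: update on a delivered slot
        belief = np.eye(2)[state]
    see += entropy(belief)
    errors += int(belief.argmax() != state)    # MAP estimate vs. true state

print("SEE  (bits/slot):", see / T)
print("MAP error prob. :", errors / T)
```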
Tasks in which the set of possible actions depends discontinuously on the state pose a significant challenge for current reinforcement learning algorithms. For example, a locked door must first be unlocked, and then the handle turned, before the door can be opened. The sequential nature of these tasks makes obtaining final rewards difficult, and transferring information between task variants using continuous learned values such as weights, rather than discrete symbols, can be inefficient. Our key insight is that agents that act and think symbolically are often more effective in dealing with these tasks. We propose a memory-based learning approach that leverages the symbolic nature of constraints and the temporal ordering of actions in these tasks to quickly acquire and transfer high-level information. We evaluate the performance of memory-based learning on both real and simulated tasks with approximately discontinuous constraints between states and actions, and show that our method learns to solve these tasks an order of magnitude faster than both model-based and model-free deep reinforcement learning methods.
Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measurement or frame rate might be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that allows for instantaneous effects in intervened temporal sequences when intervention targets can be observed, e.g., as actions of an agent. iCITRIS identifies the potentially multidimensional causal variables from temporal observations, while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three datasets of interactive systems, iCITRIS accurately identifies the causal variables and their causal graph.
Interacting particle or agent systems that display a rich variety of swarming behaviours are ubiquitous in science and engineering. A fundamental and challenging goal is to understand the link between individual interaction rules and swarming. In this paper, we study the data-driven discovery of a second-order particle swarming model that describes the evolution of $N$ particles in $\mathbb{R}^d$ under radial interactions. We propose a learning approach that models the latent radial interaction function as a Gaussian process, which simultaneously fulfills two inference goals: nonparametric inference of the interaction function with pointwise uncertainty quantification, and inference of the unknown scalar parameters in the non-collective friction forces of the system. We formulate the learning problem as a statistical inverse problem and provide a detailed analysis of recoverability conditions, establishing that a coercivity condition is sufficient for recoverability. Given data collected from $M$ i.i.d. trajectories with independent Gaussian observational noise, we provide a finite-sample analysis, showing that our posterior mean estimator converges in a reproducing kernel Hilbert space norm at a rate in $M$ that matches the optimal rate of classical one-dimensional kernel ridge regression. As a byproduct, we show that we can obtain a parametric learning rate in $M$ for the posterior marginal variance in the $L^{\infty}$ norm, and that the rate can also involve $N$ and $L$ (the number of observation time instances for each trajectory), depending on the condition number of the inverse problem. Numerical results on systems that exhibit different swarming behaviours demonstrate that our approach learns efficiently from scarce, noisy trajectory data.
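As a simplified stand-in for the first inference goal, the sketch below fits a GP to noisy pointwise evaluations of a radial interaction function and returns a posterior mean with pointwise uncertainty. The ground-truth `phi_true`, the sampling of $r$, and the noise level are invented for illustration; the paper's trajectory-based inverse problem and coercivity analysis are not reproduced.

```python
# Hedged sketch: nonparametric GP inference of a radial interaction function
# phi(r) from noisy pointwise data, with posterior uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
phi_true = lambda r: np.exp(-r) - 0.5 * np.exp(-2.0 * r)   # toy interaction kernel

r_train = rng.uniform(0.1, 4.0, size=(60, 1))
y_train = phi_true(r_train).ravel() + 0.05 * rng.standard_normal(60)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.05**2)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(r_train, y_train)

r_test = np.linspace(0.1, 4.0, 200).reshape(-1, 1)
mean, std = gp.predict(r_test, return_std=True)            # pointwise mean and uncertainty
print("max |posterior mean - true phi|:", np.abs(mean - phi_true(r_test).ravel()).max())
print("max pointwise posterior std    :", std.max())
```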
Representation learning lies at the heart of the empirical success of deep learning in dealing with the curse of dimensionality. However, the power of representation learning has not yet been fully exploited in reinforcement learning (RL), due to i) the trade-off between expressiveness and tractability, and ii) the coupling between exploration and representation learning. In this paper, we first show that, under a certain noise assumption in the stochastic control model, we can obtain the linear spectral feature of the corresponding Markov transition operator in closed form, for free. Based on this observation, we propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise. We provide a rigorous theoretical analysis of SPEDE and demonstrate its superior practical performance over existing state-of-the-art empirical algorithms on several benchmarks.
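One way to see why such a closed-form linear feature can exist (a sketch under a hypothetical isotropic Gaussian-noise assumption, not necessarily SPEDE's exact construction): if the dynamics are
\[
s_{t+1} = f(s_t, a_t) + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2 I),
\]
then the transition density is a Gaussian kernel,
\[
P(s' \mid s, a) = (2\pi\sigma^2)^{-d/2} \exp\!\Big(-\tfrac{\|s' - f(s,a)\|^2}{2\sigma^2}\Big),
\]
and Bochner's theorem (random Fourier features) factorizes the Gaussian kernel as
\[
\exp\!\Big(-\tfrac{\|x - y\|^2}{2\sigma^2}\Big) = \mathbb{E}_{\omega \sim \mathcal{N}(0,\sigma^{-2} I),\, b \sim U[0,2\pi]}\big[\,2\cos(\omega^\top x + b)\cos(\omega^\top y + b)\,\big],
\]
so the transition operator becomes linear in the random features $\cos(\omega^\top f(s,a) + b)$ of the state-action pair.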
Future NASA lander missions to icy moons will require completely automated, accurate, and data-efficient calibration methods for the robot manipulator arms that sample icy terrains in the lander's vicinity. To support this need, this paper presents a Gaussian Process (GP) approach to the classical manipulator kinematic calibration process. Instead of identifying a corrected set of Denavit-Hartenberg kinematic parameters, a set of GPs models the residual kinematic error of the arm over the workspace. More importantly, this modeling framework allows a Gaussian Process Upper Confidence Bound (GP-UCB) algorithm to efficiently and adaptively select the calibration's measurement points so as to minimize the number of experiments, and therefore the time needed for recalibration. The method is demonstrated in simulation on a simple 2-DOF arm, a 6-DOF arm whose geometry is a candidate for a future NASA mission, and a 7-DOF Barrett WAM arm.
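A minimal sketch of GP-UCB point selection in one dimension, as a stand-in for choosing where to measure next over the workspace: the residual-error function, candidate grid, and beta schedule below are all placeholders, not the paper's arm models.

```python
# Hedged sketch: GP-UCB selection of the next calibration measurement point.
# A 1-D toy stand-in for "residual kinematic error over the workspace".
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)
error = lambda q: 0.3 * np.sin(3.0 * q) + 0.1 * q          # unknown residual error (toy)

candidates = np.linspace(0.0, 2.0, 200).reshape(-1, 1)     # candidate measurement poses
X = [np.array([0.5])]                                      # one initial measurement
y = [error(0.5) + 0.01 * rng.standard_normal()]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-4)
for t in range(1, 16):
    gp.fit(np.vstack(X), np.array(y))
    mean, std = gp.predict(candidates, return_std=True)
    beta = 2.0 * np.log(len(candidates) * t ** 2)           # placeholder beta schedule
    x_next = candidates[np.argmax(mean + np.sqrt(beta) * std)]  # UCB acquisition
    X.append(x_next)
    y.append(error(x_next[0]) + 0.01 * rng.standard_normal())

gp.fit(np.vstack(X), np.array(y))
_, std = gp.predict(candidates, return_std=True)
print("max remaining posterior std over candidates:", std.max())
```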
This paper proposes new, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games. Different from prior efforts on training agents to beat a fixed set of opponents, our objective is to find Nash equilibrium policies that are free from exploitation even by adversarial opponents. We propose (a) the Nash-DQN algorithm, which integrates the deep learning techniques of single-agent DQN into the classic Nash Q-learning algorithm for solving tabular Markov games; and (b) the Nash-DQN-Exploiter algorithm, which additionally adopts an exploiter to guide the exploration of the main agent. We conduct experimental evaluations on tabular examples as well as various two-player Atari games. Our empirical results demonstrate that (i) the policies found by many existing methods, including Neural Fictitious Self-Play and Policy Space Response Oracles, can be prone to exploitation by adversarial opponents; and (ii) the output policies of our algorithms are robust to exploitation and thus outperform existing methods.
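For orientation, the stage-game solve that a Nash-Q / Nash-DQN style update needs at each state reduces, in the two-player zero-sum case, to solving a matrix game, which can be done with a small linear program. A hedged sketch (the Q-matrix is a made-up example and the deep-network components are omitted):

```python
# Hedged sketch: the zero-sum stage-game solve behind a Nash-Q style update,
# done with a linear program over a Q-matrix of stage-game values.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(Q):
    """Maximin mixed strategy and value for the row (maximizing) player of Q."""
    Q = np.asarray(Q, dtype=float)
    m, n = Q.shape
    # Variables: [p_1, ..., p_m, v]; maximize v  <=>  minimize -v.
    c = np.zeros(m + 1); c[-1] = -1.0
    # For every column j:  v - sum_i p_i Q[i, j] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # probabilities sum to 1
    b_eq = np.ones(1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Toy stage-game Q-values (rock-paper-scissors-like); the value should be ~0.
policy, value = solve_matrix_game([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print("row player's Nash policy:", np.round(policy, 3), " game value:", round(value, 3))
```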