
Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability. Seeking a global optimum is difficult (NEXP-complete), but seeking a Nash equilibrium, in which each agent's policy is a best response to the other agents' policies, is more accessible and has made it possible to address infinite-horizon problems with solutions in the form of finite-state controllers (FSCs). In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available. This requires relying on a simulation-based POMDP solver to construct an agent's FSC node by node. A related process is used to heuristically derive initial FSCs. Experiments on benchmarks show that MC-JESP is competitive with existing Dec-POMDP solvers and even outperforms many offline methods that use explicit models.

Related content

Optimizer · Agent · Learning · Exchangeable · INFORMS

July 6, 2023

Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance

Yuchen Fang,Zhenggang Tang,Kan Ren,Weiqing Liu,Li Zhao,Jiang Bian,Dongsheng Li,Weinan Zhang,Yong Yu,Tie-Yan Liu

from arxiv, Accepted in KDD 2023; The website is at //seqml.github.io/marl4fin

Order execution is a fundamental task in quantitative finance, aiming to complete the acquisition or liquidation of a number of trading orders for specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works always optimize execution for an individual order, overlooking the fact that in practice multiple orders are specified to execute simultaneously, which results in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution that considers practical constraints. Specifically, we treat every agent as an individual operator trading one specific order, while the agents keep communicating with each other and collaborate to maximize the overall profit. Nevertheless, existing MARL algorithms often incorporate communication among agents by exchanging only the information in their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol in which the agents communicate their intended actions to each other and refine them accordingly. It is optimized through a novel action value attribution method that is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets show superior performance, with significantly better collaboration effectiveness achieved by our method.

Nash equilibrium · Approximation · Robustness · Total return · Game theory

July 6, 2023

A Robust Characterization of Nash Equilibrium

Florian Brandl,Felix Brandt

We give a robust characterization of Nash equilibrium by postulating coherent behavior across varying games: Nash equilibrium is the only solution concept that satisfies consequentialism, consistency, and rationality. As a consequence, every equilibrium refinement violates at least one of these properties. We moreover show that every solution concept that approximately satisfies consequentialism, consistency, and rationality returns approximate Nash equilibria. The latter approximation can be made arbitrarily good by increasing the approximation of the axioms. This result extends to various natural subclasses of games such as two-player zero-sum games, potential games, and graphical games.
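
As a reminder of the solution concept being characterized, the following is a minimal sketch of a Nash equilibrium check for a two-player game in strategic form. It is not taken from the paper: the function names `expected_payoff` and `is_nash`, the tolerance, and the matching-pennies example are assumptions chosen for illustration. The check relies on the standard fact that a mixed strategy is a best response exactly when no pure strategy does strictly better against the opponent's strategy.

```python
def expected_payoff(M, x, y):
    """Expected payoff x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def is_nash(A, B, x, y, tol=1e-9):
    """Return True if (x, y) is a Nash equilibrium of the bimatrix game (A, B).

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff.
    It suffices to compare against pure-strategy deviations.
    """
    u_row = expected_payoff(A, x, y)
    u_col = expected_payoff(B, x, y)
    # Best payoff the row player can reach with a pure strategy against y.
    best_row = max(sum(A[i][j] * y[j] for j in range(len(y)))
                   for i in range(len(x)))
    # Best payoff the column player can reach with a pure strategy against x.
    best_col = max(sum(x[i] * B[i][j] for i in range(len(x)))
                   for j in range(len(y)))
    return best_row <= u_row + tol and best_col <= u_col + tol


# Hypothetical example: matching pennies (a two-player zero-sum game).
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
print(is_nash(A, B, [0.5, 0.5], [0.5, 0.5]))  # uniform mixing by both players
print(is_nash(A, B, [1.0, 0.0], [1.0, 0.0]))  # a pure profile
```

In matching pennies, uniform mixing passes the check, while the pure profile fails because the column player can profitably deviate.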

Episode · Robot · Agent · Signed distance · Wireless Networks

July 5, 2023

Decentralized Planning for Car-Like Robotic Swarm in Cluttered Environments

Changjia Ma,Zhichao Han,Tingrui Zhang,Jingping Wang,Long Xu,Chengyang Li,Chao Xu,Fei Gao

from arxiv, Accepted by IROS 2023

Robot swarms are an active topic in the robotics research community. In this paper, we propose a decentralized framework for car-like robotic swarms that is capable of real-time planning in cluttered environments. In this system, path finding is guided by environmental topology information to avoid frequent topological changes, and search-based speed planning is leveraged to escape the local minima caused by infeasible initial values. Spatial-temporal optimization is then employed to generate a safe, smooth and dynamically feasible trajectory. During optimization, the trajectory is discretized with fixed time steps. A penalty is imposed on the signed distance between agents to achieve collision avoidance, and differential flatness combined with a limit on the front steering angle satisfies the non-holonomic constraints. With trajectories broadcast over the wireless network, agents are able to check for and prevent potential collisions. We validate the robustness of our system in simulation and real-world experiments. Code will be released as open-source packages.

dynamic programming · Controller · Optimizer · Maximum · Value function

July 4, 2023

Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underactuated Systems Navigating on Uncertain Ocean Currents

Matthias Killer,Marius Wiggert,Hanna Krasowski,Manan Doshi,Pierre F. J. Lermusiaux,Claire J. Tomlin

from arxiv, 8 pages, submitted to 2023 IEEE 62nd Annual Conference on Decision and Control (CDC). Matthias Killer and Marius Wiggert contributed equally to this work.

Seaweed biomass offers significant potential for climate mitigation, but large-scale, autonomous open-ocean farms are required to fully exploit it. Such farms typically have low propulsion and are heavily influenced by ocean currents. We want to design a controller that maximizes seaweed growth over months by taking advantage of the non-linear time-varying ocean currents to reach high-growth regions. The complex dynamics and underactuation make this challenging even when the currents are known. It is even harder when only short-term imperfect forecasts with increasing uncertainty are available. We propose a dynamic programming-based method to efficiently solve for the optimal growth value function when the true currents are known. We additionally present three extensions for the realistic case where only forecasts are known: (1) our method's resulting value function can be used as a feedback policy to obtain the growth-optimal control for all states and times, allowing closed-loop control equivalent to re-planning at every time step and hence mitigating forecast errors, (2) a feedback policy for long-term optimal growth beyond forecast horizons using seasonal average current data as terminal reward, and (3) a discounted finite-time Dynamic Programming (DP) formulation to account for increasing uncertainty in ocean current estimates. We evaluate our approach through 30-day simulations of floating seaweed farms in realistic Pacific Ocean current scenarios. Our method achieves 95.8% of the best possible growth using only 5-day forecasts. This confirms the feasibility of using low-power propulsion and optimal control for enhanced seaweed growth on floating farms under real-world conditions.

Forward · Learning · Robot · MoDELS · Reinforcement learning

July 4, 2023

Reinforcement Learning for Robot Navigation with Adaptive Forward Simulation Time (AFST) in a Semi-Markov Model

Yu'an Chen,Ruosong Ye,Ziyang Tao,Hongjian Liu,Guangda Chen,Jie Peng,Jun Ma,Yu Zhang,Jianmin Ji,Yanyong Zhang

Deep reinforcement learning (DRL) algorithms have proven effective in robot navigation, especially in unknown environments, by directly mapping perception inputs into robot control commands. However, most existing methods ignore the local minimum problem in navigation and thereby cannot handle complex unknown environments. In this paper, we propose the first DRL-based navigation method modeled by a semi-Markov decision process (SMDP) with continuous action space, named Adaptive Forward Simulation Time (AFST), to overcome this problem. Specifically, we reduce the dimensions of the action space and improve the distributed proximal policy optimization (DPPO) algorithm for the specified SMDP problem by modifying its GAE to better estimate the policy gradient in SMDPs. Experiments in various unknown environments demonstrate the effectiveness of AFST.

Linear · RE · Basis · Linear combination · Lecture notes

July 3, 2023

Linear multistep methods with repeated global Richardson

Imre Fekete,Lajos Lóczi

In this work, we further investigate the application of the well-known Richardson extrapolation (RE) technique to accelerate the convergence of sequences resulting from linear multistep methods (LMMs) for numerically solving initial-value problems of systems of ordinary differential equations. By extending the ideas of our previous paper, we now utilize some advanced versions of RE in the form of repeated RE (RRE). Assume that the underlying LMM -- the base method -- has order $p$ and RE is applied $l$ times. Then we prove that the accelerated sequence has convergence order $p+l$. The version we present here is global RE (GRE, also known as passive RE), since the terms of the linear combinations are calculated independently. Thus, the resulting higher-order LMM-RGRE methods can be implemented in a parallel fashion and existing LMM codes can directly be used without any modification. We also investigate how the linear stability properties of the base method (e.g. $A$- or $A(\alpha)$-stability) are preserved by the LMM-RGRE methods.
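
The idea of repeated global Richardson extrapolation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the base method is explicit Euler (the simplest LMM, order $p=1$), the grids are obtained by repeated step halving, and the extrapolation weights $2^{p+j}$ presume that the global error expands in consecutive integer powers of the step size. The names `euler` and `repeated_global_richardson` are hypothetical.

```python
import math


def euler(f, t0, y0, t_end, n):
    """Explicit Euler with n equal steps; returns the approximation at t_end."""
    h = (t_end - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y)
        t = t + h
    return y


def repeated_global_richardson(f, t0, y0, t_end, n, p=1, l=2):
    """Combine l+1 independent base runs (n, 2n, 4n, ... steps) at t_end.

    Each base run is computed on its own grid without feedback from the
    others (the "global" or passive variant), so the runs could be executed
    in parallel and the base solver is used unmodified.
    """
    row = [euler(f, t0, y0, t_end, n * 2 ** k) for k in range(l + 1)]
    for j in range(l):
        factor = 2 ** (p + j)  # cancels the h^(p+j) term of the error
        row = [(factor * row[k + 1] - row[k]) / (factor - 1)
               for k in range(len(row) - 1)]
    return row[0]


# Hypothetical test problem: y' = -y, y(0) = 1, exact solution exp(-t).
f = lambda t, y: -y
exact = math.exp(-1.0)
print(abs(euler(f, 0.0, 1.0, 1.0, 20) - exact))
print(abs(repeated_global_richardson(f, 0.0, 1.0, 1.0, 20, p=1, l=2) - exact))
```

On this test problem the extrapolated value is several orders of magnitude more accurate than the base run on the coarsest grid, which is consistent with the order increase from $p$ to $p+l$ stated in the abstract.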

Robustness · Extensibility · domain shift · Performer · HTTPS

July 3, 2023

Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration

Kemal Oksuz,Tom Joy,Puneet K. Dokania

from arxiv, CVPR 2023

The current approach for testing the robustness of object detectors suffers from serious deficiencies such as improper methods of performing out-of-distribution detection and using calibration metrics which do not consider both localisation and classification quality. In this work, we address these issues, and introduce the Self-Aware Object Detection (SAOD) task, a unified testing framework which respects and adheres to the challenges that object detectors face in safety-critical environments such as autonomous driving. Specifically, the SAOD task requires an object detector to: be robust to domain shift; obtain reliable uncertainty estimates for the entire scene; and provide calibrated confidence scores for the detections. We extensively use our framework, which introduces novel metrics and large scale test datasets, to test numerous object detectors in two different use-cases, allowing us to highlight critical insights into their robustness performance. Finally, we introduce a simple baseline for the SAOD task, enabling researchers to benchmark future proposed methods and move towards robust object detectors which are fit for purpose. Code is available at //github.com/fiveai/saod

binary · Optimizer · Monte Carlo · Functional · MCMC

July 3, 2023

Monte Carlo Policy Gradient Method for Binary Optimization

Cheng Chen,Ruitao Chen,Tianyou Li,Ruichen Ao,Zaiwen Wen

Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model to sample the binary solution according to a parameterized policy distribution. Specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distribution of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly, in a manner similar to reinforcement learning. For coherent exploration in discrete spaces, parallel Markov Chain Monte Carlo (MCMC) methods are employed to sample from the policy distribution with diversity and to approximate the gradient efficiently. We further develop a filter scheme that replaces the original objective function with one obtained through a local search technique, in order to broaden the horizon of the function landscape. Convergence of the policy gradient method to stationary points in expectation is established based on a concentration inequality for MCMC. Numerical results show that this framework is very promising for providing near-optimal solutions to a range of binary optimization problems.
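
To make the policy-gradient view of binary optimization concrete, here is a minimal sketch on a tiny MaxCut instance. It is a simplified stand-in, not the method of the paper: it uses a plain score-function (REINFORCE) estimator with a mean baseline, it samples directly from an independent Bernoulli policy instead of using parallel MCMC, and it omits the Gibbs/KL objective and the filter scheme. The function names, hyperparameters, and the 4-cycle graph are assumptions chosen for illustration.

```python
import math
import random


def cut_value(edges, x):
    """Number of edges whose endpoints receive different labels."""
    return sum(1 for u, v in edges if x[u] != x[v])


def policy_gradient_maxcut(edges, n, iters=200, batch=32, lr=0.5, seed=0):
    """Maximize the cut with REINFORCE over an independent Bernoulli policy."""
    rng = random.Random(seed)
    theta = [0.0] * n  # one logit per vertex; sigmoid(theta_i) = P(x_i = 1)
    best_x, best_val = None, -1
    for _ in range(iters):
        probs = [1.0 / (1.0 + math.exp(-t)) for t in theta]
        samples = [[1 if rng.random() < p else 0 for p in probs]
                   for _ in range(batch)]
        values = [cut_value(edges, x) for x in samples]
        baseline = sum(values) / batch  # variance-reducing baseline
        grad = [0.0] * n
        for x, val in zip(samples, values):
            if val > best_val:
                best_x, best_val = x, val
            for i in range(n):
                # d/d theta_i of log Bernoulli(x_i; sigmoid(theta_i)) is x_i - p_i
                grad[i] += (val - baseline) * (x[i] - probs[i]) / batch
        theta = [t + lr * g for t, g in zip(theta, grad)]  # gradient ascent
    return best_x, best_val


# Hypothetical instance: a cycle on 4 vertices, whose maximum cut is 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
best_x, best_val = policy_gradient_maxcut(edges, n=4)
print(best_x, best_val)
```

The function returns the best sampled assignment rather than the mode of the final policy, since a factorized policy can settle on a suboptimal assignment; this is one motivation for the more careful sampling described in the abstract.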

Agent · Score · Optimizer · binary · motivation

June 30, 2023

U-Calibration: Forecasting for an Unknown Agent

Robert Kleinberg,Renato Paes Leme,Jon Schneider,Yifeng Teng

from arxiv, Accepted for presentation at the Conference on Learning Theory (COLT) 2023

We consider the problem of evaluating forecasts of binary events whose predictions are consumed by rational agents who take an action in response to a prediction, but whose utility is unknown to the forecaster. We show that optimizing forecasts for a single scoring rule (e.g., the Brier score) cannot guarantee low regret for all possible agents. In contrast, forecasts that are well-calibrated guarantee that all agents incur sublinear regret. However, calibration is not a necessary criterion here (it is possible for miscalibrated forecasts to provide good regret guarantees for all possible agents), and calibrated forecasting procedures have provably worse convergence rates than forecasting procedures targeting a single scoring rule. Motivated by this, we present a new metric for evaluating forecasts that we call U-calibration, equal to the maximal regret of the sequence of forecasts when evaluated under any bounded scoring rule. We show that sublinear U-calibration error is a necessary and sufficient condition for all agents to achieve sublinear regret guarantees. We additionally demonstrate how to compute the U-calibration error efficiently and provide an online algorithm that achieves $O(\sqrt{T})$ U-calibration error (on par with optimal rates for optimizing for a single scoring rule, and bypassing lower bounds for the traditionally calibrated learning procedures). Finally, we discuss generalizations to the multiclass prediction setting.
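
For orientation, the two classical quantities that the abstract contrasts with U-calibration can be computed as in the sketch below. This is not the U-calibration error itself, whose computation is described in the paper. The sketch shows the Brier score (a single scoring rule) and the standard L1 calibration error, which groups rounds by forecast value and sums the absolute accumulated gaps between forecasts and outcomes. The function names and the toy sequences are assumptions.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecasts in [0, 1] and binary outcomes."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


def l1_calibration_error(forecasts, outcomes):
    """Sum over distinct forecast values p of |sum of (p - y) on rounds with forecast p|."""
    gaps = defaultdict(float)
    for p, y in zip(forecasts, outcomes):
        gaps[p] += p - y
    return sum(abs(g) for g in gaps.values())


# Hypothetical sequences: the event occurs on half of the rounds.
outcomes = [1, 0, 1, 0]
calibrated = [0.5, 0.5, 0.5, 0.5]      # matches the empirical frequency
overconfident = [0.9, 0.9, 0.9, 0.9]   # systematically too high
print(brier_score(calibrated, outcomes), l1_calibration_error(calibrated, outcomes))
print(brier_score(overconfident, outcomes), l1_calibration_error(overconfident, outcomes))
```

The constant forecast 0.5 has zero calibration error on this sequence, while the constant forecast 0.9 accumulates a gap of 0.4 per round.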

Learned · Reinforcement learning · Episode · INTERACT · Next

March 10, 2020

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Sanmit Narvekar,Bei Peng,Matteo Leonetti,Jivko Sinapov,Matthew E. Taylor,Peter Stone

Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.