Evolutionary game theory provides a mathematical foundation for cross-disciplinary fertilization, especially for integrating ideas from artificial intelligence and game theory. Such integration offers a transparent and rigorous approach to complex decision-making problems in a variety of important contexts, ranging from evolutionary computation to machine behavior. Despite the astronomically large space of individual behavioral strategies in iterated Prisoner's Dilemma (IPD) games, the so-called Zero-Determinant (ZD) strategies are a set of rather simple memory-one strategies that can nevertheless unilaterally set a linear payoff relationship between themselves and their opponent. Although knowledge of ZD strategies gives players an upper hand in IPD games, we find and characterize unbending strategies that can force ZD players to be fair in their own interest. Moreover, our analysis reveals the ubiquity of unbending properties in common IPD strategies, which were previously overlooked. In this work, we demonstrate the important steering role of unbending strategies in fostering fairness and cooperation in pairwise interactions. Our results will help bring a new perspective by combining game theory and multi-agent learning systems to optimize winning strategies that are robust to noise, errors, and deception in non-zero-sum games.
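The linear payoff relationship mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the conventional IPD payoffs (T, R, P, S) = (5, 3, 1, 0) and an extortionate ZD strategy with extortion factor `chi = 3` and scale `phi = 1/26`, none of which are stated in the abstract. The function names are hypothetical. It covers only the ZD player's side; the unbending strategies studied in the paper are not modelled here.

```python
import numpy as np

# Assumed payoffs (conventional IPD values, not taken from the abstract).
R, S, T, P = 3.0, 0.0, 5.0, 1.0


def extortionate_zd(chi, phi):
    """Memory-one strategy (p_CC, p_CD, p_DC, p_DD) for player X that
    enforces s_X - P = chi * (s_Y - P) against any memory-one opponent."""
    return np.array([
        1.0 - phi * (chi - 1.0) * (R - P),
        1.0 - phi * ((P - S) + chi * (T - P)),
        phi * ((T - P) + chi * (P - S)),
        0.0,
    ])


def long_run_payoffs(p, q):
    """Stationary payoffs (s_X, s_Y) when memory-one strategies p and q meet.

    States are ordered CC, CD, DC, DD with X's move written first.
    Assumes the induced Markov chain has a unique stationary distribution.
    """
    # Re-index Y's strategy to X's state order (Y sees CD where X sees DC).
    q_x = np.array([q[0], q[2], q[1], q[3]])
    # Each row holds the transition probabilities out of one state.
    M = np.column_stack([
        p * q_x, p * (1 - q_x), (1 - p) * q_x, (1 - p) * (1 - q_x),
    ])
    # Solve v M = v with sum(v) = 1 by replacing one redundant equation.
    A = M.T - np.eye(4)
    A[-1, :] = 1.0
    v = np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))
    s_x = v @ np.array([R, S, T, P])
    s_y = v @ np.array([R, T, S, P])
    return s_x, s_y


if __name__ == "__main__":
    chi = 3.0
    p = extortionate_zd(chi, phi=1.0 / 26.0)
    rng = np.random.default_rng(0)
    for _ in range(3):
        q = rng.uniform(0.05, 0.95, size=4)  # arbitrary stochastic opponent
        s_x, s_y = long_run_payoffs(p, q)
        print(f"s_X - P = {s_x - P:.6f}, chi * (s_Y - P) = {chi * (s_y - P):.6f}")
```

Whatever opponent `q` is drawn, the two printed quantities should agree up to rounding, which is the sense in which the ZD player sets the relationship unilaterally.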

Related content

Adaptive learning

Adaptive learning, also known as adaptive teaching, is an educational method that uses computer algorithms to orchestrate interaction with learners and to deliver customized learning resources and activities that address each learner's unique needs. In professional learning settings, individuals can "test out" of some training to ensure that the instructional content stays current. Based on students' learning needs, as indicated by their answers to questions and by the tasks and experiences they complete, the computer generates educational material adapted to them. The technology draws on a range of research fields and their offshoots, including computer science, artificial intelligence, psychometrics, education, psychology, and brain science.

INTERACT · Scenario · Learning · Online · Episode

July 20, 2023

Leveraging Offline Data in Online Reinforcement Learning

Andrew Wagenmaker, Aldo Pacchiano

Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an $\epsilon$-optimal policy. In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data. Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy? In this work, we consider this setting, which we call the FineTuneRL setting, for MDPs with linear structure. We characterize the number of online samples necessary in this setting given access to some offline dataset, and develop an algorithm, FTPedel, which is provably optimal, up to $H$ factors. We show through an explicit example that combining offline data with online interactions can lead to a provable improvement over either purely offline or purely online RL. Finally, our results illustrate the distinction between verifiable learning, the typical setting considered in online RL, and unverifiable learning, the setting often considered in offline RL, and show that there is a formal separation between these regimes.

Optimizer · Controller · Projection · Learning · Episode

July 20, 2023

Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning

Can Jiang, Xin Li, Jia-Rui Lin, Ming Liu, Zhiliang Ma

Due to the complexity and dynamics of construction work, resource, and cash flows, poor management of them usually leads to time and cost overruns, bankruptcy, and even project failure. Existing approaches in construction fail to achieve optimal control of resource flow in a dynamic environment with uncertainty. Therefore, this paper introduces a model and method to adaptively control the resource flows in order to optimize the work and cash flows of construction projects. First, a mathematical model based on a partially observable Markov decision process is established to formulate the complex interactions of construction work, resource, and cash flows as well as the uncertainty and variability of diverse influence factors. Meanwhile, to efficiently find the optimal solutions, a deep reinforcement learning (DRL) based method is introduced to realize the continuous adaptive optimal control of labor and material flows, thereby optimizing the work and cash flows. To assist the training process of DRL, a simulator based on discrete event simulation is also developed to mimic the dynamic features and external environments of a project. Experiments in simulated scenarios illustrate that our method outperforms the vanilla empirical method and a genetic algorithm, that it adapts well to diverse projects and external environments, and that a hybrid agent combining DRL with the empirical method leads to the best result. This paper contributes to adaptive control and optimization of coupled work, resource, and cash flows, and may serve as a stepping stone for adopting DRL technology in construction project management.

Learning · Perceptron · MoDELS · State space · Discretization

July 19, 2023

The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe

from arxiv, 10 pages, 7 figures, Preprint

Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional model of RL that can capture a variety of learning protocols, and derive its typical dynamics as a set of closed-form ordinary differential equations (ODEs). We derive optimal schedules for the learning rates and task difficulty - analogous to annealing schemes and curricula during training in RL - and show that the model exhibits rich behaviour, including delayed learning under sparse rewards; a variety of learning regimes depending on reward baselines; and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game "Bossfight" and Arcade Learning Environment game "Pong" also show such a speed-accuracy trade-off in practice. Together, these results take a step towards closing the gap between theory and practice in high-dimensional RL.

Generalization theory · Optimizer · Nonconvex · Analysis · Empirical risk

July 18, 2023

Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems

Yunwen Lei

from arxiv, To appear in COLT 2023

Stochastic optimization has found wide application in minimizing objective functions in machine learning, which has motivated many theoretical studies that seek to understand its practical success. Most existing studies focus on the convergence of optimization errors, while the generalization analysis of stochastic optimization lags far behind. This is especially the case for the nonconvex and nonsmooth problems often encountered in practice. In this paper, we initiate a systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems. We introduce novel algorithmic stability measures and establish their quantitative connection to the gap between population gradients and empirical gradients, which is then further extended to study the gap between the Moreau envelope of the empirical risk and that of the population risk. To our knowledge, these quantitative connections between stability and generalization in terms of either gradients or Moreau envelopes have not been studied in the literature. We introduce a class of sampling-determined algorithms, for which we develop bounds for three stability measures. Finally, we apply these discussions to derive error bounds for stochastic gradient descent and its adaptive variant, where we show how to achieve an implicit regularization by tuning the step sizes and the number of iterations.
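For readers unfamiliar with the term, the Moreau envelope of a function $F$ with parameter $\lambda > 0$ is conventionally defined as follows. The symbols $F$, $\lambda$, $w$, and $v$ are notation assumed here for illustration; the paper's own parametrization may differ.

$$
F_{\lambda}(w) = \min_{v} \left\{ F(v) + \frac{1}{2\lambda} \lVert v - w \rVert_2^2 \right\}
$$

For weakly convex $F$ and sufficiently small $\lambda$, the envelope is differentiable, and a small $\lVert \nabla F_{\lambda}(w) \rVert_2$ indicates that $w$ lies close to a nearly stationary point of $F$. This is why the envelope is a common stationarity measure when $F$ itself is nonsmooth.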

contrastive · Performer · Class · Image segmentation · Loss

July 17, 2023

ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast

Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, Jasjeet S. Sekhon, James S. Duncan

from arxiv, Accepted by International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2023)

Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes (i.e., boundary regions or rare objects). Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping models with unsupervised contrastive criteria. However, it remains unclear how well these methods perform on the labeled portion of the data, where the class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature $\tau$ in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic $\tau$ via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on the ACDC and LA benchmarks and show that it achieves state-of-the-art performance across two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.
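To make the idea of a dynamic temperature concrete, here is a minimal sketch of one possible cosine schedule. The abstract does not give the formula, so the oscillation between `tau_min` and `tau_max`, the `period` parameter, and the default values are all assumptions made for illustration, and the function name is hypothetical.

```python
import math


def cosine_temperature(step, period, tau_min=0.1, tau_max=1.0):
    """Temperature that oscillates between tau_max and tau_min.

    Assumed form: starts at tau_max, reaches tau_min after half a period,
    and returns to tau_max after a full period.
    """
    phase = 2.0 * math.pi * step / period
    return tau_min + 0.5 * (tau_max - tau_min) * (1.0 + math.cos(phase))


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(step, round(cosine_temperature(step, period=100), 4))
```

In a training loop the returned value would replace the constant $\tau$ in the contrastive loss at each step.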

Constraint · Optimization · Robot · Extensibility · Constrained optimization

July 17, 2023

Social Robot Navigation through Constrained Optimization: a Comparative Study of Uncertainty-based Objectives and Constraints

Timur Akhtyamov, Aleksandr Kashirin, Aleksey Postnikov, Gonzalo Ferrer

This work studies how uncertainty estimates from human motion prediction can be embedded into constrained optimization techniques, such as Model Predictive Control (MPC), for social robot navigation. We propose several cost objectives and constraint functions that are derived from the uncertainty in predicted pedestrian positions, are related to the probability of collision, and can be applied to MPC. All variants are compared in challenging scenes with multiple agents. The main question this paper tries to answer is: what are the most important uncertainty-based criteria for social MPC? To that end, we evaluate the proposed approaches with several social navigation metrics in an extensive set of scenarios of different complexity in reproducible synthetic environments. The main outcome of our study is a foundation for a practical guide on when and how to use uncertainty-aware approaches for social robot navigation and on which criteria are most effective.

AIM · INFORMS · Imperfect information · Controller · AI

October 21, 2021

On games and simulators as a platform for development of artificial intelligence for command and control

Vinicius G. Goecks, Nicholas Waytowich, Derrik E. Asher, Song Jun Park, Mark Mittrick, John Richardson, Manuel Vindiola, Anne Logie, Mark Dennison, Theron Trout, Priya Narayanan, Alexander Kott

from arxiv, Preprint submitted to the Journal of Defense Modeling and Simulation (JDMS) for peer review

Games and simulators can be a valuable platform to execute complex multi-agent, multiplayer, imperfect information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting development of algorithms with complex benchmarks and the capability to rapidly iterate over new ideas. The success of artificial intelligence algorithms in real-time strategy games such as StarCraft II has also attracted the attention of the military research community, which aims to explore similar techniques in military counterpart scenarios. Aiming to bridge games and military applications, this work discusses past and current efforts on how games and simulators, together with artificial intelligence algorithms, have been adapted to simulate certain aspects of military missions and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities in human interfaces with gaming platforms and their military parallels.

GPU · Weight · Learned · Transfer learning · Performer

July 20, 2021

Adaptive Transfer Learning on Graph Neural Networks

Xueting Han, Zhenhuan Huang, Bang An, Jing Bai

Graph neural networks (GNNs) are widely used to learn powerful representations of graph-structured data. Recent work demonstrates that transferring knowledge from self-supervised tasks to downstream tasks can further improve graph representations. However, there is an inherent gap between self-supervised tasks and downstream tasks in terms of optimization objective and training data. Conventional pre-training methods may not be effective enough for knowledge transfer since they do not make any adaptation for downstream tasks. To solve such problems, we propose a new transfer learning paradigm on GNNs that effectively leverages self-supervised tasks as auxiliary tasks to help the target task. Our methods adaptively select and combine different auxiliary tasks with the target task in the fine-tuning stage. We design an adaptive auxiliary loss weighting model that learns the weights of auxiliary tasks by quantifying the consistency between auxiliary tasks and the target task. In addition, we learn the weighting model through meta-learning. Our methods can be applied to various transfer learning approaches and perform well not only in multi-task learning but also in pre-training and fine-tuning. Comprehensive experiments on multiple downstream tasks demonstrate that the proposed methods can effectively combine auxiliary tasks with the target task and significantly improve performance compared to state-of-the-art methods.

contrastive · Graph · Contrastive learning · Performer · Learned

February 26, 2021

Graph Contrastive Learning with Adaptive Augmentation

Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, Liang Wang

from arxiv, Accepted to WWW 2021, authors' version. 12 pages, 3 figures, 5 tables. arXiv admin note: substantial text overlap with arXiv:2006.04131

Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and then maximize the agreement of representations in the two views. Despite the rapid development of graph CL methods, the design of graph augmentation schemes -- a crucial component in CL -- remains rarely explored. We argue that data augmentation schemes should preserve the intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to force the model to recognize underlying semantic information. We perform extensive node classification experiments on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
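The topology-level idea, dropping unimportant edges more often than important ones, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the use of degree as the centrality measure, the log-of-mean-endpoint-degree edge score, the normalisation, and the parameters `p_base` and `p_max` are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def edge_drop_probabilities(edges, num_nodes, p_base=0.3, p_max=0.7):
    """Per-edge drop probability that is lower for edges between central nodes.

    Assumed scoring: an edge's importance is the log of the mean degree of
    its two endpoints. Before clipping at p_max, the average drop
    probability over all edges equals p_base.
    """
    edges = np.asarray(edges)
    degree = np.bincount(edges.ravel(), minlength=num_nodes)
    score = np.log(0.5 * (degree[edges[:, 0]] + degree[edges[:, 1]]))
    spread = score.max() - score.mean()
    if spread == 0:
        # All edges are equally important: fall back to uniform dropping.
        return np.full(len(edges), min(p_base, p_max))
    probs = (score.max() - score) / spread * p_base
    return np.minimum(probs, p_max)


def drop_edges(edges, probs, rng):
    """Sample one augmented view by removing each edge with its own probability."""
    keep = rng.random(len(edges)) >= probs
    return np.asarray(edges)[keep]


if __name__ == "__main__":
    # Small undirected graph: node 0 is a hub, node 4 hangs off node 3.
    edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
    probs = edge_drop_probabilities(edges, num_nodes=5)
    rng = np.random.default_rng(0)
    print("drop probabilities:", np.round(probs, 3))
    print("one augmented view:", drop_edges(edges, probs, rng).tolist())
```

Calling `drop_edges` twice with the same probabilities gives the two stochastic views that a graph CL method would then contrast.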

MoDELS · Machine Learning · Learned · entity · Episode

January 6, 2021

Adaptive Synthetic Characters for Military Training

Volkan Ustun, Rajay Kumar, Adam Reilly, Seyed Sajjadi, Andrew Miller

Behaviors of the synthetic characters in current military simulations are limited since they are generally generated by rule-based and reactive computational models with minimal intelligence. Such computational models cannot adapt to reflect the experience of the characters, resulting in brittle intelligence for even the most effective behavior models devised via costly and labor-intensive processes. Observation-based behavior model adaptation that leverages machine learning and the experience of synthetic entities in combination with appropriate prior knowledge can address the issues in the existing computational behavior models to create a better training experience in military training simulations. In this paper, we introduce a framework that aims to create autonomous synthetic characters that can perform coherent sequences of believable behavior while being aware of human trainees and their needs within a training simulation. This framework brings together three mutually complementary components. The first component is a Unity-based simulation environment - Rapid Integration and Development Environment (RIDE) - supporting One World Terrain (OWT) models and capable of running and supporting machine learning experiments. The second is Shiva, a novel multi-agent reinforcement and imitation learning framework that can interface with a variety of simulation environments, and that can additionally utilize a variety of learning algorithms. The final component is the Sigma Cognitive Architecture that will augment the behavior models with symbolic and probabilistic reasoning capabilities. We have successfully created proof-of-concept behavior models leveraging this framework on realistic terrain as an essential step towards bringing machine learning into military simulations.