Multi-agent football poses an unsolved challenge in AI research. Existing work has focused on tackling simplified scenarios of the game, or else on leveraging expert demonstrations. In this paper, we develop a multi-agent system to play the full 11 vs. 11 game mode, without demonstrations. This game mode contains aspects that present major challenges to modern reinforcement learning algorithms: multi-agent coordination, long-term planning, and non-transitivity. To address these challenges, we present TiZero, a self-evolving, multi-agent system that learns from scratch. TiZero introduces several innovations, including adaptive curriculum learning, a novel self-play strategy, and an objective that optimizes the policies of multiple agents jointly. Experimentally, it outperforms previous systems by a large margin on the Google Research Football environment, increasing win rates by over 30%. To demonstrate the generality of TiZero's innovations, they are also assessed on several environments beyond football: Overcooked, Multi-agent Particle-Environment, Tic-Tac-Toe and Connect-Four.

Related content

Self-Play

Hierarchical reinforcement learning · Primitives · Hierarchical · Reinforcement learning · INFORMS

April 7, 2023

CRISP: Curriculum inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning

Utsav Singh,Vinay P Namboodiri

Hierarchical reinforcement learning is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, because it is challenging to train the higher-level policy when the lower-level primitive is non-stationary. In this paper, we propose a novel hierarchical algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. The lower-level primitive periodically performs data relabeling on a handful of expert demonstrations using our primitive informed parsing approach. We provide expressions to bound the sub-optimality of our method and develop a practical algorithm for hierarchical reinforcement learning. Since our approach uses only a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluation on complex maze navigation and robotic manipulation environments shows that inducing hierarchical curriculum learning significantly improves sample efficiency and results in efficient goal-conditioned policies for solving temporally extended tasks.

Transfer learning · Detection system · VGG · Dataset · Xception

April 7, 2023

Local Rose Breeds Detection System Using Transfer Learning Techniques

Amena Begum Farha,Md. Azizul Hakim,Mst. Eshita Khatun

from arxiv, 6 pages, 11 figures, conference

Detecting a flower's breed, and providing details of that breed along with suggested cultivation and care practices, is important for flower cultivation, breed invention, and the flower business. Among all the local flowers in Bangladesh, the rose is one of the most popular and in-demand. Roses are among the most desirable flowers not only in Bangladesh but throughout the world, and they can be used for many purposes apart from decoration. Because roses are in great demand in the flower business, rose breed detection is essential. However, unlike the classification of different flowers, there is no notable work on breed detection within a particular flower. In this research, we propose a model to detect rose breeds from images using transfer learning techniques. Resources for image processing and classification of this kind are scarce, so we needed a large image dataset to train our model. We used 1939 raw images of five different breeds and, using augmentation, generated 9306 images for the training dataset and 388 images for the testing dataset to validate the model. We applied four transfer learning models in this research: Inception V3, ResNet50, Xception, and VGG16. Among these four models, VGG16 achieved the highest accuracy, 99%. According to our study, this is the first publicly available work on breed detection of a particular flower using transfer learning methods.

Game theory · General · Optimal · Uncertain · Uncertainty

April 6, 2023

Strategically revealing capabilities in General Lotto games

Keith Paarporn,Philip N. Brown

from arxiv, 8 pages

Can revealing one's competitive capabilities to an opponent offer strategic benefits? In this paper, we address this question in the context of General Lotto games, a class of two-player competitive resource allocation models. We consider an asymmetric information setting where the opponent is uncertain about the resource budget of the other player and holds a prior belief on its value. We assume the other player, called the signaler, is able to send a noisy signal about its budget to the opponent. With its updated belief, the opponent must then decide whether to invest in costly resources that it will deploy against the signaler's resource budget in a General Lotto game. We derive the subgame perfect equilibrium of this extensive-form game. In particular, we identify necessary and sufficient conditions under which a signaling policy improves the signaler's resulting performance in comparison to the scenario where it does not send any signal. Moreover, we provide the optimal signaling policy when these conditions are met. Notably, we find that in some scenarios the signaler can effectively double its performance.
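
The belief update mentioned in this abstract can be illustrated with Bayes' rule. The sketch below is a deliberately simplified, hypothetical setting and not the paper's model: it assumes the budget takes only two values ("low" or "high") and that the signal reports the true value with a fixed probability `accuracy`. The function name `posterior_high` and all parameters are introduced here for illustration only.

```python
def posterior_high(prior_high: float, accuracy: float, signal: str) -> float:
    """Return P(budget is high | signal) under a binary-budget assumption.

    prior_high: opponent's prior probability that the budget is high.
    accuracy:   probability that the noisy signal matches the true budget
                (an illustrative noise model, not taken from the paper).
    signal:     the observed signal, either "high" or "low".
    """
    if signal not in ("high", "low"):
        raise ValueError("signal must be 'high' or 'low'")

    # Likelihood of observing this signal under each possible budget.
    like_if_high = accuracy if signal == "high" else 1.0 - accuracy
    like_if_low = 1.0 - accuracy if signal == "high" else accuracy

    # Total probability of the observed signal (the normalizing constant).
    evidence = prior_high * like_if_high + (1.0 - prior_high) * like_if_low
    if evidence == 0.0:
        raise ValueError("signal has zero probability under this prior")

    # Bayes' rule.
    return prior_high * like_if_high / evidence
```

With `accuracy = 0.5` the signal carries no information and the posterior equals the prior; as `accuracy` approaches 1 the posterior moves toward certainty about the reported value.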

Control policy · Robots · Structure · Engine · Robot control

April 6, 2023

Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine

Kun Wang,William R. Johnson III,Shiyang Lu,Xiaonan Huang,Joran Booth,Rebecca Kramer-Bottiglio,Mridul Aanjaneya,Kostas Bekris

from arxiv, Submitted to IROS2023

Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and significant deformations, which enable them to navigate unstructured terrains and survive harsh impacts. They are hard to control, however, due to high dimensionality, complex dynamics, and a coupled architecture. Physics-based simulation is a promising avenue for developing locomotion policies that can be transferred to real robots. Nevertheless, modeling tensegrity robots is a complex task due to a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real (R2S2R) strategy for tensegrity robots. This strategy is based on a differentiable physics engine that can be trained given limited data from a real robot. These data include offline measurements of physical properties, such as mass and geometry for various robot components, and the observation of a trajectory using a random control policy. With the data from the real robot, the engine can be iteratively refined and used to discover locomotion policies that are directly transferable to the real robot. Beyond the R2S2R pipeline, key contributions of this work include computing non-zero gradients at contact points, a loss function for matching tensegrity locomotion gaits, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. Multiple iterations of the R2S2R process are demonstrated and evaluated on a real 3-bar tensegrity robot.

Parallelization · Parallel · Phantom · Adaptive · Agent model

April 4, 2023

Adaptive parallelization of multi-agent simulations with localized dynamics

Alexandru-Ionuț Băbeanu,Tatiana Filatova,Jan H. Kwakkel,Neil Yorke-Smith

from arxiv, 12 pages, 3 figures; work presented at the 24th International Workshop on Multi-Agent-Based Simulation

Agent-based modelling constitutes a versatile approach to representing and simulating complex systems. Studying large-scale systems is challenging because of the computational time required for the simulation runs: scaling is at least linear in system size (number of agents). Given the inherently modular nature of multi-agent-based simulations (MABSs), parallel computing is a natural approach to overcoming this challenge. However, because of the shared information and communication between agents, parallelization is not simple. We present a protocol for shared-memory, parallel execution of MABSs. This approach is useful for models that can be formulated in terms of sequential computations, and that involve updates that are localized, in the sense of involving small numbers of agents. The protocol has a bottom-up and asynchronous nature, allowing it to deal with heterogeneous computation in an adaptive, yet graceful manner. We illustrate the potential performance gains on exemplar cultural dynamics and disease spreading MABSs.

Adaptive systems · Controllable · Video generation · Video · Systems

April 4, 2023

Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE

Yucheng Xu,Li Nanbo,Arushi Goel,Zijian Guo,Zonghai Yao,Hamidreza Kasaei,Mohammadreze Kasaei,Zhibin Li

Videos depict the change of complex dynamical systems over time in the form of discrete image sequences. Generating controllable videos by learning the dynamical system is an important yet underexplored topic in the computer vision community. This paper presents a novel framework, TiV-ODE, to generate highly controllable videos from a static image and a text caption. Specifically, our framework leverages the ability of Neural Ordinary Differential Equations (Neural ODEs) to represent complex dynamical systems as a set of nonlinear ordinary differential equations. The resulting framework is capable of generating videos with both desired dynamics and content. Experiments demonstrate the ability of the proposed method in generating highly controllable and visually consistent videos, and its capability of modeling dynamical systems. Overall, this work is a significant step towards developing advanced controllable video generation models that can handle complex and dynamic scenes.

Taxonomy · Learned · Cluster · Performer · Rank

January 25, 2021

Curriculum Learning: A Survey

Petru Soviany,Radu Tudor Ionescu,Paolo Rota,Nicu Sebe

Training machine learning models in a meaningful order, from the easy samples to the hard ones, using curriculum learning can provide performance improvements over the standard training approach based on random data shuffling, without any additional computational costs. Curriculum learning strategies have been successfully employed in all areas of machine learning, in a wide range of tasks. However, the necessity of finding a way to rank the samples from easy to hard, as well as the right pacing function for introducing more difficult data can limit the usage of the curriculum approaches. In this survey, we show how these limits have been tackled in the literature, and we present different curriculum learning instantiations for various tasks in machine learning. We construct a multi-perspective taxonomy of curriculum learning approaches by hand, considering various classification criteria. We further build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm, linking the discovered clusters with our taxonomy. At the end, we provide some interesting directions for future work.
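
As an illustration of the two ingredients this abstract names, a difficulty ranking and a pacing function, the following is a minimal Python sketch. It is not taken from the survey: the linear pacing schedule, the `start_fraction` parameter, and the function names `linear_pacing` and `curriculum_batches` are assumptions chosen for illustration, and the difficulty score is supplied by the caller.

```python
import random
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def linear_pacing(step: int, total_steps: int, n_samples: int,
                  start_fraction: float = 0.2) -> int:
    """Return how many of the easiest samples are available at `step`.

    The available fraction grows linearly from `start_fraction` to 1.0.
    This schedule is one simple choice among many (an assumption here).
    """
    frac = min(1.0, start_fraction + (1.0 - start_fraction) * step / total_steps)
    return max(1, int(frac * n_samples))


def curriculum_batches(samples: Sequence[T],
                       difficulty: Callable[[T], float],
                       total_steps: int,
                       batch_size: int,
                       seed: int = 0) -> Iterator[List[T]]:
    """Yield one batch per step, drawn only from the easiest samples so far."""
    rng = random.Random(seed)
    # Rank once, easiest first; `difficulty` is a caller-supplied score.
    ranked = sorted(samples, key=difficulty)
    for step in range(total_steps):
        k = linear_pacing(step, total_steps, len(ranked))
        pool = ranked[:k]  # harder samples are unlocked as k grows
        yield [rng.choice(pool) for _ in range(batch_size)]
```

For example, `curriculum_batches(list(range(100)), lambda x: x, total_steps=10, batch_size=8)` draws early batches only from the lowest-scored items and gradually widens the pool. Sorting happens once up front, so the per-step overhead is only the slice and the sampling.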

Learned · Reinforcement learning · Episode · INTERACT · Next

March 10, 2020

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Sanmit Narvekar,Bei Peng,Matteo Leonetti,Jivko Sinapov,Matthew E. Taylor,Peter Stone

Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.

Identifiable · INTERACT · Comprehensibility · Confidence · AI

May 13, 2019

Challenges in Building Intelligent Open-domain Dialog Systems

Minlie Huang,Xiaoyan Zhu,Jianfeng Gao

There is a resurgent interest in developing intelligent open-domain dialog systems due to the availability of large amounts of conversational data and the recent progress on neural approaches to conversational AI. Unlike traditional task-oriented bots, an open-domain dialog system aims to establish long-term connections with users by satisfying the human need for communication, affection, and social belonging. This paper reviews recent work on neural approaches devoted to addressing three challenges in developing such systems: semantics, consistency, and interactiveness. Semantics requires a dialog system not only to understand the content of the dialog but also to identify users' social needs during the conversation. Consistency requires the system to demonstrate a consistent personality to win users' trust and gain their long-term confidence. Interactiveness refers to the system's ability to generate interpersonal responses to achieve particular social goals such as entertainment, conforming, and task completion. The works we select to present here are based on our unique views and are by no means complete. Nevertheless, we hope that the discussion will inspire new research in developing more intelligent dialog systems.