This paper studies a multi-robot visibility-based pursuit-evasion problem in which a group of pursuer robots are tasked with detecting an evader within a two dimensional polygonal environment. The primary contribution is a novel formulation of the pursuit-evasion problem that modifies the pursuers' objective by requiring that the evader still be detected, even in spite of the failure of any single pursuer robot. This novel constraint, whereby two pursuers are required to detect an evader, has the benefit of providing redundancy to the search, should any member of the team become unresponsive, suffer temporary sensor disruption/failure, or otherwise become incapacitated. Existing methods, even those that are designed to respond to failures, rely on the pursuers to replan and update their search pattern to handle such occurrences. In contrast, the proposed formulation produces plans that are inherently tolerant of some level of disturbance. Building upon this new formulation, we introduce an augmented data structure for encoding the problem state and a novel sampling technique to ensure that the generated plans are robust to failures of any single pursuer robot. An implementation and simulation results illustrating the effectiveness of this approach are described.
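
The redundancy requirement described above can be illustrated with a small check. This is only a sketch under stated assumptions: the function names and the `sightings` dictionary (pursuer name mapped to whether that pursuer currently sees the evader) are hypothetical, and the visibility computation in the polygonal environment is assumed to be done elsewhere. It is not the paper's data structure or sampling technique.

```python
from typing import Dict


def detected_despite_any_single_failure(sightings: Dict[str, bool]) -> bool:
    """Return True if the evader is seen now and would still be seen
    after removing any one pursuer from the team.

    `sightings` is a hypothetical input: pursuer name -> sees the evader.
    """
    # With no failure at all, somebody has to see the evader.
    if not any(sightings.values()):
        return False
    # Simulate the failure of each pursuer in turn.
    for failed in sightings:
        remaining = [seen for name, seen in sightings.items() if name != failed]
        if not any(remaining):
            return False
    return True


def seen_by_at_least_two(sightings: Dict[str, bool]) -> bool:
    """Equivalent shortcut: at least two pursuers see the evader."""
    return sum(1 for seen in sightings.values() if seen) >= 2


one_observer = {"p1": True, "p2": False, "p3": False}
two_observers = {"p1": True, "p2": False, "p3": True}
print(detected_despite_any_single_failure(one_observer))
print(detected_despite_any_single_failure(two_observers))
```

The two functions agree on every input, which is why the abstract can phrase tolerance to a single failure as "two pursuers are required to detect an evader".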

Related content

November 11, 2021

Reinforcement Learning for Robotic Manipulation using Simulated Locomotion Demonstrations

Ozsel Kilinc, Giovanni Montana

from arxiv, To appear in ECML PKDD 2022

Mastering robotic manipulation skills through reinforcement learning (RL) typically requires the design of shaped reward functions. Recent developments in this area have demonstrated that using sparse rewards, i.e. rewarding the agent only when the task has been successfully completed, can lead to better policies. However, state-action space exploration is more difficult in this case. Recent RL approaches to learning with sparse rewards have leveraged high-quality human demonstrations for the task, but these can be costly, time-consuming or even impossible to obtain. In this paper, we propose a novel and effective approach that does not require human demonstrations. We observe that every robotic manipulation task could be seen as involving a locomotion task from the perspective of the object being manipulated, i.e. the object could learn how to reach a target state on its own. In order to exploit this idea, we introduce a framework whereby an object locomotion policy is initially obtained using a realistic physics simulator. This policy is then used to generate auxiliary rewards, called simulated locomotion demonstration rewards (SLDRs), which enable us to learn the robot manipulation policy. The proposed approach has been evaluated on 13 tasks of increasing complexity, and achieves higher success rates and faster learning than alternative algorithms. SLDRs are especially beneficial for tasks like multi-object stacking and non-rigid object manipulation.

November 11, 2021

Learning by Cheating: An End-to-End Zero Shot Framework for Autonomous Drone Navigation

Praveen Venkatesh, Viraj Shah, Vrutik Shah, Yash Kamble, Joycee Mekie

This paper proposes a novel framework for autonomous drone navigation through a cluttered environment. Control policies are learnt in a low-level environment during training and are applied to a complex environment during inference. The controller learnt in the training environment is tricked into believing that the robot is still in the training environment when it is actually navigating in a more complex environment. The framework presented in this paper can be adapted to reuse simple policies in more complex tasks. We also show that the framework can be used as an interpretation tool for reinforcement learning algorithms.

November 10, 2021

Scalable Reinforcement Learning Policies for Multi-Agent Control

Christopher D. Hsu, Heejin Jeong, George J. Pappas, Pratik Chaudhari

from arxiv, 8 pages, 10 figures, contributed paper at IROS 2021

We develop a Multi-Agent Reinforcement Learning (MARL) method to learn scalable control policies for target tracking. Our method can handle an arbitrary number of pursuers and targets; we show results for tasks consisting of up to 1000 pursuers tracking 1000 targets. We use a decentralized, partially-observable Markov Decision Process framework to model pursuers as agents receiving partial observations (range and bearing) about targets which move using fixed, unknown policies. An attention mechanism is used to parameterize the value function of the agents; this mechanism allows us to handle an arbitrary number of targets. Entropy-regularized off-policy RL methods are used to train a stochastic policy, and we discuss how it enables a hedging behavior between pursuers that leads to a weak form of cooperation in spite of completely decentralized control execution. We further develop a masking heuristic that allows training on smaller problems with few pursuers and targets and execution on much larger problems. Thorough simulation experiments, ablation studies, and comparisons to state-of-the-art algorithms are performed to study the scalability of the approach and the robustness of performance to varying numbers of agents and targets.

November 10, 2021

Object Servoing of Differential-Drive Robots

Weibin Jia, Wenjie Zhao, Zhihuan Song, Zhengguo Li

Due to the possibly changing pose of a movable object and the nonholonomic constraint of a differential-drive robot, it is challenging to design an object servoing scheme that lets the robot asymptotically park at a predefined pose relative to the movable object. In this paper, a novel object servoing scheme is designed for differential-drive robots. Each on-line relative pose is first estimated using feature points of the movable object, and it serves as the input of an object-servoing-friendly parking controller. The linear velocity and angular velocity are then determined by the parking controller. Experimental results validate the performance of the proposed object servoing scheme. Due to its low on-line computational cost, the proposed scheme can be applied to last-mile delivery by differential-drive robots to movable objects.
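
To make the nonholonomic constraint concrete, the following is a minimal sketch of the standard unicycle kinematic model commonly used for differential-drive robots, integrated with a forward Euler step. The function name `step`, the pose tuple layout and the time step are assumptions made for illustration; this is not the paper's parking controller, only the model that a controller outputting a linear velocity `v` and angular velocity `omega` would typically drive.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]  # (x, y, heading theta in radians)


def step(pose: Pose, v: float, omega: float, dt: float) -> Pose:
    """Advance the unicycle model by one forward Euler step.

    The robot can only translate along its current heading, which is the
    nonholonomic constraint: there is no sideways velocity term.
    """
    x, y, theta = pose
    return (
        x + v * math.cos(theta) * dt,
        y + v * math.sin(theta) * dt,
        theta + omega * dt,
    )


# Drive straight along the x-axis for 10 steps of 0.1 s at 1 m/s.
pose: Pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = step(pose, v=1.0, omega=0.0, dt=0.1)
print(pose)
```

Because sideways motion is impossible in this model, reaching a pose that is laterally offset requires combining forward motion with turning, which is part of what makes parking relative to a moving object hard.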

November 10, 2021

Cross-Layered Distributed Data-driven Framework For Enhanced Smart Grid Cyber-Physical Security

Allen Starke, Keerthiraj Nagaraj, Cody Ruben, Nader Aljohani, Sheng Zou, Arturo Bretas, Janise McNair, Alina Zare

Smart Grid (SG) research and development has drawn much attention from academia, industry and government due to the great impact it will have on society, economics and the environment. Securing the SG is a significant challenge due to the increased dependency on communication networks to assist in physical process control, exposing them to various cyber-threats. In addition to attacks that change measurement values using False Data Injection (FDI) techniques, attacks on the communication network may disrupt the power system's real-time operation by intercepting messages, or by flooding the communication channels with unnecessary data. Addressing these attacks requires a cross-layer approach. In this paper, a cross-layered strategy is presented, called Cross-Layer Ensemble CorrDet with Adaptive Statistics (CECD-AS), which integrates the detection of faulty SG measurement data as well as inconsistent network inter-arrival times and transmission delays for more reliable and accurate anomaly detection and attack interpretation. Numerical results show that CECD-AS can detect multiple False Data Injection, Denial of Service (DoS) and Man In The Middle (MITM) attacks with a high F1-score compared to current approaches that only use SG measurement data for detection, such as traditional physics-based State Estimation, the Ensemble CorrDet with Adaptive Statistics strategy and other machine learning classification-based detection schemes.

November 9, 2021

Asynchronous Gathering Algorithms for Autonomous Mobile Robots with Lights

R. Nakai, Y. Sudo, K. Wada

We consider a Gathering problem for n autonomous mobile robots with persistent memory called light in an asynchronous scheduler (ASYNC). It is well known that Gathering is impossible when robots have no lights in basic common models, if the system is semi-synchronous (SSYNC) or even centralized (only one robot is active at each time). It is known that Gathering can be solved by robots with 10 colors of lights in ASYNC. This result is obtained by combining two results: (1) the simulation of SSYNC robots with k colors by ASYNC robots with 5k colors, and (2) the solution of Gathering by SSYNC robots with 2 colors. In this paper, we improve the result by reducing the number of colors and show that Gathering can be solved by ASYNC robots with 3 colors of lights. We also show that we can construct a simulation algorithm of any unfair SSYNC algorithm using k colors by ASYNC robots with 3k colors, where unfairness does not guarantee that every robot is activated infinitely often. Combining this simulation and the Gathering algorithm by SSYNC robots with 2 colors, we obtain a Gathering algorithm by ASYNC robots with 6 colors. Our main result can be obtained by reducing the number of colors from 6 to 3.

August 7, 2020

Deep Robust Clustering by Contrastive Learning

Huasong Zhong, Chong Chen, Zhongming Jin, Xian-Sheng Hua

Recently, many unsupervised deep learning methods have been proposed to learn clustering with unlabelled data. By introducing data augmentation, most of the latest methods look into deep clustering from the perspective that the original image and its transformation should share a similar semantic clustering assignment. However, the representation features before the softmax activation function could be quite different even when the assignment probability is very similar, since softmax is only sensitive to the maximum value. This may result in high intra-class diversities in the representation feature space, which will lead to unstable local optima and thus harm the clustering performance. By investigating the internal relationship between mutual information and contrastive learning, we summarize a general framework that can turn any maximization of mutual information into a minimization of contrastive loss. We apply it to both the semantic clustering assignment and the representation feature and propose a novel method named Deep Robust Clustering by Contrastive Learning (DRC). Unlike existing methods, DRC aims to increase inter-class diversities and decrease intra-class diversities simultaneously and achieve more robust clustering results. Extensive experiments on six widely-adopted deep clustering benchmarks demonstrate the superiority of DRC in both stability and accuracy, e.g., attaining 71.6% mean accuracy on CIFAR-10, which is 7.1% higher than state-of-the-art results.
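
The observation that pre-softmax features can differ widely while the assignment probabilities stay the same can be seen in a small numeric sketch. The example below relies on one well-known property, the invariance of softmax to adding a constant to every logit; the vectors `a` and `b` are made-up values for illustration and do not come from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def euclidean(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Two hypothetical pre-softmax feature vectors, far apart in feature space.
a = [1.0, 2.0, 3.0]
b = [101.0, 102.0, 103.0]

print(euclidean(a, b))                    # large distance between the features
print(euclidean(softmax(a), softmax(b)))  # assignments are (numerically) the same
```

A loss defined only on the assignment probabilities cannot tell `a` and `b` apart, which is consistent with the abstract's motivation for also applying a contrastive objective to the representation features.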

December 7, 2018

Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control

Zhuo Xu, Chen Tang, Masayoshi Tomizuka

from arxiv, Published at IEEE ITSC 2018

Although deep reinforcement learning (deep RL) methods have many strengths that are favorable for autonomous driving, real deep RL applications in autonomous driving have been slowed down by the modeling gap between the source (training) domain and the target (deployment) domain. Unlike current policy transfer approaches, which are generally limited to using uninterpretable neural network representations as the transferred features, we propose to transfer concrete kinematic quantities in autonomous driving. The proposed robust-control-based (RC) generic transfer architecture, which we call RL-RC, incorporates a transferable hierarchical RL trajectory planner and a robust tracking controller based on a disturbance observer (DOB). The deep RL policies trained with a known nominal dynamics model are transferred directly to the target domain, and DOB-based robust tracking control is applied to tackle the modeling gap, including vehicle dynamics errors and external disturbances such as side forces. We provide simulations validating the capability of the proposed method to achieve zero-shot transfer across multiple driving scenarios such as lane keeping, lane changing and obstacle avoidance.

September 12, 2018

Multi-task Deep Reinforcement Learning with PopArt

Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt

The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, each new task requiring a brand new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
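
The abstract does not spell out how each task's contribution is adapted, so the following is only a sketch of the general PopArt idea as it is commonly described: keep running statistics of the value targets, learn a normalized output, and rescale the last layer whenever the statistics change so that the unnormalized prediction is preserved. The class name `PopArtScalar`, the single-weight "network", the step size `beta` and the initial statistics are all assumptions made for illustration; in the multi-task setting one such set of statistics would presumably be kept per task.

```python
import math


class PopArtScalar:
    """Toy one-weight value head with output-preserving target normalization."""

    def __init__(self, beta: float = 0.1) -> None:
        self.beta = beta  # step size of the running statistics (assumed value)
        self.mu = 0.0     # running mean of the targets
        self.nu = 1.0     # running second moment of the targets
        self.w = 1.0      # weight of the normalized output layer
        self.b = 0.0      # bias of the normalized output layer

    @property
    def sigma(self) -> float:
        # Standard deviation from the running moments, clamped for safety.
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, h: float) -> float:
        return self.w * h + self.b

    def unnormalized(self, h: float) -> float:
        return self.sigma * self.normalized(h) + self.mu

    def update_stats(self, target: float) -> None:
        """Update the statistics, then rescale w and b so that
        unnormalized(h) is unchanged for every input h."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        self.w = self.w * old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma


head = PopArtScalar()
before = head.unnormalized(0.5)
head.update_stats(100.0)  # a target far larger than anything seen so far
after = head.unnormalized(0.5)
print(before, after)
```

Because the learning signal is computed on the normalized scale, tasks with large or dense rewards would no longer dominate the gradient purely through the magnitude of their targets, which matches the abstract's stated goal of giving all tasks a similar impact on the learning dynamics.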

January 20, 2018

IEOPF: An Active Contour Model for Image Segmentation with Inhomogeneities Estimated by Orthogonal Primary Functions

Chaolu Feng

from arxiv, 27 pages, 14 figures

Image segmentation is still an open problem, especially when the intensities of the objects of interest overlap due to the presence of intensity inhomogeneity (also known as bias field). To segment images with intensity inhomogeneities, a bias correction embedded level set model is proposed where Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined, and membership functions of the clusters described by the level set function are introduced to rewrite the energy as a data term of the proposed model. Similar to popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment colour images and images with multiple objects, respectively. It has been extensively tested on both synthetic and real images that are widely used in the literature and on the public BrainWeb and IBSR datasets. Experimental results and comparison with state-of-the-art methods demonstrate the advantages of the proposed model in terms of bias correction and segmentation accuracy.