
Robots are now more capable of performing manipulation tasks for everyday activities than before, but the safety of the manipulation skills they employ remains an open problem. Considering all possible failures during skill learning increases the complexity of the process and hinders learning an optimal policy. Moreover, in unstructured environments it is not easy to enumerate all possible failures beforehand. In the context of safe skill manipulation, we reformulate skills as base skills and failure prevention skills: base skills aim at completing tasks, while failure prevention skills focus on reducing the risk of failures occurring. We then propose a modular and hierarchical method for safe robot manipulation that augments base skills with failure prevention skills learned through reinforcement learning, forming a skill library that addresses different safety risks. Furthermore, a skill selection policy that considers estimated risks lets the robot select the best control policy for safe manipulation. Our experiments show that the proposed method achieves the given goal while ensuring safety by preventing failures. We also show that, with the proposed method, skill learning is feasible, novel failures can be accommodated easily, and our safe manipulation tools can be transferred to the real environment.
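The abstract does not say how the skill selection policy works internally. As a minimal sketch of the general idea (a base skill plus a library of failure prevention skills, chosen according to estimated risk), the Python below uses a simple threshold rule. The names `Skill`, `PreventionSkill`, `select_skill`, the risk estimators and the threshold value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # assumed toy state: named scalar features


@dataclass
class Skill:
    name: str
    policy: Callable[[State], str]  # maps a state to an action label (placeholder)


@dataclass
class PreventionSkill(Skill):
    risk: Callable[[State], float]  # hypothetical risk estimate in [0, 1]


def select_skill(state: State, base: Skill,
                 library: List[PreventionSkill],
                 threshold: float = 0.5) -> Skill:
    """Pick the prevention skill with the highest estimated risk if that
    risk exceeds the threshold; otherwise keep executing the base skill."""
    if not library:
        return base
    riskiest = max(library, key=lambda skill: skill.risk(state))
    return riskiest if riskiest.risk(state) > threshold else base


# Toy library: each prevention skill watches one assumed state feature.
base_skill = Skill("push", lambda s: "push_forward")
skill_library = [
    PreventionSkill("avoid_tipping", lambda s: "lower_contact_point",
                    risk=lambda s: s["tilt"]),
    PreventionSkill("avoid_drop", lambda s: "move_away_from_edge",
                    risk=lambda s: s["edge_proximity"]),
]

safe_state = {"tilt": 0.1, "edge_proximity": 0.2}
risky_state = {"tilt": 0.8, "edge_proximity": 0.2}
print(select_skill(safe_state, base_skill, skill_library).name)
print(select_skill(risky_state, base_skill, skill_library).name)
```

Keeping the prevention skills in a list reflects the modular framing of the abstract: covering a new failure type in this sketch only means appending another entry with its own risk estimator.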

Related content


Learning · Facebook AI Research · Reinforcement learning · Functional · Reward function

June 16, 2023

Fairness in Preference-based Reinforcement Learning

Umer Siddique, Abhinav Sinha, Yongcan Cao

In this paper, we address the issue of fairness in preference-based reinforcement learning (PbRL) in the presence of multiple objectives. The main objective is to design control policies that can optimize multiple objectives while treating each objective fairly. Toward this objective, we design a new fairness-induced preference-based reinforcement learning method, FPbRL. The main idea of FPbRL is to learn vector reward functions associated with multiple objectives via new welfare-based preferences, rather than the reward-based preferences used in PbRL, coupled with policy learning via maximizing a generalized Gini welfare function. Finally, we provide experimental studies on three different environments to show that the proposed FPbRL approach can achieve both efficiency and equity for learning effective and fair policies.
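For readers unfamiliar with the generalized Gini welfare function, the sketch below computes it in its standard form: sort the per-objective utilities in ascending order and take a weighted sum with non-increasing weights, so the worst-off objective counts most. The function name and the default weights 1, 1/2, 1/4, ... are assumptions made for illustration; the abstract does not state which weights FPbRL uses.

```python
from typing import List, Optional


def generalized_gini_welfare(utilities: List[float],
                             weights: Optional[List[float]] = None) -> float:
    """Weighted sum of utilities sorted from worst to best.

    The weights must be non-increasing, so the smallest utility
    receives the largest weight.
    """
    n = len(utilities)
    if weights is None:
        # Assumed default: geometrically decaying weights 1, 1/2, 1/4, ...
        weights = [1.0 / (2 ** i) for i in range(n)]
    if len(weights) != n:
        raise ValueError("weights and utilities must have the same length")
    if any(weights[i] < weights[i + 1] for i in range(n - 1)):
        raise ValueError("weights must be non-increasing")
    ordered = sorted(utilities)  # ascending: worst-off objective first
    return sum(w * u for w, u in zip(weights, ordered))


# Two outcomes with the same total utility (10) over two objectives.
print(generalized_gini_welfare([5.0, 5.0]))  # 7.5
print(generalized_gini_welfare([9.0, 1.0]))  # 5.5
```

Both outcomes have the same total, but the balanced one scores higher, which is the sense in which maximizing this welfare function trades off efficiency against equity.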

BCH · Decoding · Reducible · Analysis · Commutative

June 16, 2023

A Practical Early-Stopped Technique and Analysis for BCH Decoding Algorithm

Hong-fu Chou

from arXiv, 6 pages, 5 figures

In this paper, a practical technique for the conventional Berlekamp-Massey (BM) algorithm is provided to reduce decoding latency and save decoding power through early termination, or early-stopped checking. We investigate consecutive zero discrepancies during the decoding iterations and use them to decide when to stop the decoding process early. This technique accepts a risk of decoding failure in exchange for lower decoding latency. We analyze the proposed technique by considering the weight distribution of the BCH code and estimating the bounds of undetected error probability as the event of enormous stop checking. The numerical results show that the proposed method is effective: the probability of decoding failure is lower than $10^{-119}$ when decoding BCH codes of length 16383. Furthermore, we compare the complexity of the conventional early termination method with that of the proposed approach for decoding long BCH codes; the proposed approach reduces the complexity of the conventional approach by up to 80%. Finally, FPGA testing on a USB device validates the reliability of the proposed method.
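To make the idea of stopping on consecutive zero discrepancies concrete, here is a simplified sketch of Berlekamp-Massey over GF(2), which finds the shortest LFSR generating a bit sequence. It is not the paper's decoder: a BCH decoder runs BM on syndromes in GF(2^m), and the stopping rule and threshold analyzed in the paper may differ. The `stop_after` parameter and the function name are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def berlekamp_massey_gf2(s: List[int],
                         stop_after: Optional[int] = None
                         ) -> Tuple[int, List[int], int]:
    """Return (LFSR length L, connection polynomial c[0..L], iterations used).

    If stop_after is set, iteration ends once that many consecutive
    zero discrepancies have been observed (assumed early-stop rule).
    """
    n = len(s)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # polynomial saved at the last length change
    c[0] = b[0] = 1
    L, m = 0, -1
    zero_run = 0
    iterations = 0
    for N in range(n):
        iterations += 1
        # Discrepancy between s[N] and the current LFSR's prediction.
        d = s[N]
        for i in range(1, L + 1):
            d ^= c[i] & s[N - i]
        if d == 0:
            zero_run += 1
            if stop_after is not None and zero_run >= stop_after:
                break  # early stop: the LFSR has not changed for a while
            continue
        zero_run = 0
        t = c[:]
        shift = N - m
        for i in range(n + 1 - shift):
            c[i + shift] ^= b[i]  # c(x) += x^shift * b(x) over GF(2)
        if 2 * L <= N:
            L, m, b = N + 1 - L, N, t
    return L, c[:L + 1], iterations


# 16 bits from the recurrence s[n] = s[n-1] XOR s[n-4], seed 1,0,0,0.
bits = [1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(berlekamp_massey_gf2(bits))                # (4, [1, 1, 0, 0, 1], 16)
print(berlekamp_massey_gf2(bits, stop_after=4))  # (4, [1, 1, 0, 0, 1], 10)
print(berlekamp_massey_gf2(bits, stop_after=2))  # (1, [1, 0], 4)
```

On this toy sequence a threshold of 4 gives the same answer in 10 iterations instead of 16, while a threshold of 2 stops too early and returns the wrong LFSR. That is the latency versus failure-risk trade-off the abstract describes.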

Agent · Decomposed · Performance · AI · INTERACT

June 15, 2023

Making an agent's trust stable in a series of success and failure tasks through empathy

Takahiro Tsumura, Seiji Yamada

from arXiv, 9 pages, 7 figures, 4 tables, submitted to HAI 2023

As AI technology develops, trust in AI agents is becoming more important for more AI applications in human society. Possible ways to improve the trust relationship include empathy, success-failure series, and capability (performance). Appropriate trust is less likely to cause deviations between actual and ideal performance. In this study, we focus on the agent's empathy and success-failure series to increase trust in AI agents. We experimentally examine the effect of empathy from agent to person on changes in trust over time. The experiment was conducted with a two-factor mixed design: empathy (available, not available) and success-failure series (phase 1 to phase 5). An analysis of variance (ANOVA) was conducted using data from 198 participants. The results showed an interaction between the empathy factor and the success-failure series factor, with trust in the agent stabilizing when empathy was present. This result supports our hypothesis. This study shows that designing AI agents to be empathetic is an important factor for trust and helps humans build appropriate trust relationships with AI agents.

Learning · Controller · Performer · Cost · Robot

June 15, 2023

A Framework for Learning from Demonstration with Minimal Human Effort

Marc Rigter, Bruno Lacerda, Nick Hawes

from arXiv, Preprint version of IEEE Robotics and Automation Letters paper

We consider robot learning in the context of shared autonomy, where control of the system can switch between a human teleoperator and autonomous control. In this setting we address reinforcement learning, and learning from demonstration, where there is a cost associated with human time. This cost represents the human time required to teleoperate the robot, or recover the robot from failures. For each episode, the agent must choose between requesting human teleoperation, or using one of its autonomous controllers. In our approach, we learn to predict the success probability for each controller, given the initial state of an episode. This is used in a contextual multi-armed bandit algorithm to choose the controller for the episode. A controller is learnt online from demonstrations and reinforcement learning so that autonomous performance improves, and the system becomes less reliant on the teleoperator with more experience. We show that our approach to controller selection reduces the human cost to perform two simulated tasks and a single real-world task.
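The controller-selection loop can be sketched as a small contextual bandit. The version below is a toy under stated assumptions, not the authors' algorithm: contexts are discrete labels rather than initial states fed to a learned predictor, success probabilities are smoothed counts with an optimism bonus, and the expected human cost is modelled as a per-use cost plus a failure-recovery cost. The class name and all cost values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable


class ControllerSelector:
    """Pick the controller with the lowest optimistic expected human cost."""

    def __init__(self, use_cost: Dict[str, float],
                 failure_cost: float, bonus: float = 0.5) -> None:
        self.use_cost = use_cost          # human time spent when a controller runs
        self.failure_cost = failure_cost  # human time to recover from a failure
        self.bonus = bonus                # exploration strength (assumed)
        # (context, controller) -> [successes, trials]
        self.stats = defaultdict(lambda: [0, 0])

    def success_estimate(self, context: Hashable, name: str) -> float:
        successes, trials = self.stats[(context, name)]
        mean = (successes + 1) / (trials + 2)  # Laplace-smoothed frequency
        total = sum(self.stats[(context, c)][1] for c in self.use_cost)
        optimism = self.bonus * math.sqrt(math.log(total + 1) / (trials + 1))
        return min(1.0, mean + optimism)

    def expected_cost(self, context: Hashable, name: str) -> float:
        p = self.success_estimate(context, name)
        return self.use_cost[name] + (1.0 - p) * self.failure_cost

    def choose(self, context: Hashable) -> str:
        return min(self.use_cost,
                   key=lambda name: self.expected_cost(context, name))

    def update(self, context: Hashable, name: str, success: bool) -> None:
        record = self.stats[(context, name)]
        record[0] += int(success)
        record[1] += 1


# Teleoperation costs human time on every use; autonomy costs none unless it fails.
selector = ControllerSelector({"human": 5.0, "auto": 0.0},
                              failure_cost=10.0, bonus=0.0)
for _ in range(5):
    selector.update("cluttered", "auto", success=False)
for _ in range(2):
    selector.update("cluttered", "human", success=True)
print(selector.choose("cluttered"))  # human
print(selector.choose("open"))       # auto
```

With `bonus=0.0` the choice is purely greedy, which keeps the example deterministic. A positive bonus makes rarely tried controllers look more promising, so they are sampled before being ruled out.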

Estimation/Estimator · SimPLe · Range · Episode · Point cloud

June 14, 2023

Generalizable One-shot Rope Manipulation with Parameter-Aware Policy

So Kuroki, Jiaxian Guo, Tatsuya Matsushima, Takuya Okubo, Masato Kobayashi, Yuya Ikeda, Ryosuke Takanami, Paul Yoo, Yutaka Matsuo, Yusuke Iwasawa

Due to the inherent uncertainty in their deformability during motion, previous methods in rope manipulation often require hundreds of real-world demonstrations to train a manipulation policy for each rope, even for simple tasks such as rope goal reaching, which hinders their application in our ever-changing world. To address this issue, we introduce GenORM, a framework that allows the manipulation policy to handle different deformable ropes with a single real-world demonstration. To achieve this, we augment the policy by conditioning it on deformable rope parameters and training it with a diverse range of simulated deformable ropes so that the policy can adjust actions based on different rope parameters. At inference time, given a new rope, GenORM estimates the deformable rope parameters by minimizing the disparity between the grid density of point clouds of real-world demonstrations and simulations. With the help of a differentiable physics simulator, we require only a single real-world demonstration. Empirical validations on both simulated and real-world rope manipulation setups clearly show that our method can manipulate different ropes with a single demonstration and significantly outperforms the baseline in both environments (62% improvement for in-domain ropes and 15% improvement for out-of-distribution ropes in simulation, 26% improvement in the real world), demonstrating the effectiveness of our approach in one-shot rope manipulation.

Learning · Robustness · Reinforcement learning · Separate · Directed

September 16, 2022

Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability

Mengdi Xu, Zuxin Liu, Peide Huang, Wenhao Ding, Zhepeng Cen, Bo Li, Ding Zhao

from arXiv, 36 pages, 5 figures

A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing to unseen scenarios during deployment. This study aims to give an overview of these main perspectives of trustworthy reinforcement learning, considering its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section to spur promising future directions, with a brief discussion on extrinsic vulnerabilities considering human feedback. We hope this survey can bring separate threads of study together in a unified framework and promote the trustworthiness of reinforcement learning.

ContrastMask · contrastive · Learned · Contrastive learning · Mask

March 18, 2022

ContrastMask: Contrastive Learning to Segment Every Thing

Xuehui Wang, Kai Zhao, Ruixin Zhang, Shouhong Ding, Yan Wang, Wei Shen

from arXiv, Accepted to CVPR 2022

Partially-supervised instance segmentation is a task that requires segmenting objects from novel unseen categories by learning on limited seen categories with annotated masks, thus eliminating the heavy annotation burden. The key to addressing this task is to build an effective class-agnostic mask segmentation model. Unlike previous methods that learn such models only on seen categories, in this paper we propose a new method, named ContrastMask, which learns a mask segmentation model on both seen and unseen categories under a unified pixel-level contrastive learning framework. In this framework, annotated masks of seen categories and pseudo masks of unseen categories serve as a prior for contrastive learning, where features from the mask regions (foreground) are pulled together and contrasted against those from the background, and vice versa. Through this framework, feature discrimination between foreground and background is largely improved, facilitating learning of the class-agnostic mask segmentation model. Exhaustive experiments on the COCO dataset demonstrate the superiority of our method, which outperforms previous state-of-the-art methods.

Comprehensibility · ASSETS · Identifiable · BEGAN · INFORMS

November 13, 2021

Understanding and Assessment of Mission-Centric Key Cyber Terrains for joint Military Operations

Álvaro Luis Martínez, Jorge Maestre Vidal, Victor A. Villagrá González

from arxiv, Preprint of an extended version of the conference "A novel automatic discovery system of critical assets in cyberspace-oriented military missions", in Proc. of the First Workshop on Recent Advances in Cyber Situational Awareness on Military Operations (CSA 2020) held by the 15th ARES International Conference in August 2020. //doi.org/10.1145/3407023.3409225

Since cyberspace was consolidated as the fifth warfare dimension, the different actors of the defense sector began an arms race toward achieving cyber superiority, to which research, academic and industrial stakeholders contribute from a dual vision, mostly linked to a large and heterogeneous heritage of developments and adoption of civilian cybersecurity capabilities. In this context, increasing awareness of the context and warfare environment, and of the risks and impacts of cyber threats on kinetic actuations, became a critical rule-changer that military decision-makers are considering. A major challenge in acquiring mission-centric Cyber Situational Awareness (CSA) is the dynamic inference and assessment of the vertical propagations from situations that occur at the mission-supportive Information and Communications Technologies (ICT) up to their relevance at the military tactical, operational and strategic views. To contribute to acquiring CSA, this paper addresses a major gap in the cyber defence state of the art: the dynamic identification of Key Cyber Terrains (KCT) in a mission-centric context. Accordingly, the proposed KCT identification approach explores the dependency degrees among tasks and assets defined by commanders as part of the assessment criteria. These are correlated with the discoveries on the operational network and the asset vulnerabilities identified throughout the supported mission development. The proposal is presented as a reference model that reveals key aspects for mission-centric KCT analysis and supports its enforcement, and it includes an illustrative application case.

Extensibility · Learned · Noise distribution · Networking · Representation learning

July 25, 2021

Image Manipulation Detection by Multi-View Multi-Scale Supervision

Xinru Chen, Chengbo Dong, Jiaqi Ji, Juan Cao, Xirong Li

from arXiv, Accepted by ICCV 2021

The key challenge of image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, whilst specific enough to prevent false alarms on authentic images. Current research emphasizes sensitivity and overlooks specificity. In this paper we address both aspects through multi-view feature learning and multi-scale supervision. By exploiting the noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images, which are nontrivial for current semantic-segmentation-network-based methods to take into account. Our ideas are realized by a new network which we term MVSS-Net. Extensive experiments on five benchmark sets justify the viability of MVSS-Net for both pixel-level and image-level manipulation detection.

Underestimation · Overestimation · DQN · Estimation/Estimator · Biased

December 2, 2020

Self-correcting Q-Learning

Rong Zhu, Mattia Rigotti

from arXiv, Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
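The two biases the abstract contrasts can be reproduced with a small Monte Carlo experiment on the underlying estimation problem: approximating the maximum of several expected values from noisy samples. The sketch below covers only the single and double estimators; it does not implement the self-correcting estimator, whose details are not given in the abstract. The toy setup (ten actions, unit-variance Gaussian rewards, one action with a slightly higher mean) and all function names are assumptions.

```python
import random
from typing import Callable, List

Samples = List[List[float]]  # one list of reward samples per action


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def single_estimator(samples: Samples) -> float:
    """Max of the sample means (the estimator behind standard Q-learning)."""
    return max(_mean(x) for x in samples)


def double_estimator(samples: Samples) -> float:
    """Select the best action on one half of the data and evaluate it
    on the other half (the idea behind Double Q-learning)."""
    half = len(samples[0]) // 2
    best = max(range(len(samples)), key=lambda a: _mean(samples[a][:half]))
    return _mean(samples[best][half:])


def average_bias(estimator: Callable[[Samples], float],
                 true_means: List[float], n_samples: int = 10,
                 trials: int = 2000, seed: int = 0) -> float:
    """Average of (estimate - true maximum mean) over repeated trials."""
    rng = random.Random(seed)
    target = max(true_means)
    total_error = 0.0
    for _ in range(trials):
        samples = [[rng.gauss(mu, 1.0) for _ in range(n_samples)]
                   for mu in true_means]
        total_error += estimator(samples) - target
    return total_error / trials


true_means = [0.5] + [0.0] * 9  # one slightly better action among ten
print("single estimator bias:", average_bias(single_estimator, true_means))
print("double estimator bias:", average_bias(double_estimator, true_means))
```

In this setup the single estimator's average error comes out positive and the double estimator's comes out negative, matching the overestimation and underestimation described in the abstract.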