99视频在线播放喷射,高清国产三级在线播放,日本又色又爽又黄一级视频,国产成人免费午夜在线观看

Expert decision-makers (DMs) in high-stakes AI-advised (AIDeT) settings receive and reconcile recommendations from AI systems before making their final decisions. We identify distinct properties of these settings which are key to developing AIDeT models that effectively benefit team performance. First, DMs in AIDeT settings exhibit algorithm discretion behavior (ADB), i.e., an idiosyncratic tendency to imperfectly accept or reject algorithmic recommendations for any given decision task. Second, DMs incur contradiction costs from exerting decision-making resources (e.g., time and effort) when reconciling AI recommendations that contradict their own judgment. Third, the human'simperfect discretion and reconciliation costs introduce the need for the AI to offer advice selectively. We refer to the task of developing AI to advise humans in AIDeT settings as learning to advise} and we address this task by first introducing the AIDeT-Learning Framework. Additionally, we argue that leveraging the human partner's ADB is key to maximizing the AIDeT's decision accuracy while regularizing for contradiction costs. Finally, we instantiate our framework to develop TeamRules (TR): an algorithm that produces rule-based models and recommendations for AIDeT settings. TR is optimized to selectively advise a human and to trade-off contradiction costs and team accuracy for a given environment by leveraging the human partner's ADB. Evaluations on synthetic and real-world benchmark datasets with a variety of simulated human accuracy and discretion behaviors show that TR robustly improves the team's objective across settings over interpretable, rule-based alternatives.

相關內容

離散化

關注 0

Continuity · Learning · 強化學習 · Extensibility · 確定性策略 ·

2022 年 12 月 7 日

RLPG: Reinforcement Learning Approach for Dynamic Intra-Platoon Gap Adaptation for Highway On-Ramp Merging

Sushma Reddy Yadavalli,Lokesh Chandra Das,Myounggyu Won

A platoon refers to a group of vehicles traveling together in very close proximity. It has received significant attention from the autonomous vehicle research community due to its strong potential to significantly enhance fuel efficiency, driving safety, and driver comfort. Despite these advantages, recent research has revealed a detrimental effect of the extremely small intra-platoon gap on traffic flow for highway on-ramp merging. While existing control-based methods allow for adaptation of the intra-platoon gap to improve traffic flow, making an optimal control decision under the complex dynamics of traffic conditions remains a significant challenge due to the massive computational complexity. To this end, we present the design, implementation, and evaluation of a novel reinforcement learning framework that adaptively adjusts the intra-platoon gap of an individual platoon member to maximize traffic flow in response to dynamically changing, complex traffic conditions for highway on-ramp merging. The state space of the framework is carefully designed in consultation with the transportation literature to incorporate critical traffic parameters relevant to merging efficiency. A deep deterministic policy gradient algorithm is adopted to account for the continuous action space to ensure precise and continuous adjustment of the intra-platoon gap. An extensive simulation study demonstrates the effectiveness of the reinforcement learning-based approach for significantly improving traffic flow in various highway merging scenarios.

Learning · 獎勵函數 · INFORMS · 泛函 · 小樣本學習 ·

2022 年 12 月 6 日

Few-Shot Preference Learning for Human-in-the-Loop RL

Joey Hejna,Dorsa Sadigh

from arxiv, 6th Annual Conference on Robot Learning (CoRL) 2022

While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has proven to be extremely difficult due their inability to capture human intent and policy exploitation. Preference based RL algorithms seek to overcome these challenges by directly learning reward functions from human feedback. Unfortunately, prior work either requires an unreasonable number of queries implausible for any human to answer or overly restricts the class of reward functions to guarantee the elicitation of the most informative queries, resulting in models that are insufficiently expressive for realistic robotics tasks. Contrary to most works that focus on query selection to \emph{minimize} the amount of data required for learning reward functions, we take an opposite approach: \emph{expanding} the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning. Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them for new tasks using only a handful of queries. Empirically, we reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20$\times$, and demonstrate the effectiveness of our method on a real Franka Panda Robot. Moreover, this reduction in query-complexity allows us to train robot policies from actual human users. Videos of our results and code can be found at //sites.google.com/view/few-shot-preference-rl/home.

優化器 · 超參數 · 泛化理論 · 損失 · 近似 ·

2022 年 12 月 6 日

Two-Tailed Averaging: Anytime Adaptive Once-in-a-while Optimal Iterate Averaging for Stochastic Optimization

Gábor Melis

Tail averaging improves on Polyak averaging's non-asymptotic behaviour by excluding a number of leading iterates of stochastic optimization from its calculations. In practice, with a finite number of optimization steps and a learning rate that cannot be annealed to zero, tail averaging can get much closer to a local minimum point of the training loss than either the individual iterates or the Polyak average. However, the number of leading iterates to ignore is an important hyperparameter, and starting averaging too early or too late leads to inefficient use of resources or suboptimal solutions. Our work focusses on improving generalization, which makes setting this hyperparameter even more difficult, especially in the presence of other hyperparameters and overfitting. Furthermore, before averaging starts, the loss is only weakly informative of the final performance, which makes early stopping unreliable. To alleviate these problems, we propose an anytime variant of tail averaging intended for improving generalization not pure optimization, that has no hyperparameters and approximates the optimal tail at all optimization steps. Our algorithm is based on two running averages with adaptive lengths bounded in terms of the optimal tail length, one of which achieves approximate optimality with some regularity. Requiring only the additional storage for two sets of weights and periodic evaluation of the loss, the proposed two-tailed averaging algorithm is a practical and widely applicable method for improving generalization.

MoDELS · 過擬合 · 控制器 · Backbone · 多樣性 ·

2022 年 12 月 6 日

Pretrained Diffusion Models for Unified Human Motion Synthesis

Jianxin Ma,Shuai Bai,Chang Zhou

Generative modeling of human motion has broad applications in computer animation, virtual reality, and robotics. Conventional approaches develop separate models for different motion synthesis tasks, and typically use a model of a small size to avoid overfitting the scarce data available in each setting. It remains an open question whether developing a single unified model is feasible, which may 1) benefit the acquirement of novel skills by combining skills learned from multiple tasks, and 2) help in increasing the model capacity without overfitting by combining multiple data sources. Unification is challenging because 1) it involves diverse control signals as well as targets of varying granularity, and 2) motion datasets may use different skeletons and default poses. In this paper, we present MoFusion, a framework for unified motion synthesis. MoFusion employs a Transformer backbone to ease the inclusion of diverse control signals via cross attention, and pretrains the backbone as a diffusion model to support multi-granularity synthesis ranging from motion completion of a body part to whole-body motion generation. It uses a learnable adapter to accommodate the differences between the default skeletons used by the pretraining and the fine-tuning data. Empirical results show that pretraining is vital for scaling the model size without overfitting, and demonstrate MoFusion's potential in various tasks, e.g., text-to-motion, motion completion, and zero-shot mixing of multiple control signals. Project page: \url{//ofa-sys.github.io/MoFusion/}.

回合 · 泛化理論 · Sim2Real · INTERACT · Learning ·

2022 年 12 月 5 日

Learning Representations that Enable Generalization in Assistive Tasks

Jerry Zhi-Yang He,Aditi Raghunathan,Daniel S. Brown,Zackory Erickson,Anca D. Dragan

Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse ''population'' of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is more difficult to capture in a population, thus increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters and work well in tasks that robots do in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation based on the history of interaction directly, because that is what the robot will have access to at test time. Further, training these representations to then predict human actions not only gives them better structure, but also enables them to be fine-tuned at test-time, when the robot observes the partner act. //adaptive-caregiver.github.io.

可約的 · Agent · 回合 · 控制器 · Engineering ·

2022 年 12 月 5 日

Cooperative control of environmental extremes by artificial intelligent agents

Martí Sánchez-Fibla,Clément Moulin-Frier,Ricard Solé

from arxiv, 9 pages, 4 figures

Humans have been able to tackle biosphere complexities by acting as ecosystem engineers, profoundly changing the flows of matter, energy and information. This includes major innovations that allowed to reduce and control the impact of extreme events. Modelling the evolution of such adaptive dynamics can be challenging given the potentially large number of individual and environmental variables involved. This paper shows how to address this problem by using fire as the source of external, bursting and wide fluctuations. Fire propagates on a spatial landscape where a group of agents harvest and exploit trees while avoiding the damaging effects of fire spreading. The agents need to solve a conflict to reach a group-level optimal state: while tree harvesting reduces the propagation of fires, it also reduces the availability of resources provided by trees. It is shown that the system displays two major evolutionary innovations that end up in an ecological engineering strategy that favours high biomass along with the suppression of large fires. The implications for potential A.I. management of complex ecosystems are discussed.

數據集 · INTERACT · Machine Learning · 3D · Learning ·

2022 年 12 月 4 日

Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

Davide Berghi,Marco Volino,Philip J. B. Jackson

3D audio-visual production aims to deliver immersive and interactive experiences to the consumer. Yet, faithfully reproducing real-world 3D scenes remains a challenging task. This is partly due to the lack of available datasets enabling audio-visual research in this direction. In most of the existing multi-view datasets, the accompanying audio is neglected. Similarly, datasets for spatial audio research primarily offer unimodal content, and when visual data is included, the quality is far from meeting the standard production needs. We present "Tragic Talkers", an audio-visual dataset consisting of excerpts from the "Romeo and Juliet" drama captured with microphone arrays and multiple co-located cameras for light-field video. Tragic Talkers provides ideal content for object-based media (OBM) production. It is designed to cover various conventional talking scenarios, such as monologues, two-people conversations, and interactions with considerable movement and occlusion, yielding 30 sequences captured from a total of 22 different points of view and two 16-element microphone arrays. Additionally, we provide voice activity labels, 2D face bounding boxes for each camera view, 2D pose detection keypoints, 3D tracking data of the mouth of the actors, and dialogue transcriptions. We believe the community will benefit from this dataset as it can assist multidisciplinary research. Possible uses of the dataset are discussed.

估計/估計量 · CASE · 縮放 · Extensibility · 可約的 ·

2022 年 12 月 3 日

Multi-level Parareal algorithm with Averaging

Juliane Rosemeier,Terry Haut,Beth Wingate

from arxiv, 32 pages

The present study is an extension of the work done in [16] and [10], where a two-level Parareal method with averaging was examined. The method proposed in this paper is a multi-level Parareal method with arbitrarily many levels, which is not restricted to the two-level case. We give an asymptotic error estimate which reduces to the two-level estimate for the case when only two levels are considered. Introducing more than two levels has important consequences for the averaging procedure, as we choose separate averaging windows for each of the different levels, which is an additional new feature of the present study. The different averaging windows make the proposed method especially appropriate for multi-scale problems, because we can introduce a level for each intrinsic scale of the problem and adapt the averaging procedure such that we reproduce the behavior of the model on the particular scale resolved by the level.

Learning · 可辨認的 · 樣例 · 知識 (knowledge) · 情景 ·

2022 年 12 月 3 日

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

Giulia Vezzani,Dhruva Tirumala,Markus Wulfmeier,Dushyant Rao,Abbas Abdolmaleki,Ben Moran,Tuomas Haarnoja,Jan Humplik,Roland Hafner,Michael Neunert,Claudio Fantacci,Tim Hertweck,Thomas Lampe,Fereshteh Sadeghi,Nicolas Heess,Martin Riedmiller

The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations.For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert behavior can lead to poor results when given sub-optimal experts. We compare several common approaches for skill transfer on multiple domains including changes in task and system dynamics. We identify how existing methods can fail and introduce an alternative approach to mitigate these problems. Our approach learns to sequence existing temporally-extended skills for exploration but learns the final policy directly from the raw experience. This conceptual split enables rapid adaptation and thus efficient data collection but without constraining the final solution.It significantly outperforms many classical methods across a suite of evaluation tasks and we use a broad set of ablations to highlight the importance of differentc omponents of our method.

MoDELS · Machine Learning · 學成 · entity · 回合 ·

2021 年 1 月 6 日

Adaptive Synthetic Characters for Military Training

Volkan Ustun,Rajay Kumar,Adam Reilly,Seyed Sajjadi,Andrew Miller

Behaviors of the synthetic characters in current military simulations are limited since they are generally generated by rule-based and reactive computational models with minimal intelligence. Such computational models cannot adapt to reflect the experience of the characters, resulting in brittle intelligence for even the most effective behavior models devised via costly and labor-intensive processes. Observation-based behavior model adaptation that leverages machine learning and the experience of synthetic entities in combination with appropriate prior knowledge can address the issues in the existing computational behavior models to create a better training experience in military training simulations. In this paper, we introduce a framework that aims to create autonomous synthetic characters that can perform coherent sequences of believable behavior while being aware of human trainees and their needs within a training simulation. This framework brings together three mutually complementary components. The first component is a Unity-based simulation environment - Rapid Integration and Development Environment (RIDE) - supporting One World Terrain (OWT) models and capable of running and supporting machine learning experiments. The second is Shiva, a novel multi-agent reinforcement and imitation learning framework that can interface with a variety of simulation environments, and that can additionally utilize a variety of learning algorithms. The final component is the Sigma Cognitive Architecture that will augment the behavior models with symbolic and probabilistic reasoning capabilities. We have successfully created proof-of-concept behavior models leveraging this framework on realistic terrain as an essential step towards bringing machine learning into military simulations.