
Multimodal demonstrations provide robots with an abundance of information to make sense of the world. However, such abundance may not always lead to good performance when it comes to learning sensorimotor control policies from human demonstrations. Extraneous data modalities can lead to state over-specification, where the state contains modalities that are not only useless for decision-making but also can change data distribution across environments. State over-specification leads to issues such as the learned policy not generalizing outside of the training data distribution. In this work, we propose Masked Imitation Learning (MIL) to address state over-specification by selectively using informative modalities. Specifically, we design a masked policy network with a binary mask to block certain modalities. We develop a bi-level optimization algorithm that learns this mask to accurately filter over-specified modalities. We demonstrate empirically that MIL outperforms baseline algorithms in simulated domains including MuJoCo and a robot arm environment using the Robomimic dataset, and effectively recovers the environment-invariant modalities on a multimodal dataset collected on a real robot. Our project website presents supplemental details and videos of our results at: //tinyurl.com/masked-il
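The idea of a binary mask that blocks modalities before they reach the policy can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the modality names, and the linear policy are all hypothetical, and the bi-level optimization that learns the mask is not shown (the mask is fixed by hand here).

```python
from typing import Dict, List, Sequence


def apply_modality_mask(obs: Dict[str, Sequence[float]],
                        mask: Dict[str, int]) -> List[float]:
    """Concatenate modalities in sorted-name order, zeroing blocked ones.

    A mask value of 1 keeps a modality; 0 blocks it. Modalities that are
    missing from the mask are treated as blocked (an assumption of this sketch).
    """
    features: List[float] = []
    for name in sorted(obs):
        keep = mask.get(name, 0)
        features.extend(float(x) * keep for x in obs[name])
    return features


def linear_policy(features: Sequence[float],
                  weights: Sequence[float],
                  bias: float = 0.0) -> float:
    """Hypothetical stand-in for a policy network: a single linear layer."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical observation with one useful and one extraneous modality.
obs = {"joint_angles": [0.1, -0.2], "background_color": [0.9]}
mask = {"joint_angles": 1, "background_color": 0}
weights = [1.0, 2.0, 5.0]  # order follows sorted modality names

features = apply_modality_mask(obs, mask)
action = linear_policy(features, weights)
```

Because the blocked modality contributes only zeros, the action is unchanged when that modality shifts between environments, which is the invariance the abstract describes.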

Related content

Learning

Learning · Latent · MoDELS · HTTPS · Separated

October 26, 2022

Leveraging Demonstrations with Latent Space Priors

Jonas Gehring,Deepak Gopinath,Jungdam Won,Andreas Krause,Gabriel Synnaeve,Nicolas Usunier

Demonstrations provide insight into relevant state or action space regions, bearing great potential to boost the efficiency and practicality of reinforcement learning agents. In this work, we propose to leverage demonstration datasets by combining skill learning and sequence modeling. Starting with a learned joint latent space, we separately train a generative model of demonstration sequences and an accompanying low-level policy. The sequence model forms a latent space prior over plausible demonstration behaviors to accelerate learning of high-level policies. We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning on transfer tasks. Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments with a complex, simulated humanoid. Videos, source code and pre-trained models are available at the corresponding project website at //facebookresearch.github.io/latent-space-priors .

Learning · Robustness · Optimizer · Continuity · Diversity

October 25, 2022

On Robust Incremental Learning over Many Multilingual Steps

Karan Praharaj,Irina Matveeva

from arxiv, Accepted for publication at the IncrLearn Workshop at the 22nd IEEE International Conference on Data Mining

Recent work in incremental learning has introduced diverse approaches to tackle catastrophic forgetting, from data augmentation to optimized training regimes. However, most of them focus on very few training steps. We propose a method for robust incremental learning over dozens of fine-tuning steps using data from a variety of languages. We show that a combination of data augmentation and an optimized training regime allows us to continue improving the model even for as many as fifty training steps. Crucially, our augmentation strategy does not require retaining access to previous training data and is suitable in scenarios with privacy constraints.

Learning · MOS · fMRI · Ground truth · Dataset

October 25, 2022

Learning Causal Discovery

Xinyue Wang,Konrad Paul Kording

from arxiv, 15 main pages, 9 figures. Will be submitted to TMLR

Causal discovery (CD) from time-varying data is important in neuroscience, medicine, and machine learning. Techniques for CD include randomized experiments, which are generally unbiased but expensive. They also include algorithms such as regression, matching, and Granger causality, which are only correct under strong assumptions made by human designers. However, as we found in other areas of machine learning, humans are usually not quite right and human expertise is usually outperformed by data-driven approaches. Here we test if we can improve causal discovery in a data-driven way. We take a perturbable system with a large number of causal components (transistors), the MOS 6502 processor, acquire the causal ground truth, and learn the causal discovery procedure represented as a neural network. We find that this procedure far outperforms human-designed causal discovery procedures, such as Mutual Information, LiNGAM, and Granger Causality, on both the MOS 6502 processor and the NetSim dataset, which simulates functional magnetic resonance imaging (fMRI) results. We argue that the causality field should consider, where possible, a supervised approach, where CD procedures are learned from large datasets with known causal relations instead of being designed by a human specialist. Our findings promise a new approach toward improving CD in neural and medical data and for the broader machine learning community.

Learning · Legged Robot · Robot · Microsoft Surface · tuning

October 22, 2022

Sim-to-Real Learning of Compliant Bipedal Locomotion on Torque Sensor-Less Gear-Driven Humanoid

Shimpei Masuda,Kuniyuki Takahashi

from arxiv, An accompanying video is available at the following link: //drive.google.com/file/d/1F6RnzICWW_gGCNfEBW07nrFN4ZhPqjPI/view?usp=sharing

Sim-to-real is a mainstream method to cope with the large number of trials needed by typical deep reinforcement learning. However, transferring a policy trained in simulation to actual hardware remains challenging due to the reality gap. In particular, the characteristics of actuators in legged robots have a considerable influence on sim-to-real transfer. High reduction ratio gears are widely used in actuators, and the reality gap issue becomes especially pronounced when even the utilization of backdrivability is considered to control joints compliantly. We propose a new simulation model of gears to address this gap. Additionally, the difficulty in achieving stable bipedal locomotion causes typical methods to fail to tune physical parameters in simulation with the behavior of transferred policy. Thus, we propose a method for system identification that can utilize failed attempts. The method's effectiveness is verified using a biped robot, the ROBOTIS-OP3, and the sim-to-real transferred policy can stabilize the robot under severe disturbances and walk on uneven surfaces without force and torque sensors.

Latent variable · Knowledge · Task-oriented dialogue system · Latent · Unlabeled

October 21, 2022

Discovering New Intents Using Latent Variables

Yunhua Zhou,Peiju Liu,Yuxin Wang,Xipeng Qiu

Discovering new intents is of great significance to establishing a bootstrapped task-oriented dialogue system. Most existing methods either lack the ability to transfer prior knowledge from the known intent data or fall into the dilemma of forgetting prior knowledge in the follow-up. More importantly, these methods do not deeply explore the intrinsic structure of unlabeled data, so they cannot seek out the characteristics that make an intent in general. In this paper, starting from the intuition that discovering intents could be beneficial to the identification of the known intents, we propose a probabilistic framework for discovering intents where intent assignments are treated as latent variables. We adopt the Expectation-Maximization framework for optimization. Specifically, in the E-step, we discover intents and explore the intrinsic structure of unlabeled data through the posterior of intent assignments. In the M-step, we alleviate the forgetting of prior knowledge transferred from known intents by optimizing the discrimination of labeled data. Extensive experiments conducted on three challenging real-world datasets demonstrate that our method can achieve substantial improvements.

Controller · Learning · Automator · Constraint · Robot

October 21, 2022

Differentiable Constrained Imitation Learning for Robot Motion Planning and Control

Christopher Diehl,Janis Adamek,Martin Krüger,Frank Hoffmann,Torsten Bertram

from arxiv, Under review

Motion planning and control are crucial components of robotics applications. Here, spatio-temporal hard constraints like system dynamics and safety boundaries (e.g., obstacles in automated driving) restrict the robot's motions. Direct methods from optimal control solve a constrained optimization problem. However, in many applications finding a proper cost function is inherently difficult because of the weighting of partially conflicting objectives. On the other hand, Imitation Learning (IL) methods such as Behavior Cloning (BC) provide an intuitive framework for learning decision-making from offline demonstrations and constitute a promising avenue for planning and control in complex robot applications. Prior work primarily relied on soft-constraint approaches, which use additional auxiliary loss terms describing the constraints. However, catastrophic safety-critical failures might occur in out-of-distribution (OOD) scenarios. This work integrates the flexibility of IL with hard constraint handling in optimal control. Our approach constitutes a general framework for constrained robotic motion planning and control using offline IL. Hard constraints are integrated into the learning problem in a differentiable manner, via explicit completion and gradient-based correction. Simulated experiments of mobile robot navigation and automated driving provide evidence for the performance of the proposed method.

Monte Carlo · Learning · State space · Sparse · Replay buffer

October 20, 2022

Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations

Albert Wilcox,Ashwin Balakrishna,Jules Dedieu,Wyame Benslimane,Daniel S. Brown,Ken Goldberg

from arxiv, To be published in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). 19 pages. 11 figures

Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is using demonstrations to provide an initial signal about regions of the state space with high rewards. However, prior RL-from-demonstrations algorithms introduce significant complexity and many hyperparameters, making them hard to implement and tune. We introduce Monte Carlo Augmented Actor Critic (MCAC), a parameter-free modification to standard actor-critic algorithms which initializes the replay buffer with demonstrations and computes a modified $Q$-value by taking the maximum of the standard temporal difference (TD) target and a Monte Carlo estimate of the reward-to-go. This encourages exploration in the neighborhood of high-performing trajectories by encouraging high $Q$-values in corresponding regions of the state space. Experiments across $5$ continuous control domains suggest that MCAC can be used to significantly increase learning efficiency across $6$ commonly used RL and RL-from-demonstrations algorithms. See //sites.google.com/view/mcac-rl for code and supplementary material.
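The modified target described above, the maximum of the TD target and a Monte Carlo reward-to-go estimate, can be sketched for a single trajectory as follows. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the reward-to-go is assumed to be discounted with the same factor as the TD target, and the next-state Q-values are supplied as plain numbers rather than produced by a critic network.

```python
from typing import List, Sequence


def reward_to_go(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted Monte Carlo return from each time step to the episode end."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each step reuses the return of the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mcac_targets(rewards: Sequence[float],
                 next_q_values: Sequence[float],
                 gamma: float) -> List[float]:
    """Per-step target: max(TD target, Monte Carlo reward-to-go).

    next_q_values[t] stands in for the critic's estimate at the next
    state-action pair; it should be 0 after a terminal transition.
    """
    mc_returns = reward_to_go(rewards, gamma)
    targets = []
    for r, q_next, g in zip(rewards, next_q_values, mc_returns):
        td_target = r + gamma * q_next
        targets.append(max(td_target, g))
    return targets


# Sparse-reward trajectory with an uninformed critic (all Q-values zero).
rewards = [0.0, 0.0, 1.0]
next_q = [0.0, 0.0, 0.0]
targets = mcac_targets(rewards, next_q, gamma=0.5)
```

With an uninformed critic the TD targets for the first two steps are zero, so the Monte Carlo term is what carries the sparse reward back along the trajectory.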

Estimation/Estimator · Dataset · Comprehensibility · Robot · Diversity

October 20, 2022

JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking

Edward Vendrow,Duy Tho Le,Hamid Rezatofighi

from arxiv, 8 pages, 7 figures

Autonomous robotic systems operating in human environments must understand their surroundings to make accurate and safe decisions. In crowded human scenes with close-up human-robot interaction and robot navigation, a deep understanding requires reasoning about human motion and body dynamics over time with human body pose estimation and tracking. However, existing datasets either do not provide pose annotations or include scene types unrelated to robotic applications. Many datasets also lack the diversity of poses and occlusions found in crowded human scenes. To address this limitation, we introduce JRDB-Pose, a large-scale dataset and benchmark for multi-person pose estimation and tracking using videos captured from a social navigation robot. The dataset contains challenging scenes with crowded indoor and outdoor locations and a diverse range of scales and occlusion types. JRDB-Pose provides human pose annotations with per-keypoint occlusion labels and track IDs consistent across the scene. A public evaluation server is made available for fair evaluation on a held-out test set. JRDB-Pose is available at //jrdb.erc.monash.edu/ .

Transformation · Learned · Performer · MoDELS · Vision

March 24, 2022

Transformers Meet Visual Learning Understanding: A Comprehensive Review

Yuting Yang,Licheng Jiao,Xu Liu,Fang Liu,Shuyuan Yang,Zhixi Feng,Xu Tang

from arxiv, arXiv admin note: text overlap with arXiv:2010.11929, arXiv:1706.03762 by other authors

Its dynamic attention mechanism and global modeling ability give the Transformer strong feature learning ability. In recent years, Transformers have become comparable to CNN-based methods in computer vision. This review surveys current research progress on Transformers in image and video applications, giving a comprehensive overview of Transformers in visual learning understanding. First, the attention mechanism, which plays an essential part in the Transformer, is reviewed. Second, the visual Transformer model and the principle of each module are introduced. Third, existing Transformer-based models are investigated, and their performance is compared in visual learning understanding applications. Three image tasks and two video tasks in computer vision are investigated: the former comprise image classification, object detection, and image segmentation; the latter comprise object tracking and video classification. Comparing the performance of different models across these tasks on several public benchmark data sets is a significant part of the review. Finally, ten general problems are summarized, and the development prospects of the visual Transformer are given.

Event extraction · Learned · Inverse reinforcement learning · GAN · Estimation/Estimator

April 21, 2018

Event Extraction with Generative Adversarial Imitation Learning

Tongtao Zhang,Heng Ji

We propose a new method for the event extraction (EE) task based on an imitation learning framework, specifically, inverse reinforcement learning (IRL) via a generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions committed by the expert (or ground truth) and the agent among complicated states in the environment. The EE task benefits from these dynamic rewards because instances and labels vary in difficulty and the gains are expected to be diverse -- e.g., an ambiguous but correctly detected trigger or argument should receive high gains -- while traditional RL models usually neglect such differences and pay equal attention to all instances. Moreover, our experiments also demonstrate that the proposed framework outperforms state-of-the-art methods, without explicit feature engineering.