
There is a growing demand for redirected walking (RDW) techniques and their applications. To apply appropriate RDW methods and manipulations, RDW controllers are predominantly used. There are three types of RDW controllers: the scripted controller, the generalized controller, and the predictive controller. The scripted controller pre-scripts the mapping between the real and virtual environments. The generalized controller selects the RDW method and the manipulation quantities according to a fixed procedure based on the user's position in the real space; this approach can be reused in any environment, but it is not fully optimized. The predictive controller predicts the user's future path from the user's behavior and manages RDW techniques accordingly; this approach is expected to be both effective and versatile, but it has not been sufficiently developed. This paper proposes a novel RDW controller that uses reinforcement learning (RL) to improve planning capability and versatility. Our simulation experiments indicate that, in real environments with many obstacles, the proposed method reduces the number of reset manipulations, one of the key indicators of RDW controller effectiveness, compared to the generalized controller. The experiments also showed that the gain output by the proposed method oscillates. A user study showed that the proposed RDW controller reduces the number of resets compared to the conventional generalized controller, and that no adverse effects, such as cybersickness associated with the oscillating output gain, were observed. Together, the simulation and user studies demonstrate that the proposed RL-based RDW controller outperforms existing generalized controllers and can be applied to real users.
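To make the reset metric concrete, the following sketch simulates a conventional generalized controller (steer-to-center with a bounded curvature gain) in a square room and counts wall-triggered resets; an RL controller would replace steer_to_center with a learned policy. All names, dimensions, and thresholds here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch (hypothetical parameters): a generalized steer-to-center
# RDW controller in a square room. A curvature gain bends the user's real
# path toward the room center; a reset fires whenever the user hits a wall.

ROOM = 5.0                 # half-width of the square room (m)
MAX_CURVATURE = 1.0 / 7.5  # a commonly cited detection threshold (1/m)
STEP = 0.5                 # simulated stride length (m)

def steer_to_center(pos, heading):
    """Signed curvature that turns the user toward the origin."""
    to_center = np.arctan2(-pos[1], -pos[0])
    err = (to_center - heading + np.pi) % (2 * np.pi) - np.pi
    return np.clip(err / STEP, -MAX_CURVATURE, MAX_CURVATURE)

def simulate(steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    pos, heading, resets = np.zeros(2), 0.0, 0
    for _ in range(steps):
        heading += STEP * steer_to_center(pos, heading)  # injected rotation
        heading += rng.normal(0.0, 0.05)                 # wandering virtual path
        pos += STEP * np.array([np.cos(heading), np.sin(heading)])
        if np.max(np.abs(pos)) >= ROOM:                  # wall hit: reset
            resets += 1
            heading += np.pi                             # turn-in-place reset
            pos = np.clip(pos, -ROOM + 0.1, ROOM - 0.1)
    return resets

print("resets per 1000 steps:", simulate())
```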

Related Content

A reinforcement learning (RL) control policy trained in a nominal environment can fail in a new or perturbed environment due to dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach that robustifies a pre-trained RL policy by augmenting it with an $\mathcal{L}_{1}$ adaptive controller ($\mathcal{L}_{1}$AC). Leveraging the capability of an $\mathcal{L}_{1}$AC for fast estimation and active compensation of dynamic variations, the proposed approach improves the robustness of an RL policy that is trained either in a simulator or in the real world without accounting for a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and model-based methods. A video of the experiments on a real Pendubot setup is available at //youtu.be/xgOB9vpyUgE.
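The add-on structure can be sketched on a scalar toy system: the applied input is the RL action plus an $\mathcal{L}_{1}$-style compensation term built from a fast state predictor and a low-pass filter. The dynamics, gains, and the stand-in rl_policy below are all hypothetical; the paper's actual $\mathcal{L}_{1}$AC design is more involved.

```python
import numpy as np

# Minimal sketch (hypothetical dynamics): robustifying a fixed policy on
# x' = a*x + u + d. A predictor estimates the lumped disturbance d, and the
# control cancels a low-pass-filtered version of that estimate.

dt, a = 0.01, -1.0       # nominal plant x' = a*x + u + d
adapt_gain = 100.0       # fast-estimation gain of the predictor
cutoff = 5.0             # bandwidth (rad/s) of the L1 low-pass filter

def rl_policy(x):
    return -2.0 * x      # stand-in for a pre-trained RL policy

x, x_hat, d_filt = 1.0, 1.0, 0.0
for k in range(2000):
    d_true = 0.5 * np.sin(0.02 * k)            # unmodeled dynamic variation
    d_hat = adapt_gain * (x - x_hat)           # disturbance estimate
    d_filt += dt * cutoff * (d_hat - d_filt)   # first-order low-pass
    u = rl_policy(x) - d_filt                  # RL action + L1 compensation
    x_hat += dt * (a * x_hat + u + d_hat)      # state predictor
    x += dt * (a * x + u + d_true)             # true plant
print("final |x|:", abs(x))
```

The filter reflects the key design choice that estimation is fast while compensation is band-limited, so the add-on does not fight the RL policy within its nominal bandwidth.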

The study of generalisation in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey is an overview of this nascent field. We provide a unifying formalism and terminology for discussing different generalisation problems, building upon previous works. We go on to categorise existing benchmarks for generalisation, as well as current methods for tackling the generalisation problem. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, we suggest fast online adaptation and tackling RL-specific problems as some areas for future work on methods for generalisation, and we recommend building benchmarks in underexplored problem settings such as offline RL generalisation and reward-function variation.
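The central quantity this literature studies can be illustrated by a toy computation of the train/test generalisation gap; the contexts, the least-squares "policy", and the reward below are hypothetical stand-ins, not any benchmark from the survey.

```python
import numpy as np

# Minimal sketch (toy setup): the generalisation gap is the difference
# between a policy's mean return on its training contexts and on held-out
# contexts drawn from the same distribution.

rng = np.random.default_rng(0)
train_ctx = rng.normal(size=(20, 4))    # 20 training environment contexts
test_ctx = rng.normal(size=(100, 4))    # held-out deployment contexts

# a "policy" fit only to the training contexts (target response = 1)
w = np.linalg.lstsq(train_ctx, np.ones(20), rcond=None)[0]

def mean_return(ctxs, w):
    # reward = 1 minus squared error of the policy's response to the target
    return float(np.mean(1.0 - (ctxs @ w - 1.0) ** 2))

gap = mean_return(train_ctx, w) - mean_return(test_ctx, w)
print(f"generalisation gap: {gap:.3f}")
```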

Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, such as game AI, autonomous vehicles, robotics, and finance. However, DRL and deep MARL agents are widely known to be sample-inefficient, and millions of interactions are usually needed even for relatively simple game settings, preventing their wide application in real-world industrial scenarios. One bottleneck challenge behind this is the well-known exploration problem, i.e., how to efficiently explore unknown environments and collect the informative experiences that benefit policy learning the most. In this paper, we conduct a comprehensive survey of existing exploration methods in DRL and deep MARL with the goal of providing understanding and insight into the critical problems and solutions. We first identify several key challenges to achieving efficient exploration, which most exploration methods aim to address. We then provide a systematic survey of existing approaches by classifying them into two major categories: uncertainty-oriented exploration and intrinsic-motivation-oriented exploration. The essence of uncertainty-oriented exploration is to leverage the quantification of epistemic and aleatoric uncertainty to derive efficient exploration. By contrast, intrinsic-motivation-oriented exploration methods usually incorporate different kinds of reward-agnostic information as intrinsic exploration guidance. Beyond these two main branches, we also summarize other exploration methods that adopt sophisticated techniques but are difficult to classify into the above two categories. In addition, we provide a comprehensive empirical comparison of exploration methods for DRL on a set of commonly used benchmarks. Finally, we summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
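As a concrete instance of the intrinsic-motivation-oriented category, the sketch below adds a count-based novelty bonus to a tabular Q-learning update; rarely visited states earn a larger bonus. The tabular setting and all constants are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

# Minimal sketch (hypothetical tabular task): a count-based intrinsic bonus
# beta / sqrt(N(s')) added to the extrinsic reward before the Q update.

beta, alpha, gamma = 0.5, 0.1, 0.99
counts = defaultdict(int)
Q = defaultdict(lambda: np.zeros(4))   # 4 discrete actions per state

def intrinsic_bonus(state):
    counts[state] += 1
    return beta / np.sqrt(counts[state])

def q_update(s, a, r_ext, s_next):
    r = r_ext + intrinsic_bonus(s_next)              # shaped reward
    td = r + gamma * Q[s_next].max() - Q[s][a]
    Q[s][a] += alpha * td

q_update((0, 0), 1, 0.0, (0, 1))   # one example transition
print(Q[(0, 0)])
```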

Recently, neural networks have been widely used in e-commerce recommender systems, owing to the rapid development of deep learning. We formalize the recommender system as a sequential recommendation problem, aiming to predict the next items that the user might interact with. Recent works usually derive a single overall embedding from a user's behavior sequence; however, such a unified user embedding cannot reflect the user's multiple interests over a period of time. In this paper, we propose a novel controllable multi-interest framework for sequential recommendation, called ComiRec. Our multi-interest module captures multiple interests from user behavior sequences, which can be exploited to retrieve candidate items from the large-scale item pool. These items are then fed into an aggregation module to obtain the overall recommendation. The aggregation module leverages a controllable factor to balance recommendation accuracy and diversity. We conduct sequential-recommendation experiments on two real-world datasets, Amazon and Taobao. Experimental results demonstrate that our framework achieves significant improvements over state-of-the-art models. Our framework has also been successfully deployed on the offline Alibaba distributed cloud platform.
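The controllable trade-off can be sketched as a greedy re-ranking that mixes an item's relevance with a diversity term weighted by a controllable factor, in the spirit of the aggregation module; the scores, categories, and exact diversity measure below are hypothetical.

```python
import numpy as np

# Minimal sketch (hypothetical data): greedy controllable aggregation over
# candidates retrieved by the multi-interest module. factor=0 reduces to a
# pure accuracy ranking; larger factors favor category diversity.

rng = np.random.default_rng(0)
scores = rng.random(50)               # relevance of 50 candidate items
cats = rng.integers(0, 5, size=50)    # item categories, used for diversity

def aggregate(scores, cats, k=10, factor=0.3):
    chosen = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(len(scores)):
            if i in chosen:
                continue
            # diversity: fraction of chosen items in a different category
            div = np.mean([cats[i] != cats[j] for j in chosen]) if chosen else 0.0
            val = scores[i] + factor * div
            if val > best_val:
                best, best_val = i, val
        chosen.append(best)
    return chosen

print(aggregate(scores, cats))
```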

We survey research on self-driving cars published in the literature, focusing on autonomous cars developed since the DARPA challenges that are equipped with an autonomy system categorizable as SAE level 3 or higher. The architecture of the autonomy system of self-driving cars is typically organized into a perception system and a decision-making system. The perception system is generally divided into many subsystems responsible for tasks such as self-driving-car localization, static obstacle mapping, moving obstacle detection and tracking, road mapping, and traffic signalization detection and recognition, among others. The decision-making system is likewise partitioned into many subsystems responsible for tasks such as route planning, path planning, behavior selection, motion planning, and control. In this survey, we present the typical architecture of the autonomy system of self-driving cars. We also review research on relevant methods for perception and decision making. Furthermore, we present a detailed description of the architecture of the autonomy system of UFES's car, IARA. Finally, we list prominent autonomous research cars developed by technology companies and reported in the media.
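A skeletal rendering of that two-part stack might look as follows; every class and field is a stub chosen for illustration, not a description of any surveyed system.

```python
from dataclasses import dataclass

# Minimal sketch (hypothetical names): the perception system produces a
# world model that the decision-making system consumes.

@dataclass
class WorldModel:
    pose: tuple        # localization output
    obstacles: list    # static map plus tracked moving obstacles
    signals: list      # detected traffic signalization

class PerceptionSystem:
    def perceive(self, sensors) -> WorldModel:
        # stand-ins for localization, mapping, detection, recognition, ...
        return WorldModel(pose=(0.0, 0.0, 0.0), obstacles=[], signals=[])

class DecisionMakingSystem:
    def plan(self, world: WorldModel):
        route = ["A", "B"]        # route planning
        path = [world.pose]       # path / motion planning
        behavior = "lane_keep"    # behavior selection
        return {"route": route, "path": path, "behavior": behavior}

world = PerceptionSystem().perceive(sensors=None)
print(DecisionMakingSystem().plan(world))
```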

Autonomous urban driving navigation with complex multi-agent dynamics is under-explored due to the difficulty of learning an optimal driving policy. The traditional modular pipeline relies heavily on hand-designed rules and a pre-processing perception system, while supervised learning-based models are limited by the accessibility of extensive human experience. We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach with which the driving agent achieves higher success rates from vision inputs alone in a high-fidelity car simulator. To alleviate the low exploration efficiency in large continuous action spaces, which often prohibits the use of classical RL on challenging real tasks, CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon Deep Deterministic Policy Gradient (DDPG). Moreover, we propose specialized adaptive policies and steering-angle reward designs for different control signals (i.e., follow, straight, turn right, turn left) based on shared representations to improve the model's capability in tackling diverse cases. Extensive experiments on the CARLA driving benchmark demonstrate that CIRL substantially outperforms all previous methods in terms of the percentage of successfully completed episodes on a variety of goal-directed driving tasks. We also show its superior generalization capability in unseen environments. To our knowledge, this is the first successful case of a driving policy learned through reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.
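The command-conditioned specialization can be sketched as a shared encoder feeding one policy head per control signal, in the style of a DDPG actor; the layer sizes, feature dimension, and module names below are assumptions, not CIRL's actual network.

```python
import torch
import torch.nn as nn

# Minimal sketch (hypothetical architecture): one actor head per high-level
# command, all sharing an encoded representation of the visual input.

COMMANDS = ["follow", "straight", "turn_right", "turn_left"]

class BranchedActor(nn.Module):
    def __init__(self, feat_dim=128, act_dim=2):   # steer, throttle
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(512, feat_dim), nn.ReLU())
        self.heads = nn.ModuleDict({
            c: nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                             nn.Linear(64, act_dim), nn.Tanh())
            for c in COMMANDS
        })

    def forward(self, obs, command):
        return self.heads[command](self.encoder(obs))

actor = BranchedActor()
obs = torch.randn(1, 512)          # stand-in for extracted image features
print(actor(obs, "turn_left"))     # continuous action in [-1, 1]^2
```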

This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose the use of linear and non-linear methods to develop a MODRL framework that includes both single-policy and multi-policy strategies. The experimental results on two benchmark problems, the two-objective deep sea treasure environment and the three-objective mountain car problem, indicate that the proposed framework converges effectively to the optimal Pareto solutions. The proposed framework is generic and allows different deep reinforcement learning algorithms to be implemented in different complex environments. It therefore overcomes many of the difficulties of standard multi-objective reinforcement learning (MORL) methods in the current literature. The framework creates a testbed platform for developing methods to solve various problems associated with current MORL. Details of the framework implementation are available at //www.deakin.edu.au/~thanhthi/drl.htm.
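At action-selection time, the linear single-policy strategy reduces to scalarizing a per-objective Q-vector with a preference weight vector, roughly as sketched below; the Q-values and weights shown are made up for illustration.

```python
import numpy as np

# Minimal sketch (hypothetical values): the network outputs one Q-vector
# per action with one entry per objective; a weight vector scalarizes it.

weights = np.array([0.7, 0.3])   # preference over two objectives

def select_action(q_vectors, weights):
    """q_vectors: (n_actions, n_objectives) -> index of the best action."""
    return int(np.argmax(q_vectors @ weights))

# e.g. deep sea treasure: objective 0 = treasure value, 1 = -time penalty
q = np.array([[1.0, -1.0],
              [8.2, -3.0],
              [11.5, -5.0]])
print(select_action(q, weights))
```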

Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. As the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or neighboring agents. The interplay between the two entities is mutually reinforcing: the learning of an individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to a Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games justify the learning effectiveness of our mean field approaches. In addition, we report the first result of solving the Ising model via model-free reinforcement learning methods.
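The central update can be sketched in tabular form: each agent's Q-function conditions on its own action and on a (here discretized) mean action of its neighbors, so the joint interaction collapses to a pairwise one. The state/action sizes and the Boltzmann temperature are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (hypothetical setting): a mean field Q-learning update
# with the neighbors' mean action discretized into a bin index.

n_actions, alpha, gamma, temp = 4, 0.1, 0.95, 1.0
Q = np.zeros((10, n_actions, n_actions))  # Q[state, own action, mean-action bin]

def boltzmann(q_row, temp):
    p = np.exp(q_row / temp)
    return p / p.sum()

def mf_update(s, a, a_bar, r, s_next, a_bar_next):
    # next-state value under the Boltzmann policy, given neighbors' mean action
    pi = boltzmann(Q[s_next, :, a_bar_next], temp)
    v_next = pi @ Q[s_next, :, a_bar_next]
    Q[s, a, a_bar] += alpha * (r + gamma * v_next - Q[s, a, a_bar])

mf_update(s=0, a=1, a_bar=2, r=1.0, s_next=3, a_bar_next=2)
print(Q[0, 1, 2])
```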

Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, so recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits directly migrating models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation relating these two modules. The perception module translates the perceived RGB image into a semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent that performs actions based on the translated image segmentation. Our architecture is evaluated on an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and demonstrates a faster learning curve than the baselines. We also present a detailed analysis of a variety of variant configurations and validate the transferability of our modular architecture.
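A minimal sketch of that modular split follows, with stub functions standing in for the segmentation network and the DRL policy; the shapes and decision logic are hypothetical.

```python
import numpy as np

# Minimal sketch (stub modules): semantic segmentation is the meta
# representation handed from the perception module to the control policy,
# so the same policy can consume simulated or real camera frames.

def perception_module(rgb):
    """Stand-in for a segmentation network: RGB image -> class-id map."""
    return (rgb.mean(axis=-1) > 0.5).astype(np.int64)   # dummy 2-class map

def control_policy(seg):
    """Stand-in for a DRL agent acting on the segmentation map."""
    left = seg[:, : seg.shape[1] // 2].mean()     # obstacle density, left
    right = seg[:, seg.shape[1] // 2 :].mean()    # obstacle density, right
    return "turn_right" if left > right else "turn_left"

rgb = np.random.rand(64, 64, 3)    # simulated or real camera frame
print(control_policy(perception_module(rgb)))
```

Because the policy never sees raw pixels, swapping a synthetic frame for a real one only changes the perception module's input, which is the point of the modular design.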

Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems treat the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during its interactions with users. We model the sequential interactions between users and the recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items on a trial-and-error basis and receiving reinforcement signals from user feedback. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before the model is applied online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and the agent, and develop a novel approach to incorporate them into the proposed framework, LIRD, for list-wise recommendations. The experimental results on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.
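The MDP framing and the offline simulator can be sketched as follows; the embedding-similarity reward and the five-item history window are hypothetical stand-ins for the framework's actual simulator.

```python
import numpy as np

# Minimal sketch (hypothetical simulator): state = recent interaction
# history, action = a recommended list, reward = simulated user feedback,
# so policies can be pre-trained offline before going online.

rng = np.random.default_rng(0)
item_emb = rng.normal(size=(100, 8))   # embeddings for 100 items

def simulate_feedback(state_items, rec_list):
    """Score a list by its similarity to the user's history profile."""
    profile = item_emb[state_items].mean(axis=0)
    return float(sum(item_emb[i] @ profile for i in rec_list))

def step(state_items, rec_list):
    r = simulate_feedback(state_items, rec_list)
    next_state = (state_items + rec_list)[-5:]   # slide the history window
    return next_state, r

state = [3, 17, 42, 8, 55]            # last five interacted items
state, reward = step(state, rec_list=[12, 64])
print(state, round(reward, 3))
```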
