国产欧美日韩视频一区二区_中文字幕一区二区三区你懂的_亚洲A级天堂无码在线观看_欧美A级一区二区三区中文观看_韩国18禁免费无码漫画_午夜亚洲中文字幕一区二区三_美女裸体网站

Adaptive control is subject to stability and performance issues when a learned model is used to enhance its performance. This paper thus presents a deep learning-based adaptive control framework for nonlinear systems with multiplicatively-separable parametrization, called adaptive Neural Contraction Metric (aNCM). The aNCM approximates real-time optimization for computing a differential Lyapunov function and a corresponding stabilizing adaptive control law by using a Deep Neural Network (DNN). The use of DNNs permits real-time implementation of the control law and broad applicability to a variety of nonlinear systems with parametric and nonparametric uncertainties. We show using contraction theory that the aNCM ensures exponential boundedness of the distance between the target and controlled trajectories in the presence of parametric uncertainties of the model, learning errors caused by aNCM approximation, and external disturbances. Its superiority to the existing robust and adaptive control methods is demonstrated using a cart-pole balancing model.

相關內容

控制器

關注 5

可約的 · 控制器 · SCAN · Guidance · CASES ·

2021 年 11 月 26 日

Testability-Aware Low Power Controller Design with Evolutionary Learning

Min Li,Zhengyuan Shi,Zezhong Wang,Weiwei Zhang,Yu Huang,Qiang Xu

from arxiv, Accepted by ITC 2021 (short paper). This is the long paper version. Code is available on //github.com/lee-man/ga-testing

XORNet-based low power controller is a popular technique to reduce circuit transitions in scan-based testing. However, existing solutions construct the XORNet evenly for scan chain control, and it may result in sub-optimal solutions without any design guidance. In this paper, we propose a novel testability-aware low power controller with evolutionary learning. The XORNet generated from the proposed genetic algorithm (GA) enables adaptive control for scan chains according to their usages, thereby significantly improving XORNet encoding capacity, reducing the number of failure cases with ATPG and decreasing test data volume. Experimental results indicate that under the same control bits, our GA-guided XORNet design can improve the fault coverage by up to 2.11%. The proposed GA-guided XORNets also allows reducing the number of control bits, and the total testing time decreases by 20.78% on average and up to 47.09% compared to the existing design without sacrificing test coverage.

變分自編碼 · contrastive · 自編碼器 · MoDELS · Performer ·

2021 年 3 月 19 日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Zhe Xie,Chengxuan Liu,Yichi Zhang,Hongtao Lu,Dong Wang,Yue Ding

from arxiv, 11 pages, WWW 2021

Sequential recommendation as an emerging topic has attracted increasing attention due to its important practical significance. Models based on deep learning and attention mechanism have achieved good performance in sequential recommendation. Recently, the generative models based on Variational Autoencoder (VAE) have shown the unique advantage in collaborative filtering. In particular, the sequential VAE model as a recurrent version of VAE can effectively capture temporal dependencies among items in user sequence and perform sequential recommendation. However, VAE-based models suffer from a common limitation that the representational ability of the obtained approximate posterior distribution is limited, resulting in lower quality of generated samples. This is especially true for generating sequences. To solve the above problem, in this work, we propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation. Specifically, we first introduce the adversarial training for sequence generation under the Adversarial Variational Bayes (AVB) framework, which enables our model to generate high-quality latent variables. Then, we employ the contrastive loss. The latent variables will be able to learn more personalized and salient characteristics by minimizing the contrastive loss. Besides, when encoding the sequence, we apply a recurrent and convolutional structure to capture global and local relationships in the sequence. Finally, we conduct extensive experiments on four real-world datasets. The experimental results show that our proposed ACVAE model outperforms other state-of-the-art methods.

學成 · 表示學習 · contrastive · 強化學習 · Performer ·

2021 年 2 月 22 日

Return-Based Contrastive Representation Learning for Reinforcement Learning

Guoqing Liu,Chuheng Zhang,Li Zhao,Tao Qin,Jinhua Zhu,Jian Li,Nenghai Yu,Tie-Yan Liu

from arxiv, ICLR 2021

Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL). However, existing auxiliary tasks do not take the characteristics of RL problems into consideration and are unsupervised. By leveraging returns, the most important feedback signals in RL, we propose a novel auxiliary task that forces the learnt representations to discriminate state-action pairs with different returns. Our auxiliary loss is theoretically justified to learn representations that capture the structure of a new form of state-action abstraction, under which state-action pairs with similar return distributions are aggregated together. In low data regime, our algorithm outperforms strong baselines on complex tasks in Atari games and DeepMind Control suite, and achieves even better performance when combined with existing auxiliary tasks.

元學習 · 有向 · 學成 · 數據集 · 推斷 ·

2020 年 7 月 6 日

Meta Learning for Causal Direction

Jean-Francois Ton,Dino Sejdinovic,Kenji Fukumizu

The inaccessibility of controlled randomized trials due to inherent constraints in many fields of science has been a fundamental issue in causal inference. In this paper, we focus on distinguishing the cause from effect in the bivariate setting under limited observational data. Based on recent developments in meta learning as well as in causal inference, we introduce a novel generative model that allows distinguishing cause and effect in the small data setting. Using a learnt task variable that contains distributional information of each dataset, we propose an end-to-end algorithm that makes use of similar training datasets at test time. We demonstrate our method on various synthetic as well as real-world data and show that it is able to maintain high accuracy in detecting directions across varying dataset sizes.

樣例 · 學成 · Performer · tuning · Weight ·

2019 年 5 月 5 日

Learning to Reweight Examples for Robust Deep Learning

Mengye Ren,Wenyuan Zeng,Bin Yang,Raquel Urtasun

from arxiv, 13 pages; Published at ICML 2018; Code released at: //github.com/uber-research/learning-to-reweight-examples

Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.

tuning · 學成 · 深度強化學習 · 超參數 · Performer ·

2018 年 12 月 26 日

Learning to Walk via Deep Reinforcement Learning

Tuomas Haarnoja,Aurick Zhou,Sehoon Ha,Jie Tan,George Tucker,Sergey Levine

from arxiv, Videos: //sites.google.com/view/minitaur-locomotion/ . arXiv admin note: substantial text overlap with arXiv:1812.05905

Deep reinforcement learning suggests the promise of fully automated learning of robotic control policies that directly map sensory inputs to low-level actions. However, applying deep reinforcement learning methods on real-world robots is exceptionally difficult, due both to the sample complexity and, just as importantly, the sensitivity of such methods to hyperparameters. While hyperparameter tuning can be performed in parallel in simulated domains, it is usually impractical to tune hyperparameters directly on real-world robotic platforms, especially legged platforms like quadrupedal robots that can be damaged through extensive trial-and-error learning. In this paper, we develop a stable variant of the soft actor-critic deep reinforcement learning algorithm that requires minimal hyperparameter tuning, while also requiring only a modest number of trials to learn multilayer neural network policies. This algorithm is based on the framework of maximum entropy reinforcement learning, and automatically trades off exploration against exploitation by dynamically and automatically tuning a temperature parameter that determines the stochasticity of the policy. We show that this method achieves state-of-the-art performance on four standard benchmark environments. We then demonstrate that it can be used to learn quadrupedal locomotion gaits on a real-world Minitaur robot, learning to walk from scratch directly in the real world in two hours of training.

異方差 · 上置信界限 · 深度 Q 學習 · 強化學習 · 學成 ·

2018 年 12 月 18 日

Information-Directed Exploration for Deep Reinforcement Learning

Nikolay Nikolov,Johannes Kirschner,Felix Berkenkamp,Andreas Krause

Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches.

穩健性 · 深度強化學習 · 控制器 · 強化學習 · MoDELS ·

2018 年 12 月 7 日

Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control

Zhuo Xu,Chen Tang,Masayoshi Tomizuka

from arxiv, Published at IEEE ITSC 2018

Although deep reinforcement learning (deep RL) methods have lots of strengths that are favorable if applied to autonomous driving, real deep RL applications in autonomous driving have been slowed down by the modeling gap between the source (training) domain and the target (deployment) domain. Unlike current policy transfer approaches, which generally limit to the usage of uninterpretable neural network representations as the transferred features, we propose to transfer concrete kinematic quantities in autonomous driving. The proposed robust-control-based (RC) generic transfer architecture, which we call RL-RC, incorporates a transferable hierarchical RL trajectory planner and a robust tracking controller based on disturbance observer (DOB). The deep RL policies trained with known nominal dynamics model are transfered directly to the target domain, DOB-based robust tracking control is applied to tackle the modeling gap including the vehicle dynamics errors and the external disturbances such as side forces. We provide simulations validating the capability of the proposed method to achieve zero-shot transfer across multiple driving scenarios such as lane keeping, lane changing and obstacle avoidance.

學成 · 控制器 · MoDELS · 在線 · 元學習 ·

2018 年 3 月 30 日

Learning to Adapt: Meta-Learning for Model-Based Control

Ignasi Clavera,Anusha Nagabandi,Ronald S. Fearing,Pieter Abbeel,Sergey Levine,Chelsea Finn

Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations can cause proficient but narrowly-learned policies to fail at test time. In this work, we propose to learn how to quickly and effectively adapt online to new situations as well as to perturbations. To enable sample-efficient meta-learning, we consider learning online adaptation in the context of model-based reinforcement learning. Our approach trains a global model such that, when combined with recent data, the model can be be rapidly adapted to the local context. Our experiments demonstrate that our approach can enable simulated agents to adapt their behavior online to novel terrains, to a crippled leg, and in highly-dynamic environments.

學成 · 泛函 · 優化器 · 控制器 · MoDELS ·

2018 年 1 月 29 日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Motoya Ohnishi,Li Wang,Gennaro Notomista,Magnus Egerstedt

from arxiv, 14 pages, 10 figures, submitted to IEEE Transactions on Robotics

This paper presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model will be used in combination with control barrier certificates which constrain feedback controllers only when safety is about to be violated. Under some mild assumptions, solutions to the constrained feedback-controller optimization are guaranteed to be globally optimal, and the monotonic improvement of a feedback controller is thus ensured. In addition, we reformulate the (action-)value function approximation to make any kernel-based nonlinear function estimation method applicable. We then employ a state-of-the-art kernel adaptive filtering technique for the (action-)value function approximation. The resulting framework is verified experimentally on a brushbot, whose dynamics is unknown and highly complex.