2020久久精品亚洲热综合-日韩A精品日韩精品无码

The Teacher-Student Framework (TSF) is a reinforcement learning setting where a teacher agent guards the training of a student agent by intervening and providing online demonstrations. Assuming optimal, the teacher policy has the perfect timing and capability to intervene in the learning process of the student agent, providing safety guarantee and exploration guidance. Nevertheless, in many real-world settings it is expensive or even impossible to obtain a well-performing teacher policy. In this work, we relax the assumption of a well-performing teacher and develop a new method that can incorporate arbitrary teacher policies with modest or inferior performance. We instantiate an Off-Policy Reinforcement Learning algorithm, termed Teacher-Student Shared Control (TS2C), which incorporates teacher intervention based on trajectory-based value estimation. Theoretical analysis validates that the proposed TS2C algorithm attains efficient exploration and substantial safety guarantee without being affected by the teacher's own performance. Experiments on various continuous control tasks show that our method can exploit teacher policies at different performance levels while maintaining a low training cost. Moreover, the student policy surpasses the imperfect teacher policy in terms of higher accumulated reward in held-out testing environments. Code is available at //metadriverse.github.io/TS2C.

相關內容

Agent

關注 15

潛在 · MoDELS · Learning · Performer · 生成模型 ·

2023 年 4 月 25 日

Latent Traversals in Generative Models as Potential Flows

Yue Song,Andy Keller,Nicu Sebe,Max Welling

from arxiv, ICML 2023

Despite the significant recent progress in deep generative models, the underlying structure of their latent spaces is still poorly understood, thereby making the task of performing semantically meaningful latent traversals an open research challenge. Most prior work has aimed to solve this challenge by modeling latent structures linearly, and finding corresponding linear directions which result in `disentangled' generations. In this work, we instead propose to model latent structures with a learned dynamic potential landscape, thereby performing latent traversals as the flow of samples down the landscape's gradient. Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations, thereby allowing them to flexibly vary over both space and time. To achieve disentanglement, multiple potentials are learned simultaneously, and are constrained by a classifier to be distinct and semantically self-consistent. Experimentally, we demonstrate that our method achieves both more qualitatively and quantitatively disentangled trajectories than state-of-the-art baselines. Further, we demonstrate that our method can be integrated as a regularization term during training, thereby acting as an inductive bias towards the learning of structured representations, ultimately improving model likelihood on similarly structured data.

Learning · Machine Learning · 統計量 · 數據集 · 有偏 ·

2023 年 4 月 25 日

A Simplicity Bubble Problem in Formal-Theoretic Learning Systems

Felipe S. Abrah?o,Hector Zenil,Fabio Porto,Michael Winter,Klaus Wehmuth,Itala M. L. D'Ottaviano

When mining large datasets in order to predict new data, limitations of the principles behind statistical machine learning pose a serious challenge not only to the Big Data deluge, but also to the traditional assumptions that data generating processes are biased toward low algorithmic complexity. Even when one assumes an underlying algorithmic-informational bias toward simplicity in finite dataset generators, we show that current approaches to machine learning (including deep learning, or any formal-theoretic hybrid mix of top-down AI and statistical machine learning approaches), can always be deceived, naturally or artificially, by sufficiently large datasets. In particular, we demonstrate that, for every learning algorithm (with or without access to a formal theory), there is a sufficiently large dataset size above which the algorithmic probability of an unpredictable deceiver is an upper bound (up to a multiplicative constant that only depends on the learning algorithm) for the algorithmic probability of any other larger dataset. In other words, very large and complex datasets can deceive learning algorithms into a ``simplicity bubble'' as likely as any other particular non-deceiving dataset. These deceiving datasets guarantee that any prediction effected by the learning algorithm will unpredictably diverge from the high-algorithmic-complexity globally optimal solution while converging toward the low-algorithmic-complexity locally optimal solution, although the latter is deemed a global one by the learning algorithm. We discuss the framework and additional empirical conditions to be met in order to circumvent this deceptive phenomenon, moving away from statistical machine learning towards a stronger type of machine learning based on, and motivated by, the intrinsic power of algorithmic information theory and computability theory.

求逆 · 控制器 · Learning · 可辨認的 · 穩健性 ·

2023 年 4 月 25 日

Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers

Bryan Habas,Jack W. Langelaan,Bo Cheng

from arxiv, 8 pages, 6 Figures, Submitted for ICRA 2023 Conference (Pending Review)

Inverted landing in a rapid and robust manner is a challenging feat for aerial robots, especially while depending entirely on onboard sensing and computation. In spite of this, this feat is routinely performed by biological fliers such as bats, flies, and bees. Our previous work has identified a direct causal connection between a series of onboard visual cues and kinematic actions that allow for reliable execution of this challenging aerobatic maneuver in small aerial robots. In this work, we first utilized Deep Reinforcement Learning and a physics-based simulation to obtain a general, optimal control policy for robust inverted landing starting from any arbitrary approach condition. This optimized control policy provides a computationally-efficient mapping from the system's observational space to its motor command action space, including both triggering and control of rotational maneuvers. This was done by training the system over a large range of approach flight velocities that varied with magnitude and direction. Next, we performed a sim-to-real transfer and experimental validation of the learned policy via domain randomization, by varying the robot's inertial parameters in the simulation. Through experimental trials, we identified several dominant factors which greatly improved landing robustness and the primary mechanisms that determined inverted landing success. We expect the learning framework developed in this study can be generalized to solve more challenging tasks, such as utilizing noisy onboard sensory data, landing on surfaces of various orientations, or landing on dynamically-moving surfaces.

Performer · MoDELS · Learning · 迭代模型參數 · Extensibility ·

2023 年 4 月 25 日

Test-Time Adaptation with Perturbation Consistency Learning

Yi Su,Yixin Ji,Juntao Li,Hai Ye,Min Zhang

Currently, pre-trained language models (PLMs) do not cope well with the distribution shift problem, resulting in models trained on the training set failing in real test scenarios. To address this problem, the test-time adaptation (TTA) shows great potential, which updates model parameters to suit the test data at the testing time. Existing TTA methods rely on well-designed auxiliary tasks or self-training strategies based on pseudo-label. However, these methods do not achieve good trade-offs regarding performance gains and computational costs. To obtain some insights into such a dilemma, we take two representative TTA methods, i.e., Tent and OIL, for exploration and find that stable prediction is the key to achieving a good balance. Accordingly, in this paper, we propose perturbation consistency learning (PCL), a simple test-time adaptation method to promote the model to make stable predictions for samples with distribution shifts. Extensive experiments on adversarial robustness and cross-lingual transferring demonstrate that our method can achieve higher or comparable performance with less inference time over strong PLM backbones and previous state-of-the-art TTA methods.

Learning · 穩健性 · 泛化理論 · Better · 樣本 ·

2023 年 4 月 25 日

Combining Adversaries with Anti-adversaries in Training

Xiaoling Zhou,Nan Yang,Ou Wu

from arxiv, 8 pages, 5 figures

Adversarial training is an effective learning technique to improve the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under more general perturbation scope that different samples can have different perturbation directions (the adversarial and anti-adversarial directions) and varied perturbation bounds. Our theoretical explorations suggest that the combination of adversaries and anti-adversaries (samples with anti-adversarial perturbations) in training can be more effective in achieving better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy label learning and imbalance learning) compared with standard adversarial training. On the basis of our theoretical findings, a more general learning objective that combines adversaries and anti-adversaries with varied bounds on each training sample is presented. Meta learning is utilized to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.

Continuity · 分解的 · Better · 潛在 · MoDELS ·

2023 年 4 月 25 日

Continuous Input Embedding Size Search For Recommender Systems

Yunke Qu,Tong Chen,Xiangyu Zhao,Lizhen Cui,Kai Zheng,Hongzhi Yin

from arxiv, To appear in SIGIR'23

Latent factor models are the most popular backbones for today's recommender systems owing to their prominent performance. Latent factor models represent users and items as real-valued embedding vectors for pairwise similarity computation, and all embeddings are traditionally restricted to a uniform size that is relatively large (e.g., 256-dimensional). With the exponentially expanding user base and item catalog in contemporary e-commerce, this design is admittedly becoming memory-inefficient. To facilitate lightweight recommendation, reinforcement learning (RL) has recently opened up opportunities for identifying varying embedding sizes for different users/items. However, challenged by search efficiency and learning an optimal RL policy, existing RL-based methods are restricted to highly discrete, predefined embedding size choices. This leads to a largely overlooked potential of introducing finer granularity into embedding sizes to obtain better recommendation effectiveness under a given memory budget. In this paper, we propose continuous input embedding size search (CIESS), a novel RL-based method that operates on a continuous search space with arbitrary embedding sizes to choose from. In CIESS, we further present an innovative random walk-based exploration strategy to allow the RL policy to efficiently explore more candidate embedding sizes and converge to a better decision. CIESS is also model-agnostic and hence generalizable to a variety of latent factor RSs, whilst experiments on two real-world datasets have shown state-of-the-art performance of CIESS under different memory budgets when paired with three popular recommendation models.

控制器 · Performer · 優化器 · Learning · 線性的 ·

2023 年 4 月 24 日

Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator

Mei Minami,Yuka Masumoto,Yoshihiro Okawa,Tomotake Sasaki,Yutaka Hori

In many practical control applications, the performance level of a closed-loop system degrades over time due to the change of plant characteristics. Thus, there is a strong need for redesigning a controller without going through the system modeling process, which is often difficult for closed-loop systems. Reinforcement learning (RL) is one of the promising approaches that enable model-free redesign of optimal controllers for nonlinear dynamical systems based only on the measurement of the closed-loop system. However, the learning process of RL usually requires a considerable number of trial-and-error experiments using the poorly controlled system that may accumulate wear on the plant. To overcome this limitation, we propose a model-free two-step design approach that improves the transient learning performance of RL in an optimal regulator redesign problem for unknown nonlinear systems. Specifically, we first design a linear control law that attains some degree of control performance in a model-free manner, and then, train the nonlinear optimal control law with online RL by using the designed linear control law in parallel. We introduce an offline RL algorithm for the design of the linear control law and theoretically guarantee its convergence to the LQR controller under mild assumptions. Numerical simulations show that the proposed approach improves the transient learning performance and efficiency in hyperparameter tuning of RL.

多樣性 · 峰值 · MoDELS · Weight · 輸出 ·

2023 年 4 月 24 日

Towards Mode Balancing of Generative Models via Diversity Weights

Sebastian Berns,Simon Colton,Christian Guckelsberger

from arxiv, Accepted for oral presentation and publication in the proceedings of the International Conference on Computational Creativity (ICCC) 2023

Large data-driven image models are extensively used to support creative and artistic work. Under the currently predominant distribution-fitting paradigm, a dataset is treated as ground truth to be approximated as closely as possible. Yet, many creative applications demand a diverse range of output, and creators often strive to actively diverge from a given data distribution. We argue that an adjustment of modelling objectives, from pure mode coverage towards mode balancing, is necessary to accommodate the goal of higher output diversity. We present diversity weights, a training scheme that increases a model's output diversity by balancing the modes in the training dataset. First experiments in a controlled setting demonstrate the potential of our method. We conclude by contextualising our contribution to diversity within the wider debate on bias, fairness and representation in generative machine learning.

稀疏 · 模型評估 · MoDELS · 泛函 · 可約的 ·

2023 年 4 月 24 日

Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off

Shaoyi Huang,Bowen Lei,Dongkuan Xu,Hongwu Peng,Yue Sun,Mimi Xie,Caiwen Ding

Over-parameterization of deep neural networks (DNNs) has shown high prediction accuracy for many applications. Although effective, the large number of parameters hinders its popularity on resource-limited devices and has an outsize environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) could significantly mitigate the training costs by reducing the model size. However, existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies, resulting in local minimal and low accuracy. In this work, we consider the dynamic sparse training as a sparse connectivity search problem and design an exploitation and exploration acquisition function to escape from local optima and saddle points. We further design an acquisition function and provide the theoretical guarantees for the proposed method and clarify its convergence property. Experimental results show that sparse models (up to 98\% sparsity) obtained by our proposed method outperform the SOTA sparse training methods on a wide variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, ResNet-50 / CIFAR-100, our method has even higher accuracy than dense models. On ResNet-50 / ImageNet, the proposed method has up to 8.2\% accuracy improvement compared to SOTA sparse training methods.

確切的 · 控制器 · Analysis · MoDELS · contrastive ·

2023 年 4 月 23 日

Exact Worst-Case Execution-Time Analysis for Implicit Model Predictive Control

Daniel Arnstr?m,David Broman,Daniel Axehill

We propose the first method that determines the exact worst-case execution time (WCET) for implicit linear model predictive control (MPC). Such WCET bounds are imperative when MPC is used in real time to control safety-critical systems. The proposed method applies when the quadratic programming solver in the MPC controller belongs to a family of well-established active-set solvers. For such solvers, we leverage a previously proposed complexity certification framework to generate a finite set of archetypal optimization problems; we prove that these archetypal problems form an execution-time equivalent cover of all possible problems; that is, that they capture the execution time for solving any possible optimization problem that can be encountered online. Hence, by solving just these archetypal problems on the hardware on which the MPC is to be deployed, and by recording the execution times, we obtain the exact WCET. In addition to providing formal proofs of the methods efficacy, we validate the method on an MPC example where an inverted pendulum on a cart is stabilized. The experiments highlight the following advantages compared with classical WCET methods: (i) in contrast to classical static methods, our method gives the exact WCET; (ii) in contrast to classical measurement-based methods, our method guarantees a correct WCET estimate and requires fewer measurements on the hardware.