
Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO can cause an unexpected stability issue during training. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware exploration (UE) based on a ratio uncertainty level. The proposed PPO-UE is designed to improve convergence speed and performance with an optimized ratio uncertainty level. Extensive sensitivity analysis over the ratio uncertainty level shows that our proposed PPO-UE considerably outperforms the baseline PPO in Roboschool continuous control tasks.
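
As an illustration of the kind of objective such a variant might optimize, the sketch below combines the standard PPO clipped surrogate with a hypothetical exploration bonus driven by a ratio-based uncertainty measure |r - 1|; the abstract does not define the ratio uncertainty level or its self-adaptive schedule, so that measure and the coefficient `ue_coef` are assumptions, not the paper's method.

```python
# Minimal sketch (assumptions): standard PPO clipped surrogate plus a hypothetical
# "ratio uncertainty" u = |r - 1| that adds an exploration bonus. The paper's exact
# definition of the ratio uncertainty level is not given in the abstract.
import numpy as np

def ppo_ue_objective(logp_new, logp_old, advantages, clip_eps=0.2, ue_coef=0.01):
    """Clipped PPO surrogate (to be maximized) with an illustrative uncertainty bonus."""
    ratio = np.exp(logp_new - logp_old)                  # importance ratio r
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    uncertainty = np.abs(ratio - 1.0)                    # hypothetical ratio uncertainty level
    return surrogate.mean() + ue_coef * uncertainty.mean()

# Toy usage with random rollout statistics
rng = np.random.default_rng(0)
logp_new, logp_old = rng.normal(size=64), rng.normal(size=64)
adv = rng.normal(size=64)
print(ppo_ue_objective(logp_new, logp_old, adv))
```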

Related Content

Out-of-distribution (OOD) detection is critical for preventing deep learning models from making incorrect predictions and for ensuring the safety of artificial intelligence systems. Especially in safety-critical applications such as medical diagnosis and autonomous driving, the cost of incorrect decisions is usually unbearable. However, neural networks often suffer from the overconfidence issue, assigning high confidence to OOD data that are never seen during training and may be unrelated to the training data, namely the in-distribution (ID) data. Determining the reliability of a prediction remains a difficult and challenging task. In this work, we propose Uncertainty-Estimation with Normalized Logits (UE-NL), a robust learning method for OOD detection, which has three main benefits. (1) Neural networks with UE-NL treat every ID sample equally by predicting an uncertainty score for the input data; this uncertainty is incorporated into the softmax function to adjust the learning strength of easy and hard samples during training, making the model learn robustly and accurately. (2) UE-NL enforces a constant vector norm on the logits to decouple the effect of the increasing output norm from the optimization process, an effect that contributes to the overconfidence issue. (3) UE-NL provides a new metric, the magnitude of the uncertainty score, to detect OOD data. Experiments demonstrate that UE-NL achieves top performance on common OOD benchmarks and is more robust to noisy ID data that may be misjudged as OOD data by other methods.
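
A minimal sketch of the two mechanisms described above, assuming the logits are rescaled to a fixed norm and tempered by a per-sample predicted uncertainty before the softmax; the function name, the `target_norm` value, and the way `uncertainty` enters the softmax are illustrative stand-ins rather than the exact UE-NL formulation.

```python
# Minimal sketch (assumptions): logits are constrained to a constant norm and divided by
# a per-sample predicted uncertainty before softmax; `uncertainty` here is a hypothetical
# positive scalar per sample produced by some uncertainty head.
import numpy as np

def ue_nl_probs(logits, uncertainty, target_norm=10.0):
    """Softmax over norm-constrained logits tempered by predicted uncertainty."""
    norm = np.linalg.norm(logits, axis=-1, keepdims=True) + 1e-8
    scaled = target_norm * logits / norm                  # enforce constant logit norm
    tempered = scaled / uncertainty[..., None]            # larger uncertainty -> softer prediction
    tempered -= tempered.max(axis=-1, keepdims=True)      # numerical stability
    exp = np.exp(tempered)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0]])
print(ue_nl_probs(logits, uncertainty=np.array([1.5])))
```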

Reinforcement Learning has drawn huge interest as a tool for solving optimal control problems. Solving a given problem (task or environment) involves converging towards an optimal policy. However, there might exist multiple optimal policies that differ dramatically in their behaviour; for example, some may be faster than others but at the expense of greater risk. We consider and study a distribution of optimal policies. We design a curiosity-augmented Metropolis algorithm (CAMEO) with which we can sample optimal policies that effectively adopt diverse behaviours, since this implies greater coverage of the different possible optimal policies. In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems, even in the challenging case of environments that provide sparse rewards. We further show that the sampled policies present different risk profiles, which corresponds to interesting practical applications in interpretability and represents a first step towards learning the distribution of optimal policies itself.
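
To make the sampling idea concrete, here is a bare-bones Metropolis random walk over policy parameters whose acceptance score could, in principle, mix task return with a curiosity bonus; the Gaussian proposal, the temperature, and the toy multi-modal score below are assumptions and do not reproduce CAMEO's actual curiosity term.

```python
# Minimal sketch (assumptions): Metropolis sampling over policy parameters with a generic
# score function; CAMEO's curiosity augmentation is not reproduced here.
import numpy as np

def metropolis_policy_sampler(score_fn, theta0, n_steps=1000, step=0.1, temp=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta, s = theta0, score_fn(theta0)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.normal(size=theta.shape)   # random-walk proposal
        s_prop = score_fn(prop)
        if np.log(rng.uniform()) < (s_prop - s) / temp:      # Metropolis acceptance test
            theta, s = prop, s_prop
        samples.append(theta.copy())
    return samples

# Toy score: negative distance to either of two "optimal" parameter vectors (multi-modal)
score = lambda th: -min(np.sum((th - 1) ** 2), np.sum((th + 1) ** 2))
samples = metropolis_policy_sampler(score, np.zeros(2))
print(len(samples), samples[-1])
```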

There has been a recent surge of interest in introducing transformers to 3D human pose estimation (HPE) due to their powerful capabilities in modeling long-term dependencies. However, existing transformer-based methods treat body joints as equally important inputs and ignore the prior knowledge of human skeleton topology in the self-attention mechanism. To tackle this issue, in this paper, we propose a Pose-Oriented Transformer (POT) with uncertainty-guided refinement for 3D HPE. Specifically, we first develop a novel pose-oriented self-attention mechanism and a distance-related position embedding for POT to explicitly exploit the human skeleton topology. The pose-oriented self-attention mechanism explicitly models the topological interactions between body joints, whereas the distance-related position embedding encodes the distance of each joint to the root joint to distinguish groups of joints with different regression difficulties. Furthermore, we present an Uncertainty-Guided Refinement Network (UGRN) to refine the pose predictions from POT, especially for the difficult joints, by considering the estimated uncertainty of each joint with an uncertainty-guided sampling strategy and a self-attention mechanism. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art methods with reduced model parameters on 3D HPE benchmarks such as Human3.6M and MPI-INF-3DHP.
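
A rough sketch of how skeleton topology could bias self-attention, assuming attention scores are penalized by graph hop distance between joints and a root-distance embedding is added to the joint features; `hop_dist`, `root_dist_embed`, and `bias_scale` are illustrative stand-ins, not the paper's formulation.

```python
# Minimal sketch (assumptions): attention scores biased by skeleton hop distance plus an
# additive embedding of each joint's distance to the root; POT's actual design differs.
import numpy as np

def pose_oriented_attention(x, hop_dist, root_dist_embed, bias_scale=1.0):
    """x: (J, d) joint features; hop_dist: (J, J) graph distances; root_dist_embed: (J, d)."""
    q = k = v = x + root_dist_embed                      # distance-related position embedding
    scores = q @ k.T / np.sqrt(x.shape[-1])
    scores -= bias_scale * hop_dist                      # closer joints attend more strongly
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

J, d = 17, 8
rng = np.random.default_rng(0)
out = pose_oriented_attention(rng.normal(size=(J, d)),
                              rng.integers(0, 5, size=(J, J)).astype(float),
                              rng.normal(size=(J, d)) * 0.1)
print(out.shape)
```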

Failure probability estimation is a crucial task in engineering. In this work we consider this problem in the setting where the underlying computer models are extremely expensive, which often arises in practice; in this setting, reducing the number of calls to the computer model is of essential importance. We formulate the problem of estimating the failure probability with expensive computer models as a sequential experimental design for the limit state (i.e., the failure boundary) and propose a series of efficient adaptive design criteria to solve the design of experiments (DOE). In particular, the proposed method employs a deep neural network (DNN) as a surrogate of the limit state function to efficiently reduce the number of calls to the expensive computer experiment. A map from the Gaussian distribution to the posterior approximation of the limit state is learned by normalizing flows to ease the experimental design. Three normalizing-flow-based design criteria are proposed for deciding the design locations based on different assumptions about the generalization error. The accuracy and performance of the proposed method are demonstrated by both theory and practical examples.
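
The sketch below shows the basic ingredients in a highly simplified form: Monte Carlo estimation of the failure probability on a cheap surrogate of the limit state function, plus a naive design criterion that picks the candidate closest to the surrogate's failure boundary; the toy limit state and the selection rule are assumptions and omit the normalizing-flow posterior and the DNN surrogate used in the paper.

```python
# Minimal sketch (assumptions): failure probability P(g(X) < 0) estimated by Monte Carlo on
# a cheap surrogate g_hat, with the next design point chosen where |g_hat| is smallest,
# i.e., closest to the limit state g = 0.
import numpy as np

def estimate_failure_prob(surrogate, n_samples=100_000, dim=2, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, dim))
    return np.mean(surrogate(x) < 0.0)                   # fraction of failed samples

def next_design_point(surrogate, candidates):
    return candidates[np.argmin(np.abs(surrogate(candidates)))]  # closest to failure boundary

# Toy limit state: failure when the sum of the inputs exceeds 3
g_hat = lambda x: 3.0 - x.sum(axis=-1)
print(estimate_failure_prob(g_hat))
print(next_design_point(g_hat, np.random.default_rng(1).standard_normal((1000, 2))))
```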

We propose a general framework for machine learning based optimization under uncertainty. Our approach replaces the complex forward model by a surrogate, e.g., a neural network, which is learned simultaneously in a one-shot sense when solving the optimal control problem. Our approach relies on a reformulation of the problem as a penalized empirical risk minimization problem for which we provide a consistency analysis in terms of large data and increasing penalty parameter. To solve the resulting problem, we suggest a stochastic gradient method with adaptive control of the penalty parameter and prove convergence under suitable assumptions on the surrogate model. Numerical experiments illustrate the results for linear and nonlinear surrogate models.
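
A toy instance of the one-shot idea, assuming a scalar control u, a forward model sin(u), a linear surrogate w*u, a quadratic surrogate-fit penalty, and a geometrically increasing penalty parameter rho; all of these modeling choices are illustrative and far simpler than the framework and consistency analysis described in the paper.

```python
# Minimal sketch (assumptions): control u and surrogate parameter w are updated jointly by
# stochastic-gradient-style steps on a penalized objective; rho grows so the surrogate is
# gradually forced to match the forward model on the data.
import numpy as np

rng = np.random.default_rng(0)
data_u = rng.uniform(-1, 1, size=256)                    # samples of the control space
forward = np.sin                                         # "expensive" forward model (toy)
target = 0.5

u, w, rho, lr = 0.0, 1.0, 1.0, 1e-2
for step in range(2000):
    grad_u = 2 * (w * u - target) * w                    # objective gradient on the surrogate
    grad_w = (2 * (w * u - target) * u
              + rho * np.mean(2 * (w * data_u - forward(data_u)) * data_u))  # + penalty term
    u -= lr * grad_u
    w -= lr * grad_w
    rho *= 1.001                                         # adaptive increase of the penalty
print(u, w, np.sin(u))                                   # sin(u) ends up near the 0.5 target
```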

While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity. In this work, we propose the adoption of latent variable policies within the MaxEnt framework, which we show can provably approximate any policy distribution, and additionally, naturally emerges under the use of world models with a latent belief state. We discuss why latent variable policies are difficult to train, how naive approaches can fail, then subsequently introduce a series of improvements centered around low-cost marginalization of the latent state, allowing us to make full use of the latent state at minimal additional cost. We instantiate our method under the actor-critic framework, marginalizing both the actor and critic. The resulting algorithm, referred to as Stochastic Marginal Actor-Critic (SMAC), is simple yet effective. We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training. Our implementation is open sourced at //github.com/zdhNarsil/Stochastic-Marginal-Actor-Critic.
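
As a minimal illustration of marginalizing a latent-variable policy, the snippet below estimates log pi(a|s) = log E_z[pi(a|s,z)] by Monte Carlo over K latent samples; the Gaussian toy policy, the value of K, and the function names are assumptions, and SMAC's actual actor and critic estimators are more involved than this.

```python
# Minimal sketch (assumptions): marginal log-density of a latent-variable policy estimated
# with K latent samples and a log-sum-exp average.
import numpy as np

def marginal_log_prob(action, state, sample_latent, cond_log_prob, K=16, seed=0):
    rng = np.random.default_rng(seed)
    zs = sample_latent(state, K, rng)                          # z_k ~ p(z | s)
    logps = np.array([cond_log_prob(action, state, z) for z in zs])
    return np.logaddexp.reduce(logps) - np.log(K)              # log (1/K) sum_k pi(a|s,z_k)

# Toy example: the latent shifts the mean of a unit-variance Gaussian policy
sample_latent = lambda s, K, rng: rng.normal(size=K)
cond_log_prob = lambda a, s, z: -0.5 * (a - z) ** 2 - 0.5 * np.log(2 * np.pi)
print(marginal_log_prob(0.3, None, sample_latent, cond_log_prob))
```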

Object detectors often experience a drop in performance when new environmental conditions are insufficiently represented in the training data. This paper studies how to automatically fine-tune a pre-existing object detector while exploring and acquiring images in a new environment without relying on human intervention, i.e., in an entirely self-supervised fashion. In our setting, an agent initially learns to explore the environment using a pre-trained off-the-shelf detector to locate objects and associate pseudo-labels. By assuming that pseudo-labels for the same object must be consistent across different views, we learn an exploration policy that mines hard samples, and we devise a novel mechanism for producing refined predictions from the consensus among observations. Our approach outperforms the current state-of-the-art, and it closes the performance gap against a fully supervised setting without relying on ground-truth annotations. We also compare various exploration policies for the agent to gather more informative observations. Code and dataset will be made available upon paper acceptance.
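
A simplified sketch of cross-view consensus, assuming the per-view class predictions for one physical object are merged by majority vote and kept only when agreement exceeds a threshold; the function name and the `min_agreement` value are hypothetical, and the paper's consensus mechanism also refines the detector's box predictions rather than just class labels.

```python
# Minimal sketch (assumptions): pseudo-labels for the same object across views are merged by
# majority vote; low-agreement objects are discarded from the fine-tuning set.
from collections import Counter

def consensus_pseudo_label(view_labels, min_agreement=0.6):
    """view_labels: list of class predictions for one object across different views."""
    if not view_labels:
        return None
    label, count = Counter(view_labels).most_common(1)[0]
    return label if count / len(view_labels) >= min_agreement else None

print(consensus_pseudo_label(["car", "car", "truck", "car"]))   # -> "car"
print(consensus_pseudo_label(["car", "truck"]))                 # -> None (too ambiguous)
```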

Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessment of the prediction error. For this reason, out-of-sample performance is commonly estimated using data splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. As opposed to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined, and the variability of the out-of-sample $\hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of the predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data splitting estimates to provide a standard error for the $\hat{R}^2$. The performance of the estimators for the $R^2$ and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative $\text{Brassica napus}$ and $\text{Zea mays}$ phenotypes based on gene expression data.
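
The sketch below computes a cross-validated out-of-sample R^2 by comparing a fitted model against a mean-only benchmark on each held-out fold, together with a crude fold-level standard error; the paper's estimator and standard error build on recent theory for data-splitting uncertainty and differ from this baseline, so the code is only illustrative.

```python
# Minimal sketch (assumptions): out-of-sample R^2 as 1 minus the ratio of the model's
# held-out squared error to that of a mean-only reference model, with a naive fold-level
# standard error (not the paper's estimator).
import numpy as np

def out_of_sample_r2(model_fit_predict, X, y, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    r2_folds = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = model_fit_predict(X[train], y[train], X[test])
        sse_model = np.mean((y[test] - pred) ** 2)
        sse_null = np.mean((y[test] - y[train].mean()) ** 2)   # mean-only benchmark model
        r2_folds.append(1.0 - sse_model / sse_null)
    r2_folds = np.array(r2_folds)
    return r2_folds.mean(), r2_folds.std(ddof=1) / np.sqrt(n_folds)

# Toy linear-regression example
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)); y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
ols = lambda Xtr, ytr, Xte: Xte @ np.linalg.lstsq(Xtr, ytr, rcond=None)[0]
print(out_of_sample_r2(ols, X, y))
```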

We revisit the estimation bias in policy gradients for the discounted episodic Markov decision process (MDP) from the Deep Reinforcement Learning (DRL) perspective. The objective is formulated theoretically as the expected returns discounted over the time horizon. One of the major policy gradient biases is the state distribution shift: the state distribution used to estimate the gradients differs from the theoretical formulation in that it does not take into account the discount factor. Existing discussion of the influence of this bias in the literature was limited to the tabular and softmax cases. Therefore, in this paper, we extend it to the DRL setting where the policy is parameterized and demonstrate theoretically how this bias can lead to suboptimal policies. We then discuss why the empirically inaccurate implementations with shifted state distribution can still be effective. We show that, despite such state distribution shift, the policy gradient estimation bias can be reduced in the following three ways: 1) a small learning rate; 2) an adaptive-learning-rate-based optimizer; and 3) KL regularization. Specifically, we show that a smaller learning rate, or an adaptive learning rate, such as that used by the Adam and RMSProp optimizers, makes the policy optimization robust to the bias. We further draw connections between optimizers and the optimization regularization to show that both the KL and the reverse KL regularization can significantly rectify this bias. Moreover, we provide extensive experiments on continuous control tasks to support our analysis. Our paper sheds light on how successful PG algorithms optimize policies in the DRL setting, and contributes insights into the practical issues in DRL.
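
To make the state distribution shift concrete, the snippet below contrasts the theoretically discounted policy gradient, which weights the step-t term by gamma^t, with the common implementation that drops this factor; the pre-computed log-probability gradients and returns here are random placeholders for quantities obtained from actual rollouts.

```python
# Minimal sketch (assumptions): the "theoretical" discounted policy gradient keeps the
# gamma^t weight on each step, while common implementations omit it (the shifted state
# distribution discussed above).
import numpy as np

def pg_estimates(logp_grads, returns, gamma=0.99):
    """logp_grads: (T, d) grad log pi(a_t|s_t); returns: (T,) discounted returns-to-go."""
    T = len(returns)
    weights = gamma ** np.arange(T)
    discounted = (weights[:, None] * logp_grads * returns[:, None]).sum(axis=0)   # theory
    shifted = (logp_grads * returns[:, None]).sum(axis=0)                         # common practice
    return discounted, shifted

rng = np.random.default_rng(0)
g_true, g_shift = pg_estimates(rng.normal(size=(100, 4)), rng.normal(size=100))
print(g_true, g_shift)
```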

Recent advances in Transformer architectures have empowered their empirical success in a variety of tasks across different domains. However, existing works mainly focus on predictive accuracy and computational cost, without considering other practical issues, such as robustness to contaminated samples. Recent work by Nguyen et al. (2022) has shown that the self-attention mechanism, which is the center of the Transformer architecture, can be viewed as a non-parametric estimator based on kernel density estimation (KDE). This motivates us to leverage a set of robust kernel density estimation methods for alleviating the issue of data contamination. Specifically, we introduce a series of self-attention mechanisms that can be incorporated into different Transformer architectures and discuss the special properties of each method. We then perform extensive empirical studies on language modeling and image classification tasks. Our methods demonstrate robust performance in multiple scenarios while maintaining competitive results on clean datasets.
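
An illustrative contrast, assuming a crude robustification in which attention scores are winsorized around their per-query median before the softmax; this is not one of the KDE-derived mechanisms proposed in the paper, only a sketch of how downweighting outlier keys can be slotted into standard attention.

```python
# Minimal sketch (assumptions): softmax attention with scores clipped to a median +/- MAD
# band per query, a crude stand-in for robust KDE-style attention.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def robust_attention(q, k, v, clip=2.0):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    med = np.median(scores, axis=-1, keepdims=True)
    mad = np.median(np.abs(scores - med), axis=-1, keepdims=True) + 1e-8
    scores = np.clip(scores, med - clip * mad, med + clip * mad)   # winsorize outlier scores
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(robust_attention(q, k, v).shape)
```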
