This paper studies the class of scenario-based safety testing algorithms in the black-box safety testing configuration. For algorithms sharing the same state-action set coverage with different sampling distributions, it is commonly believed that prioritizing the exploration of high-risk state-actions leads to a better sampling efficiency. Our proposal disputes the above intuition by introducing an impossibility theorem that provably shows all safety testing algorithms of the aforementioned difference perform equally well with the same expected sampling efficiency. Moreover, for testing algorithms covering different sets of state-actions, the sampling efficiency criterion is no longer applicable as different algorithms do not necessarily converge to the same termination condition. We then propose a testing aggressiveness definition based on the almost safe set concept along with an unbiased and efficient algorithm that compares the aggressiveness between testing algorithms. Empirical observations from the safety testing of bipedal locomotion controllers and vehicle decision-making modules are also presented to support the proposed theoretical implications and methodologies.

Related content

Sampling efficiency

Reward function · Functional · Learning · IPL · Performer

May 24, 2023

Inverse Preference Learning: Preference-based RL without a Reward Function

Joey Hejna,Dorsa Sadigh

Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward functions from human feedback. However, the majority of preference-based RL methods naïvely combine supervised reward models with off-the-shelf RL algorithms. Contemporary approaches have sought to improve performance and query complexity by using larger and more complex reward architectures such as transformers. Instead of using highly complex architectures, we develop a new and parameter-efficient algorithm, Inverse Preference Learning (IPL), specifically designed for learning from offline preference data. Our key insight is that for a fixed policy, the $Q$-function encodes all information about the reward function, effectively making them interchangeable. Using this insight, we completely eliminate the need for a learned reward function. Our resulting algorithm is simpler and more parameter-efficient. Across a suite of continuous control and robotics benchmarks, IPL attains competitive performance compared to more complex approaches that leverage transformer-based and non-Markovian reward functions while having fewer algorithmic hyperparameters and learned network parameters. Our code is publicly released.
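The claim that, for a fixed policy, the $Q$-function and the reward function are interchangeable can be checked numerically on a small tabular problem. The sketch below is not the IPL algorithm itself; it only illustrates the underlying Bellman identity $r(s,a) = Q(s,a) - \gamma\,\mathbb{E}_{s'}[V^\pi(s')]$. The function names `q_from_reward` and `reward_from_q`, the tabular setting, and the random MDP are assumptions made for illustration.

```python
import numpy as np


def q_from_reward(P, r, pi, gamma):
    """Policy evaluation: solve Q = r + gamma * P V_pi for a fixed policy.

    P has shape (S, A, S), r and pi have shape (S, A).
    """
    S, A = r.shape
    # Transition matrix over state-action pairs under pi:
    # M[(s, a), (s2, a2)] = P[s, a, s2] * pi[s2, a2]
    M = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    q = np.linalg.solve(np.eye(S * A) - gamma * M, r.reshape(S * A))
    return q.reshape(S, A)


def reward_from_q(P, Q, pi, gamma):
    """Inverse Bellman step: r(s, a) = Q(s, a) - gamma * E_{s'}[V_pi(s')]."""
    V = (pi * Q).sum(axis=1)   # V_pi(s) = sum_a pi(a|s) Q(s, a)
    return Q - gamma * (P @ V)  # P @ V has shape (S, A)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma = 4, 3, 0.9
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)    # normalize rows into distributions
    pi = rng.random((S, A))
    pi /= pi.sum(axis=1, keepdims=True)
    r = rng.normal(size=(S, A))
    Q = q_from_reward(P, r, pi, gamma)
    print(np.allclose(reward_from_q(P, Q, pi, gamma), r))
```

Because the mapping is invertible for a fixed policy and discount factor below one, a preference loss written in terms of rewards can in principle be rewritten in terms of $Q$, which is the substitution the abstract describes.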

Packing · Approximation · Optimizer · Examples · Lecture notes

May 24, 2023

Algorithms for the Bin Packing Problem with Scenarios

Yulle G. F. Borges,Vinícius L. de Lima,Flávio K. Miyazawa,Lehilton L. C. Pedrosa,Thiago A. de Queiroz,Rafael C. S. Schouery

This paper presents theoretical and practical results for the bin packing problem with scenarios, a generalization of the classical bin packing problem which considers the presence of uncertain scenarios, of which only one is realized. For this problem, we propose an absolute approximation algorithm whose ratio is bounded by the square root of the number of scenarios times the approximation ratio for an algorithm for the vector bin packing problem. We also show how an asymptotic polynomial-time approximation scheme is derived when the number of scenarios is constant. As a practical study of the problem, we present a branch-and-price algorithm to solve an exponential model and a variable neighborhood search heuristic. To speed up the convergence of the exact algorithm, we also consider lower bounds based on dual feasible functions. Results of these algorithms show the competence of the branch-and-price in obtaining optimal solutions for about 59% of the instances considered, while the combined heuristic and branch-and-price optimally solved 62% of the instances considered.
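To make the problem setting concrete, here is a minimal first-fit sketch. It is a plain greedy heuristic, not the approximation algorithm, branch-and-price method, or variable neighborhood search from the paper. It assumes the usual formalization, which the abstract does not spell out: each item belongs to a set of scenarios, and a bin is feasible if, for every scenario, the items of that scenario placed in the bin fit within the capacity. The function name and the example data are hypothetical.

```python
def first_fit_scenarios(sizes, item_scenarios, num_scenarios, capacity):
    """Greedy first-fit for bin packing with scenarios (illustrative only).

    sizes[i] is the size of item i; item_scenarios[i] is the set of scenario
    indices in which item i is present. Returns (assignment, number_of_bins).
    """
    bins = []        # bins[b][k] = load of bin b under scenario k
    assignment = []  # assignment[i] = index of the bin holding item i
    for size, scenarios in zip(sizes, item_scenarios):
        if size > capacity:
            raise ValueError("item larger than bin capacity")
        for b, loads in enumerate(bins):
            # The item fits if no scenario containing it would overflow.
            if all(loads[k] + size <= capacity for k in scenarios):
                break
        else:
            # No existing bin works: open a new one.
            loads = [0] * num_scenarios
            bins.append(loads)
            b = len(bins) - 1
        for k in scenarios:
            loads[k] += size
        assignment.append(b)
    return assignment, len(bins)


if __name__ == "__main__":
    sizes = [6, 6, 6, 4]
    item_scenarios = [{0}, {1}, {0, 1}, {0}]
    print(first_fit_scenarios(sizes, item_scenarios, num_scenarios=2, capacity=10))
```

In this example the first two items share a bin even though their sizes sum to more than the capacity, because they never occur in the same scenario. That is the feature that separates this problem from classical bin packing.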

Critic · Functional · Approximation · SimPLe · Surrogate function

May 24, 2023

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Sharan Vaswani,Amirreza Kazemi,Reza Babanezhad,Nicolas Le Roux

from arxiv, 44 pages

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the TD error, an objective that is potentially decorrelated with the true goal of achieving a high reward with the actor. We address this mismatch by designing a joint objective for training the actor and critic in a decision-aware fashion. We use the proposed objective to design a generic, AC algorithm that can easily handle any function approximation. We explicitly characterize the conditions under which the resulting algorithm guarantees monotonic policy improvement, regardless of the choice of the policy and critic parameterization. Instantiating the generic algorithm results in an actor that involves maximizing a sequence of surrogate functions (similar to TRPO, PPO) and a critic that involves minimizing a closely connected objective. Using simple bandit examples, we provably establish the benefit of the proposed critic objective over the standard squared error. Finally, we empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.

Bayesian inference · Inference · Robustness · Estimation/Estimator · INFORMS

May 24, 2023

Adversarial robustness of amortized Bayesian inference

Manuel Glöckler,Michael Deistler,Jakob H. Macke

Bayesian inference usually requires running potentially costly inference procedures separately for every new observation. In contrast, the idea of amortized Bayesian inference is to initially invest computational cost in training an inference network on simulated data, which can subsequently be used to rapidly perform inference (i.e., to return estimates of posterior distributions) for new observations. This approach has been applied to many real-world models in the sciences and engineering, but it is unclear how robust the approach is to adversarial perturbations in the observed data. Here, we study the adversarial robustness of amortized Bayesian inference, focusing on simulation-based estimation of multi-dimensional posterior distributions. We show that almost unrecognizable, targeted perturbations of the observations can lead to drastic changes in the predicted posterior and highly unrealistic posterior predictive samples, across several benchmark tasks and a real-world example from neuroscience. We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator, and show how it improves the adversarial robustness of amortized Bayesian inference.

Generalized function · Learning · Maximum likelihood estimation · Estimation/Estimator · Functional

May 24, 2023

Provable Offline Reinforcement Learning with Human Feedback

Wenhao Zhan,Masatoshi Uehara,Nathan Kallus,Jason D. Lee,Wen Sun

In this paper, we investigate the problem of offline reinforcement learning with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE. We consider the general reward setting where the reward can be defined over the whole trajectory and provide a novel guarantee that allows us to learn any target policy with a polynomial number of samples, as long as the target policy is covered by the offline data. This guarantee is the first of its kind with general function approximation. To measure the coverage of the target policy, we introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability coefficient. We also establish lower bounds that highlight the necessity of such concentrability and the difference from standard RL, where state-action-wise rewards are directly observed. We further extend and analyze our algorithm when the feedback is given over action pairs.
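Step (1) above, reward estimation by maximum likelihood from trajectory preferences, can be sketched in a very reduced form. The sketch assumes a Bradley-Terry (logistic) preference model and a reward that is linear in trajectory-level features; the abstract states neither, and the paper works with general function approximation. Step (2), the distributionally robust planning over a confidence set, is not shown. All names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_reward_mle(feat_a, feat_b, prefs, lr=0.5, steps=2000):
    """Fit theta by gradient ascent on the preference log-likelihood.

    Assumed model: P(a preferred over b) = sigmoid(theta . (phi(a) - phi(b))).
    feat_a, feat_b: arrays of shape (n, d) with trajectory features.
    prefs: array of shape (n,), 1.0 if trajectory a was preferred, else 0.0.
    """
    diff = feat_a - feat_b
    theta = np.zeros(diff.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(diff @ theta)))  # predicted preference prob.
        grad = diff.T @ (prefs - p) / len(prefs)   # mean log-likelihood gradient
        theta += lr * grad
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_theta = np.array([1.0, -2.0, 0.5])        # synthetic ground truth
    feat_a = rng.normal(size=(2000, 3))
    feat_b = rng.normal(size=(2000, 3))
    logits = (feat_a - feat_b) @ true_theta
    prefs = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
    print(fit_reward_mle(feat_a, feat_b, prefs))
```

Only preference labels enter the fit, never reward values, which matches the feedback model in the abstract. Under this logistic model, adding a constant to the reward of every trajectory leaves the likelihood unchanged, so the reward is identified only up to such a shift.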

Optimizer · Performer · motivation · Operation · Representation

May 22, 2023

Heuristics Optimization of Boolean Circuits with application in Attribute Based Encryption

Alexandru Ionita,Denis-Andrei Banu,Iulian Oleniuc

We propose a method of optimizing monotone Boolean circuits by re-writing them in a simpler, equivalent form. We use in total six heuristics: Hill Climbing, Simulated Annealing, and variations of them, which operate on the representation of the circuit as a logical formula. Our main motivation is to improve performance in Attribute-Based Encryption (ABE) schemes for Boolean circuits. Therefore, we show how our heuristics improve ABE systems for Boolean circuits. Also, we run tests to evaluate the performance of our heuristics, both as a standalone optimization for Boolean circuits and also inside ABE systems.

Examples · CASES · Robustness · Reducible · MoDELS

May 22, 2023

On The Empirical Effectiveness of Unrealistic Adversarial Hardening Against Realistic Adversarial Attacks

Salijona Dyrmishi,Salah Ghamizi,Thibault Simonetto,Yves Le Traon,Maxime Cordy

from arxiv, S&P 2023

While the literature on security attacks and defense of Machine Learning (ML) systems mostly focuses on unrealistic adversarial examples, recent research has raised concern about the under-explored field of realistic adversarial attacks and their implications on the robustness of real-world systems. Our paper paves the way for a better understanding of adversarial robustness against realistic attacks and makes two major contributions. First, we conduct a study on three real-world use cases (text classification, botnet detection, malware detection) and five datasets in order to evaluate whether unrealistic adversarial examples can be used to protect models against realistic examples. Our results reveal discrepancies across the use cases, where unrealistic examples can either be as effective as the realistic ones or may offer only limited improvement. Second, to explain these results, we analyze the latent representation of the adversarial examples generated with realistic and unrealistic attacks. We shed light on the patterns that discriminate which unrealistic examples can be used for effective hardening. We release our code, datasets and models to support future research in exploring how to reduce the gap between unrealistic and realistic adversarial attacks.

Optimizer · Approximation · MoDELS · Examples · Decomposable

May 20, 2023

On the approximability and energy-flow modeling of the electric vehicle sharing problem

Welverton R. Silva,Fábio L. Usberti,Rafael C. S. Schouery

The electric vehicle sharing problem (EVSP) arises from the planning and operation of one-way electric car-sharing systems. It aims to maximize the total rental time of a fleet of electric vehicles while ensuring that all the demands of the customer are fulfilled. In this paper, we expand the knowledge on the complexity of the EVSP by showing that it is NP-hard to approximate it to within a factor of $n^{1-\epsilon}$ in polynomial time, for any $\epsilon > 0$, where $n$ denotes the number of customers, unless P = NP. In addition, we also show that the problem does not have a monotone structure, which can be detrimental to the development of heuristics employing constructive strategies. Moreover, we propose a novel approach for the modeling of the EVSP based on energy flows in the network. Based on the new model, we propose a relax-and-fix strategy and an exact algorithm that uses a warm-start solution obtained from our heuristic approach. We report computational results comparing our formulation with the best-performing formulation in the literature. The results show that our formulation outperforms the previous one concerning the number of optimal solutions obtained, optimality gaps, and computational times. Previously, $32.7\%$ of the instances remained unsolved (within a time limit of one hour) by the best-performing formulation in the literature, while our formulation obtained optimal solutions for all instances. To stress our approaches, two more challenging new sets of instances were generated, for which we were able to solve $49.5\%$ of the instances, with an average optimality gap of $2.91\%$ for those not solved optimally.

Scenario · Criterion · Optimizer · Reducible · Lecture notes

May 19, 2023

Distributional Multi-Objective Decision Making

Willem Röpke,Conor F. Hayes,Patrick Mannion,Enda Howley,Ann Nowé,Diederik M. Roijers

from arxiv, Accepted at IJCAI 2023

For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.

Examples · Extensibility · Decomposable · AI · Generalization theory

May 18, 2023

Towards Generalizable Data Protection With Transferable Unlearnable Examples

Bin Fang,Bo Li,Shuang Wu,Tianyi Zheng,Shouhong Ding,Ran Yi,Lizhuang Ma

from arxiv, arXiv admin note: text overlap with arXiv:2305.10691

Artificial Intelligence (AI) is making a profound impact in almost every domain. One of the crucial factors contributing to this success has been the access to an abundance of high-quality data for constructing machine learning models. Lately, as the role of data in artificial intelligence has been significantly magnified, concerns have arisen regarding the secure utilization of data, particularly in the context of unauthorized data usage. To mitigate data exploitation, data unlearning has been introduced to render data unexploitable. However, current unlearnable examples lack the generalization required for wide applicability. In this paper, we present a novel, generalizable data protection method by generating transferable unlearnable examples. To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution. Through extensive experimentation, we substantiate the enhanced generalizable protection capabilities of our proposed method.