国产乱伦对白刺激视频_亚洲国产欧美一区二区午夜浪_精品自窥自偷在线看_亚州欧美日韩国产人成在线_又粗又大又硬好爽好猛免费视频_午夜亚洲国产理论片在线播放_亚洲无码视频免费在线观看

Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal when the risk level is discretized sufficiently. However, we show that these popular decompositions for Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level. In particular, we show that a saddle point property assumed to hold in prior literature may be violated. However, a decomposition does hold for Value-at-Risk and our proof demonstrates how this risk measure differs from CVaR and EVaR. Our findings are significant because risk-averse algorithms are used in high-stake environments, making their correctness much more critical.

相關內容

dynamic programming

關注 1

線性的 · 動力系統 · Tensor · Learning · MoDELS ·

2023 年 7 月 23 日

Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems

Ainesh Bakshi,Allen Liu,Ankur Moitra,Morris Yau

from arxiv, ICML 2023

Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.

賭博機/老虎機 · 統計量 · 類別 · 確切的 · 偏移量 ·

2023 年 7 月 21 日

Tight Bounds for $γ$-Regret via the Decision-Estimation Coefficient

Margalit Glasgow,Alexander Rakhlin

In this work, we give a statistical characterization of the $\gamma$-regret for arbitrary structured bandit problems, the regret which arises when comparing against a benchmark that is $\gamma$ times the optimal solution. The $\gamma$-regret emerges in structured bandit problems over a function class $\mathcal{F}$ where finding an exact optimum of $f \in \mathcal{F}$ is intractable. Our characterization is given in terms of the $\gamma$-DEC, a statistical complexity parameter for the class $\mathcal{F}$, which is a modification of the constrained Decision-Estimation Coefficient (DEC) of Foster et al., 2023 (and closely related to the original offset DEC of Foster et al., 2021). Our lower bound shows that the $\gamma$-DEC is a fundamental limit for any model class $\mathcal{F}$: for any algorithm, there exists some $f \in \mathcal{F}$ for which the $\gamma$-regret of that algorithm scales (nearly) with the $\gamma$-DEC of $\mathcal{F}$. We provide an upper bound showing that there exists an algorithm attaining a nearly matching $\gamma$-regret. Due to significant challenges in applying the prior results on the DEC to the $\gamma$-regret case, both our lower and upper bounds require novel techniques and a new algorithm.

CSP · 約束 · 可約的 · 易處理的 · motivation ·

2023 年 7 月 21 日

Local consistency as a reduction between constraint satisfaction problems

Victor Dalmau,Jakub Opr?al

We study the use of local consistency methods as reductions between constraint satisfaction problems (CSPs), and promise version thereof, with the aim to classify these reductions in a similar way as the algebraic approach classifies gadget reductions between CSPs. This research is motivated by the requirement of more expressive reductions in the scope of promise CSPs. While gadget reductions are enough to provide all necessary hardness in the scope of (finite domain) non-promise CSP, in promise CSPs a wider class of reductions needs to be used. We provide a general framework of reductions, which we call consistency reductions, that covers most (if not all) reductions recently used for proving NP-hardness of promise CSPs. We prove some basic properties of these reductions, and provide the first steps towards understanding the power of consistency reductions by characterizing a fragment associated to arc-consistency in terms of polymorphisms of the template. In addition to showing hardness, consistency reductions can also be used to provide feasible algorithms by reducing to a fixed tractable (promise) CSP, for example, to solving systems of affine equations. In this direction, among other results, we describe the well-known Sherali-Adams hierarchy for CSP in terms of a consistency reduction to linear programming.

相互獨立的 · INFORMS · 優化器 · 控制器 · Analysis ·

2023 年 7 月 21 日

An Analysis of Multi-Agent Reinforcement Learning for Decentralized Inventory Control Systems

Marwan Mousa,Damien van de Berg,Niki Kotecha,Ehecatl Antonio del Rio-Chanona,Max Mowbray

Most solutions to the inventory management problem assume a centralization of information that is incompatible with organisational constraints in real supply chain networks. The inventory management problem is a well-known planning problem in operations research, concerned with finding the optimal re-order policy for nodes in a supply chain. While many centralized solutions to the problem exist, they are not applicable to real-world supply chains made up of independent entities. The problem can however be naturally decomposed into sub-problems, each associated with an independent entity, turning it into a multi-agent system. Therefore, a decentralized data-driven solution to inventory management problems using multi-agent reinforcement learning is proposed where each entity is controlled by an agent. Three multi-agent variations of the proximal policy optimization algorithm are investigated through simulations of different supply chain networks and levels of uncertainty. The centralized training decentralized execution framework is deployed, which relies on offline centralization during simulation-based policy identification, but enables decentralization when the policies are deployed online to the real system. Results show that using multi-agent proximal policy optimization with a centralized critic leads to performance very close to that of a centralized data-driven solution and outperforms a distributed model-based solution in most cases while respecting the information constraints of the system.

置換不變性 · 估計/估計量 · 異常檢測 · 不變 · Learning ·

2023 年 7 月 20 日

High-dimensional and Permutation Invariant Anomaly Detection

Vinicius Mikuni,Benjamin Nachman

from arxiv, 7 pages, 5 figures

Methods for anomaly detection of new physics processes are often limited to low-dimensional spaces due to the difficulty of learning high-dimensional probability densities. Particularly at the constituent level, incorporating desirable properties such as permutation invariance and variable-length inputs becomes difficult within popular density estimation methods. In this work, we introduce a permutation-invariant density estimator for particle physics data based on diffusion models, specifically designed to handle variable-length inputs. We demonstrate the efficacy of our methodology by utilizing the learned density as a permutation-invariant anomaly detection score, effectively identifying jets with low likelihood under the background-only hypothesis. To validate our density estimation method, we investigate the ratio of learned densities and compare to those obtained by a supervised classification algorithm.

圖 · 情景 · 類別 · Packing · SimPLe ·

2023 年 7 月 20 日

Complexity Framework For Forbidden Subgraphs I: The Framework

Matthew Johnson,Barnaby Martin,Jelle J. Oostveen,Sukanya Pandey,Dani?l Paulusma,Siani Smith,Erik Jan van Leeuwen

For any particular class of graphs, algorithms for computational problems restricted to the class often rely on structural properties that depend on the specific problem at hand. This begs the question if a large set of such results can be explained by some common problem conditions. We propose such conditions for $HH$-subgraph-free graphs. For a set of graphs $HH$, a graph $G$ is $HH$-subgraph-free if $G$ does not contain any of graph from $H$ as a subgraph. Our conditions are easy to state. A graph problem must be efficiently solvable on graphs of bounded treewidth, computationally hard on subcubic graphs, and computational hardness must be preserved under edge subdivision of subcubic graphs. Our meta-classification says that if a graph problem satisfies all three conditions, then for every finite set $HH$, it is ``efficiently solvable'' on $HH$-subgraph-free graphs if $HH$ contains a disjoint union of one or more paths and subdivided claws, and is ``computationally hard'' otherwise. We illustrate the broad applicability of our meta-classification by obtaining a dichotomy between polynomial-time solvability and NP-completeness for many well-known partitioning, covering and packing problems, network design problems and width parameter problems. For other problems, we obtain a dichotomy between almost-linear-time solvability and having no subquadratic-time algorithm (conditioned on some hardness hypotheses). The proposed framework thus gives a simple pathway to determine the complexity of graph problems on $HH$-subgraph-free graphs. This is confirmed even more by the fact that along the way, we uncover and resolve several open questions from the literature.

道德化 · 優化器 · 可理解性 · Learning · Agent ·

2023 年 7 月 20 日

From computational ethics to morality: how decision-making algorithms can help us understand the emergence of moral principles, the existence of an optimal behaviour and our ability to discover it

Eduardo C. Garrido-Merchán,Sara Lumbreras-Sancho

This paper adds to the efforts of evolutionary ethics to naturalize morality by providing specific insights derived from a computational ethics view. We propose a stylized model of human decision-making, which is based on Reinforcement Learning, one of the most successful paradigms in Artificial Intelligence. After the main concepts related to Reinforcement Learning have been presented, some particularly useful parallels are drawn that can illuminate evolutionary accounts of ethics. Specifically, we investigate the existence of an optimal policy (or, as we will refer to, objective ethical principles) given the conditions of an agent. In addition, we will show how this policy is learnable by means of trial and error, supporting our hypotheses on two well-known theorems in the context of Reinforcement Learning. We conclude by discussing how the proposed framework can be enlarged to study other potentially interesting areas of human behavior from a formalizable perspective.

衰減系數 · SPE · Markov · 分解的 · Processing（編程語言） ·

2023 年 7 月 19 日

Markov Decision Processes with Time-Varying Geometric Discounting

Jiarui Gan,Annika Hennes,Rupak Majumdar,Debmalya Mandal,Goran Radanovic

from arxiv, 24 pages, 3 figures

Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of $\epsilon$-SPE and show that an $\epsilon$-SPE exists under milder assumptions. An algorithm is presented to compute an $\epsilon$-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.

博弈論 · 有向 · AI · 計算學習理論 ·

2021 年 1 月 21 日

Game-Theoretic and Machine Learning-based Approaches for Defensive Deception: A Survey

Mu Zhu,Ahmed H. Anwar,Zelin Wan,Jin-Hee Cho,Charles Kamhoua,Munindar P. Singh

from arxiv, 30 pages, 156 citations

Defensive deception is a promising approach for cyberdefense. Although defensive deception is increasingly popular in the research community, there has not been a systematic investigation of its key components, the underlying principles, and its tradeoffs in various problem settings. This survey paper focuses on defensive deception research centered on game theory and machine learning, since these are prominent families of artificial intelligence approaches that are widely employed in defensive deception. This paper brings forth insights, lessons, and limitations from prior work. It closes with an outline of some research directions to tackle major gaps in current defensive deception research.

圖片分類 · Machine Learning · 學成 · 樣例 · CARS ·

2020 年 9 月 8 日

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Gabriel Resende Machado,Eugênio Silva,Ronaldo Ribeiro Goldschmidt

Deep Learning algorithms have achieved the state-of-the-art performance for Image Classification and have been used even in security-critical applications, such as biometric recognition systems and self-driving cars. However, recent works have shown those algorithms, which can even surpass the human capabilities, are vulnerable to adversarial examples. In Computer Vision, adversarial examples are images containing subtle perturbations generated by malicious optimization algorithms in order to fool classifiers. As an attempt to mitigate these vulnerabilities, numerous countermeasures have been constantly proposed in literature. Nevertheless, devising an efficient defense mechanism has proven to be a difficult task, since many approaches have already shown to be ineffective to adaptive attackers. Thus, this self-containing paper aims to provide all readerships with a review of the latest research progress on Adversarial Machine Learning in Image Classification, however with a defender's perspective. Here, novel taxonomies for categorizing adversarial attacks and defenses are introduced and discussions about the existence of adversarial examples are provided. Further, in contrast to exisiting surveys, it is also given relevant guidance that should be taken into consideration by researchers when devising and evaluating defenses. Finally, based on the reviewed literature, it is discussed some promising paths for future research.