from arxiv, This paper has been accepted to IEEE Transactions on Services Computing. Revisions to the previous version include expanded material on smart contracts in Section 9

The paper develops a logical understanding of processes for signature of legal contracts, motivated by applications to legal recognition of smart contracts on blockchain platforms. A number of axioms and rules of inference are developed that can be used to justify a "meeting of the minds" precondition for contract formation from the fact that certain content has been signed. In addition to an "offer and acceptance" process, the paper considers "signature in counterparts", a legal process that permits a contract between two or more parties to be brought into force by having the parties independently (possibly, remotely) sign different copies of the contract, rather than placing their signatures on a common copy at a physical meeting. It is argued that a satisfactory account of signature in counterparts benefits from a logic with syntactic self-reference. The axioms used are supported by a formal semantics, and a number of further properties of the logic are investigated. In particular, it is shown that the logic implies that when a contract has been signed, the parties do not just agree, but are in mutual agreement (a common-knowledge-like notion) about the terms of the contract.

Related content

TD · Annotation · Chromium · INFORMS · Estimation/Estimator

January 28, 2022

Detecting Discussions of Technical Debt

Ipek Ozkaya,Zachary Kurtz,Robert L. Nord,Raghvinder S. Sangwan,Satish M. Srinivasan

from arxiv, 12 pages, 5 figures, 5 tables

Technical debt (TD) refers to suboptimal choices during software development that achieve short-term goals at the expense of long-term quality. Although developers often informally discuss TD, the concept has not yet crystalized into a consistently applied label when describing issues in most repositories. We apply machine learning to understand developer insights into TD when discussing tickets in an issue tracker. We generate expert labels that indicate whether discussion of TD occurs in the free text associated with each ticket in a sample of more than 1,900 tickets in the Chromium issue tracker. We then use these labels to train a classifier that estimates labels for the remaining 475,000 tickets. We conclude that discussion of TD appears in about 16% of the tracked Chromium issues. If we can effectively classify TD-related issues, we can focus on what practices could be most useful for their timely resolution.

Policy improvement · Performer · Discretization · Processing (programming language) · Identifiable

January 28, 2022

Safe Policy Improvement Approaches on Discrete Markov Decision Processes

Philipp Scholl,Felix Dietrich,Clemens Otte,Steffen Udluft

from arxiv, 12 pages, International Conference on Agents and Artificial Intelligence 2022

Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify theoretical issues in their approach, provide a corrected theory, and derive a new algorithm that is provably safe on finite Markov Decision Processes (MDPs). Additionally, we provide a heuristic algorithm that exhibits the best performance among many state-of-the-art SPI algorithms on two different benchmarks. Furthermore, we introduce a taxonomy of SPI algorithms and empirically show an interesting property of two classes of SPI algorithms: while the mean performance of algorithms that incorporate the uncertainty as a penalty on the action-value is higher, actively restricting the set of policies more consistently produces good policies and is, thus, safer.

Controller · Learned · Directed · Processing (programming language) · Reinforcement learning

January 27, 2022

Closed-Loop Control of Direct Ink Writing via Reinforcement Learning

Michal Piovarci,Michael Foshey,Jie Xu,Timothy Erps,Vahid Babaei,Piotr Didyk,Szymon Rusinkiewicz,Wojciech Matusik,Bernd Bickel

Enabling additive manufacturing to employ a wide range of novel, functional materials can be a major boost to this technology. However, making such materials printable requires painstaking trial-and-error by an expert operator, as they typically tend to exhibit peculiar rheological or hysteresis properties. Even in the case of successfully finding the process parameters, there is no guarantee of print-to-print consistency due to material differences between batches. These challenges make closed-loop feedback an attractive option where the process parameters are adjusted on-the-fly. There are several challenges for designing an efficient controller: the deposition parameters are complex and highly coupled, artifacts occur after long time horizons, simulating the deposition is computationally costly, and learning on hardware is intractable. In this work, we demonstrate the feasibility of learning a closed-loop control policy for additive manufacturing using reinforcement learning. We show that approximate, but efficient, numerical simulation is sufficient as long as it allows learning the behavioral patterns of deposition that translate to real-world experiences. In combination with reinforcement learning, our model can be used to discover control policies that outperform baseline controllers. Furthermore, the recovered policies have a minimal sim-to-real gap. We showcase this by applying our control policy in-vivo on a single-layer, direct ink writing printer.

Controller · Optimizer · Robustness · Periodic · MoDELS

January 27, 2022

Optimal control of Hopf bifurcations

Nicolas Boullé,Patrick E. Farrell,Marie E. Rognes

from arxiv, 22 pages, 8 figures

We introduce a numerical technique for controlling the location and stability properties of Hopf bifurcations in dynamical systems. The algorithm consists of solving an optimization problem constrained by an extended system of nonlinear partial differential equations that characterizes Hopf bifurcation points. The flexibility and robustness of the method allow us to advance or delay a Hopf bifurcation to a target value of the bifurcation parameter, as well as to control the oscillation frequency with respect to a parameter of the system or the shape of the domain on which solutions are defined. Numerical applications are presented in systems arising from biology and fluid dynamics, such as the FitzHugh-Nagumo model, Ginzburg-Landau equation, Rayleigh-Bénard convection problem, and Navier-Stokes equations, where the control of the location and oscillation frequency of periodic solutions is of high interest.

Learned · Channel · Performer · Reward function · MoDELS

January 27, 2022

Multi-Agent Adversarial Attacks for Multi-Channel Communications

Juncheng Dong,Suya Wu,Mohammadreza Sultani,Vahid Tarokh

Recently, Reinforcement Learning (RL) has been applied as an anti-adversarial remedy in wireless communication networks. However, studying the RL-based approaches from the adversary's perspective has received little attention. Additionally, RL-based approaches in an anti-adversary or adversarial paradigm mostly consider single-channel communication (either channel selection or single-channel power control), while multi-channel communication is more common in practice. In this paper, we propose a multi-agent adversary system (MAAS) for modeling and analyzing adversaries in a wireless communication scenario by careful design of the reward function under realistic communication scenarios. In particular, by modeling the adversaries as learning agents, we show that the proposed MAAS is able to successfully choose the transmitted channel(s) and their respective allocated power(s) without any prior knowledge of the sender strategy. Compared to the single-agent adversary (SAA), multi-agents in MAAS can achieve a significant reduction in signal-to-interference-plus-noise ratio (SINR) under the same power constraints and partial observability, while providing improved stability and a more efficient learning process. Moreover, through empirical studies we show that the results in simulation are close to those in real-world communication, a conclusion that is pivotal to the validity of the performance of agents evaluated in simulations.

Performer · CASES · Episode · CASE · Learned

January 27, 2022

Time Limits in Reinforcement Learning

Fabio Pardo,Arash Tavakoli,Vitaly Levdik,Petar Kormushev

from arxiv, ICML 2018, NIPS 2017 Deep RL Symposium, code and videos: //sites.google.com/view/time-limits-in-rl

In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account for how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent's input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.
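The two cases in the abstract above can be illustrated with a minimal sketch. It assumes a one-step temporal-difference target and a flat observation vector; the function names (`naive_target`, `bootstrapped_target`, `time_aware_observation`) and the `terminated` / `timed_out` flags are hypothetical choices for illustration and are not taken from the authors' code.

```python
from typing import Sequence, Tuple


def naive_target(reward: float, next_value: float, gamma: float,
                 terminated: bool, timed_out: bool) -> float:
    """Target that wrongly treats a time-out like a true terminal state."""
    done = terminated or timed_out
    return reward if done else reward + gamma * next_value


def bootstrapped_target(reward: float, next_value: float, gamma: float,
                        terminated: bool) -> float:
    """Case (ii): a time-out is not part of the environment, so the target
    still bootstraps from the value of the state where the partial episode
    ended. Only a genuine environment termination cuts off the return."""
    if terminated:
        return reward
    return reward + gamma * next_value


def time_aware_observation(obs: Sequence[float], step: int,
                           time_limit: int) -> Tuple[float, ...]:
    """Case (i): the time limit is part of the task, so the normalised
    remaining time is appended to the agent's input."""
    remaining = (time_limit - step) / time_limit
    return tuple(obs) + (remaining,)
```

On a transition that ends only because the time limit was reached, `naive_target` drops the `gamma * next_value` term while `bootstrapped_target` keeps it; that difference is the one the abstract attributes to partial-episode bootstrapping.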

INFORMS · contrastive · INTERACT · Generalization theory · XAI

January 27, 2022

Diagnosing AI Explanation Methods with Folk Concepts of Behavior

Alon Jacovi,Jasmijn Bastings,Sebastian Gehrmann,Yoav Goldberg,Katja Filippova

When explaining AI behavior to humans, how is the communicated information being comprehended by the human explainee, and does it match what the explanation attempted to communicate? When can we say that an explanation is explaining something? We aim to provide an answer by leveraging theory of mind literature about the folk concepts that humans use to understand behavior. We establish a framework of social attribution by the human explainee, which describes the function of explanations: the concrete information that humans comprehend from them. Specifically, effective explanations should be coherent (communicating information which generalizes to other contrast cases), complete (communicating an explicit contrast case, objective causes, and subjective causes), and interactive (surfacing and resolving contradictions to the generalization property through iterations). We demonstrate that many XAI mechanisms can be mapped to folk concepts of behavior. This allows us to uncover the failure modes that prevent current methods from explaining effectively, and what is necessary to enable coherent explanations.

contrastive · Contrastive learning · Learned · Better · Support vector

December 21, 2021

Max-Margin Contrastive Learning

Anshul Shah,Suvrit Sra,Rama Chellappa,Anoop Cherian

from arxiv, Accepted at AAAI 2022

Standard contrastive learning approaches usually require a large number of negatives for effective unsupervised learning and often exhibit slow convergence. We suspect this behavior is due to the suboptimal selection of negatives used for offering contrast to the positives. We counter this difficulty by taking inspiration from support vector machines (SVMs) to present max-margin contrastive learning (MMCL). Our approach selects negatives as the sparse support vectors obtained via a quadratic optimization problem, and contrastiveness is enforced by maximizing the decision margin. As SVM optimization can be computationally demanding, especially in an end-to-end setting, we present simplifications that alleviate the computational burden. We validate our approach on standard vision benchmark datasets, demonstrating better performance in unsupervised representation learning over state-of-the-art, while having better empirical convergence properties.

Learned · Constraint · Reinforcement learning · contrastive · Critic

May 21, 2021

Inverse Constrained Reinforcement Learning

Usman Anwar,Shehryar Malik,Alireza Aghasi,Ali Ahmed

from arxiv, Camera-ready version for ICML 2021

In real-world settings, numerous constraints are present which are hard to specify mathematically. However, for the real-world deployment of reinforcement learning (RL), it is critical that RL agents are aware of these constraints, so that they can act safely. In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior. We experimentally validate our approach and show that our framework can successfully learn the most likely constraints that the agent respects. We further show that these learned constraints are transferable to new agents that may have different morphologies and/or reward functions. Previous works in this regard have mainly either been restricted to tabular (discrete) settings or specific types of constraints, or have assumed knowledge of the environment's transition dynamics. In contrast, our framework is able to learn arbitrary Markovian constraints in high dimensions in a completely model-free setting. The code can be found at: //github.com/shehryar-malik/icrl.

Reward function · Linear · Reinforcement learning · Learned · Value iteration

December 6, 2018

Logically-Constrained Reinforcement Learning

Mohammadhosein Hasanbeig,Alessandro Abate,Daniel Kroening

This paper proposes a model-free Reinforcement Learning (RL) algorithm to synthesise policies for an unknown Markov Decision Process (MDP), such that a linear time property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), then construct a synchronized MDP between the automaton and the original MDP. According to the resulting LDBA, a reward function is then defined over the state-action pairs of the product MDP. With this reward function, our algorithm synthesises a policy whose traces satisfy the linear time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.
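The synchronized (product) construction described above can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm: it assumes a small deterministic automaton for the property "eventually reach a state labelled goal" instead of a full LDBA, and it pays a reward of 1.0 whenever the automaton component is accepting. The names `automaton_step`, `product_step`, `ACCEPTING`, and the label map are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical two-state automaton for "eventually goal":
# q0 moves to the accepting state q1 on reading the label "goal";
# every other (state, label) pair leaves the automaton state unchanged.
ACCEPTING = {"q1"}


def automaton_step(q: str, label: str) -> str:
    """Advance the automaton by one label."""
    if q == "q0" and label == "goal":
        return "q1"
    return q


def product_step(mdp_next_state: str, q: str,
                 labels: Dict[str, str]) -> Tuple[Tuple[str, str], float]:
    """Synchronise one MDP transition with the automaton.

    The MDP's next state is assumed to have been sampled already (the MDP
    itself stays unknown to the learner). The automaton reads that state's
    label, and the reward depends only on the automaton component.
    """
    label = labels.get(mdp_next_state, "none")
    q_next = automaton_step(q, label)
    reward = 1.0 if q_next in ACCEPTING else 0.0
    return (mdp_next_state, q_next), reward
```

A model-free learner would then run on pairs `(mdp_state, automaton_state)` with this reward, so maximising return pushes the policy toward traces the automaton accepts.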