Oracle networks feeding off-chain information to a blockchain must solve a distributed agreement problem, since these networks receive information from multiple sources and at different times. We make the key observation that, in most cases, the values obtained by oracle network nodes from multiple information sources are in close proximity. We define a notion of agreement distance and leverage the availability of a state machine replication (SMR) service to solve this distributed agreement problem with an honest simple majority of nodes, instead of the conventional requirement of an honest super majority. The proximity of values from multiple nodes, which lets them form a coherent cluster, is one of the keys to the protocol's efficiency. Our asynchronous protocol also embeds a fallback mechanism for when coherent cluster formation fails. Through simulations using real-world exchange data from seven prominent exchanges, we show that even for very small agreement distance values the protocol is able to form coherent clusters and can therefore safely tolerate up to a $1/2$ fraction of Byzantine nodes. We also show that, for a small statistical error, the size of the oracle network can be chosen to be significantly smaller than that of the entire system, which tolerates up to a $1/3$ fraction of Byzantine failures. This allows the oracle network to operate much more efficiently and to scale horizontally much better.
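To make the coherent-cluster idea concrete, here is a minimal sketch (not the paper's protocol; the agreement distance `d`, the majority threshold, and the sample values are illustrative assumptions):

```python
# Illustrative sketch: check whether reported values form a "coherent cluster"
# under a hypothetical agreement distance d. A cluster here is any set of more
# than n/2 values whose pairwise spread is at most d.

def coherent_cluster(values, d):
    """Return a representative value of a majority cluster, or None."""
    values = sorted(values)
    n = len(values)
    need = n // 2 + 1                 # simple honest majority
    # slide a window over the sorted values; a window of size `need` whose
    # extremes differ by at most d is a coherent cluster
    for i in range(n - need + 1):
        window = values[i:i + need]
        if window[-1] - window[0] <= d:
            return window[len(window) // 2]   # e.g. the cluster median
    return None                               # trigger the fallback path

# Example: seven nodes report a price feed; one node is Byzantine.
print(coherent_cluster([1843.1, 1843.2, 1843.0, 1843.3, 1843.2, 1843.1, 9999.0], d=0.5))
```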
Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms. Despite its popularity, it suffers from a critical drawback: its reliance on learning from a single sample of the joint action at a given state. As agents explore and update their policies during training, these single samples may poorly represent the actual joint policy of the system of agents, leading to high-variance gradient estimates that hinder learning. To address this problem, we propose an enhancement tool that accommodates any actor-critic MARL method. Our framework, Performance Enhancing Reinforcement Learning Apparatus (PERLA), introduces a technique that samples the agents' joint policy within the critics while the agents train. This leads to TD updates that closely approximate the true expected value under the current joint policy, rather than estimates from a single sample of the joint action at a given state. The result is low-variance, precise estimates of expected returns, minimising the variance in the critic estimators that typically hinders learning. Moreover, as we demonstrate, by eliminating much of the critic variance caused by single sampling of the joint policy, PERLA enables CT-DE methods to scale more efficiently with the number of agents. Theoretically, we prove that PERLA reduces the variance in value estimates to a level comparable to that of decentralised training, while maintaining the benefits of centralised training. Empirically, we demonstrate PERLA's superior performance and ability to reduce estimator variance on a range of benchmarks, including Multi-Agent MuJoCo and the StarCraft II Multi-Agent Challenge.
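A hedged sketch of the core idea, averaging the critic's bootstrap target over sampled joint actions rather than the single action taken (the names `critic` and `policies` and the sample count `k` are illustrative, not the paper's API; policies are assumed to return `torch.distributions` objects):

```python
import torch

def sampled_td_target(critic, policies, reward, next_state, gamma=0.99, k=16):
    """Monte-Carlo estimate of E_{a ~ joint policy}[Q(s', a)] for the TD target."""
    with torch.no_grad():
        targets = []
        for _ in range(k):
            # one joint action: each agent samples from its own current policy
            joint_action = torch.cat([pi(next_state).sample() for pi in policies])
            targets.append(critic(next_state, joint_action))
        # averaging over k samples shrinks the variance of the critic target
        # relative to bootstrapping from the single joint action observed
        return reward + gamma * torch.stack(targets).mean(0)
```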
Federated Learning (FL) allows multiple participating clients to train machine learning models collaboratively by keeping their datasets local and only exchanging gradient or model updates with a coordinating server. Existing FL protocols have been shown to be vulnerable to attacks that aim to compromise data privacy and/or model robustness, and recently proposed defenses focus on ensuring either privacy or robustness, but not both. In this paper, we focus on simultaneously achieving differential privacy (DP) and Byzantine robustness for cross-silo FL, based on the idea of learning from history. Robustness is achieved via client momentum, which averages the updates of each client over time, thus reducing the variance of the honest clients and exposing the small malicious perturbations of Byzantine clients that are undetectable in a single round but accumulate over time. In our initial solution, DP-BREM, the DP property is achieved by adding noise to the aggregated momentum, and we account for the privacy cost of the momentum, in contrast to conventional DP-SGD, which accounts for the privacy cost of the gradient. Since DP-BREM assumes a trusted server (which can obtain clients' local models or updates), we further develop our final solution, DP-BREM+, which achieves the same DP and robustness properties as DP-BREM without a trusted server by utilizing secure aggregation techniques, where the DP noise is securely and jointly generated by the clients. Our theoretical analysis of the convergence rate and our experimental results under different DP guarantees and attack settings demonstrate that the proposed protocols achieve a better privacy-utility tradeoff and stronger Byzantine robustness than several baseline methods.
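A minimal sketch of the two ingredients described above, assuming illustrative hyperparameters `beta`, `clip`, and `sigma` (this is not the authors' code, and the noise calibration is schematic rather than a calibrated DP mechanism):

```python
import numpy as np

def client_momentum(prev_m, grad, beta=0.9):
    """Per-client momentum: averages a client's updates over rounds."""
    return beta * prev_m + (1 - beta) * grad

def private_aggregate(momenta, clip=1.0, sigma=0.8):
    """Clip each client's momentum, average, then add Gaussian DP noise."""
    clipped = [m * min(1.0, clip / (np.linalg.norm(m) + 1e-12)) for m in momenta]
    agg = np.mean(clipped, axis=0)
    # noise is added to the *aggregated momentum*, matching the DP-BREM idea
    noise = np.random.normal(0.0, sigma * clip / len(momenta), size=agg.shape)
    return agg + noise
```

In DP-BREM+ the noise term would instead be generated jointly by the clients inside secure aggregation, so that no single party (including the server) sees an unnoised aggregate.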
Learning with noisy labels (LNL) is challenging because the model tends to memorize noisy labels, which can lead to overfitting. Many LNL methods detect clean samples by maximizing the similarity between samples in each category, making no assumptions about likely noise sources. However, we often have some knowledge about the potential source(s) of noisy labels. For example, an image mislabeled as a cheetah is more likely a leopard than a hippopotamus, due to their visual similarity. We therefore introduce a new task, Learning with Noisy Labels and noise source distribution Knowledge (LNL+K), which assumes we have some knowledge about the likely source(s) of label noise that we can take advantage of. Under this assumption, methods are better equipped to distinguish hard negatives between categories from label noise. It also enables us to explore datasets where the noise may represent the majority of samples, a setting that breaks a critical premise of most methods developed for the LNL task. We explore several baseline LNL+K approaches that integrate noise-source knowledge into state-of-the-art LNL methods across three diverse datasets and three types of noise, reporting a 5-15% boost in performance compared with the unadapted methods. Critically, we find that LNL methods do not generalize well in every setting, highlighting the importance of directly exploring our LNL+K task.
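One way to picture how noise-source knowledge could enter a clean-sample detector (a hypothetical heuristic for illustration only, not one of the paper's baselines):

```python
def likely_clean(probs, label, noise_source, margin=0.1):
    """Decide whether to trust a label given model probabilities `probs`.

    noise_source maps a class to its known likely noise source, e.g.
    {"cheetah": "leopard"}; margin is an illustrative threshold.
    """
    src = noise_source.get(label)
    if src is None:
        # no source knowledge: fall back to plain confidence, as in LNL
        return probs[label] >= 0.5
    # with source knowledge: keep the sample only if the labeled class
    # clearly beats the known noise-source class
    return probs[label] - probs[src] >= margin
```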
Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved control problems across a variety of domains. Recently, algorithms have been proposed that combine these two methods, aiming to leverage the strengths and mitigate the weaknesses of both approaches. In this paper we introduce a new Evolutionary Reinforcement Learning model that combines a particular family of evolutionary algorithms, Evolutionary Strategies, with the off-policy Deep Reinforcement Learning algorithm TD3. The framework utilises a multi-buffer system instead of a single shared replay buffer. The multi-buffer system allows the Evolutionary Strategy to search freely in the space of policies, without risking overpopulating the replay buffer with poorly performing trajectories, which would limit the number of examples of desirable policy behaviour and thus negatively impact the potential of the Deep Reinforcement Learning component within the shared framework. The proposed algorithm is demonstrated to perform competitively with current Evolutionary Reinforcement Learning algorithms on MuJoCo control tasks, outperforming the well-known state-of-the-art CEM-RL on 3 of the 4 environments tested.
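A structural sketch of the multi-buffer idea under stated assumptions (the buffer layout and the return-threshold gate are illustrative guesses, not the paper's design):

```python
import random
from collections import deque

class MultiBuffer:
    """Separate buffers so poor ES rollouts cannot crowd out the RL data."""

    def __init__(self, capacity=100_000):
        self.rl_buffer = deque(maxlen=capacity)   # TD3's own rollouts
        self.es_buffer = deque(maxlen=capacity)   # curated ES rollouts

    def add_es_episode(self, transitions, episode_return, threshold):
        # only well-performing ES trajectories reach the learner's data,
        # so ES can still explore freely without polluting TD3's training set
        if episode_return >= threshold:
            self.es_buffer.extend(transitions)

    def sample(self, batch_size):
        pool = list(self.rl_buffer) + list(self.es_buffer)
        return random.sample(pool, min(batch_size, len(pool)))
```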
A central challenge in the verification of quantum computers is benchmarking their performance as a whole and demonstrating their computational capabilities. In this work, we find a model of quantum computation, Bell sampling, that can be used for both of these tasks and thus provides an ideal stepping stone towards fault tolerance. In Bell sampling, we measure two copies of a state prepared by a quantum circuit in the transversal Bell basis. We show that Bell samples are classically intractable to produce and at the same time constitute what we call a circuit shadow: from the Bell samples we can efficiently extract information about the quantum circuit preparing the state, as well as diagnose circuit errors. In addition to known properties that can be efficiently extracted from Bell samples, we give two new and efficient protocols: a test for the depth of the circuit, and an algorithm to estimate a lower bound on the number of T gates in the circuit. With some additional measurements, our algorithm learns a full description of states prepared by circuits with low T-count.
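A small numerical sketch of the Bell-sampling measurement itself, using the standard circuit construction (transversal CNOTs between the two copies, Hadamards on the first copy, then a computational-basis measurement); a state-vector simulation for illustration, not the paper's protocols:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

def apply_gate(state, gate, targets):
    """Apply a 2^k x 2^k gate to the target qubit axes of a (2,)*N tensor."""
    k, N = len(targets), state.ndim
    state = np.moveaxis(state, targets, list(range(k)))
    state = (gate @ state.reshape(2 ** k, -1)).reshape((2,) * N)
    return np.moveaxis(state, list(range(k)), targets)

def bell_sample(psi, shots=5, seed=0):
    """Sample transversal Bell measurements on two copies of state vector psi."""
    n = int(np.log2(psi.size))
    state = np.kron(psi, psi).reshape((2,) * (2 * n))  # axes 0..n-1: copy 1
    for i in range(n):
        state = apply_gate(state, CNOT, [i, n + i])    # transversal CNOT
        state = apply_gate(state, H, [i])              # Hadamard on copy 1
    probs = np.abs(state.reshape(-1)) ** 2
    probs /= probs.sum()
    return np.random.default_rng(seed).choice(probs.size, size=shots, p=probs)

# two copies of the 2-qubit state (|00> + |11>)/sqrt(2)
print(bell_sample(np.array([1, 0, 0, 1]) / np.sqrt(2)))
```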
We study a wireless jamming problem consisting of the competition between a legitimate receiver and a jammer, as a zero-sum game whose value, to be maximized/minimized, is the channel capacity at the receiver's side. Most approaches in the literature consider the two players to be stationary nodes. Instead, we investigate what happens when they can change location, specifically by moving along a linear geometry. We first frame this as a static game, which can be solved in closed form, and subsequently extend it to a dynamic game under three different versions of the completeness/perfection of mutual information about the adversary's position, corresponding to different assumptions about the concealment/sequentiality of the moves, respectively. We first provide theoretical conditions that hold for the static game and also help identify good strategies valid under any setup, including dynamic games. Since dynamic games, although more realistic, are characterized by an exploding strategy space, we exploit reinforcement learning to obtain efficient strategies leading to equilibrium outcomes. We show how the theoretical findings can be used to train smart agents to play the game, and we validate our approach in practical setups.
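For the static case, a finite zero-sum game over discretized locations can be solved by the standard linear-programming reduction; a sketch with an illustrative payoff matrix (not the paper's capacity model), where entry `A[i, j]` is the receiver's capacity when the receiver picks location `i` and the jammer picks location `j`:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Maximin mixed strategy and game value for the row (maximizing) player."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    shift = max(0.0, -A.min()) + 1.0              # make all payoffs positive
    B = A + shift
    # minimize sum(y) s.t. B^T y >= 1, y >= 0; then value = 1/sum(y), x = y*value
    res = linprog(np.ones(m), A_ub=-B.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)
    value = 1.0 / res.x.sum()
    return value * res.x, value - shift

x, v = solve_zero_sum([[2.0, 0.5], [1.0, 1.5]])
print(x, v)   # mixed strategy (0.25, 0.75), value 1.25 for this example
```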
We study optimization methods to train local (or personalized) models for decentralized collections of local datasets with an intrinsic network structure. This network structure arises from domain-specific notions of similarity between local datasets; examples of such notions include spatio-temporal proximity, statistical dependencies, and functional relations. Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization. This formulation unifies and considerably extends existing federated learning methods. It is highly flexible and can be combined with a broad range of parametric models, including generalized linear models and deep neural networks. Our main algorithmic contribution is a fully decentralized federated learning algorithm, obtained by applying an established primal-dual method to GTV minimization. It can be implemented as message passing and is robust against the inexact computations that arise from limited computational resources, including processing time or bandwidth. Our main analytic contribution is an upper bound on the deviation between the local model parameters learnt by our algorithm and those obtained by an oracle-based clustered federated learning method. This upper bound reveals conditions on the local models and the network structure of the local datasets under which GTV minimization is able to pool (nearly) homogeneous local datasets.
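A hedged sketch of GTV minimization for scalar local models on a graph, using a plain subgradient step rather than the paper's primal-dual method (edge list, weights, and step sizes are illustrative assumptions):

```python
import numpy as np

def gtv_step(theta, local_grads, edges, weights, lam=0.1, lr=0.05):
    """One synchronous message-passing step on the GTV objective
    sum_i f_i(theta_i) + lam * sum_{(i,j)} w_ij |theta_i - theta_j|."""
    # local fitting term: each node only touches its own data
    g = np.array([local_grads[i](theta[i]) for i in range(len(theta))])
    for (i, j), w in zip(edges, weights):
        # subgradient of the coupling term lam * w * |theta_i - theta_j|:
        # neighbouring models are pulled toward each other along each edge
        s = lam * w * np.sign(theta[i] - theta[j])
        g[i] += s
        g[j] -= s
    return theta - lr * g
```

Large edge weights drive neighbouring parameters together (pooling nearly homogeneous datasets), while the non-smooth penalty still permits sharp differences across cluster boundaries.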
In this paper, we propose a distributed zeroth-order policy optimization method for Multi-Agent Reinforcement Learning (MARL). Existing MARL algorithms often assume that every agent can observe the states and actions of all other agents in the network. This can be impractical in large-scale problems, where sharing state and action information with multi-hop neighbors may incur significant communication overhead. The advantage of the proposed zeroth-order policy optimization method is that it allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards, which depend on partial state and action information only and can be obtained via consensus. Specifically, to calculate the local policy gradients, we develop a new distributed zeroth-order policy gradient estimator that relies on one-point residual feedback and, compared to existing zeroth-order estimators that also rely on one-point feedback, significantly reduces the variance of the policy gradient estimates, thereby improving learning performance. We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to a neighborhood of a policy that is a stationary point of the global objective function. The size of this neighborhood depends on the agents' learning rates, the exploration parameters, and the number of consensus steps used to calculate the local estimates of the global accumulated rewards. Moreover, we provide numerical experiments demonstrating that our new zeroth-order policy gradient estimator is more sample-efficient than other existing one-point estimators.
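A minimal single-agent sketch of a one-point residual-feedback step, assuming `J` returns a (possibly noisy) estimate of the accumulated reward and `delta`, `lr` are illustrative parameters; the distributed version would replace `J` with each agent's consensus-based local estimate of the global reward:

```python
import numpy as np

def residual_feedback_step(theta, J, J_prev, delta=0.1, lr=0.01,
                           rng=np.random.default_rng()):
    """One zeroth-order ascent step reusing the previous reward evaluation."""
    u = rng.standard_normal(theta.shape)
    u /= np.linalg.norm(u)                       # random unit direction
    J_curr = J(theta + delta * u)                # single evaluation per step
    # residual J_curr - J_prev replaces a second query, cutting the variance
    # relative to plain one-point estimators that scale with J itself
    grad_est = theta.size * (J_curr - J_prev) / delta * u
    return theta + lr * grad_est, J_curr         # J_curr becomes next J_prev
```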
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning hard to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we survey the state of this exciting field by comparing classical distributed deep reinforcement learning methods and studying the important components required to achieve efficient distributed learning, covering the spectrum from single-player, single-agent distributed deep reinforcement learning to the most complex multiple-player, multiple-agent settings. Furthermore, we review recently released toolboxes that help realize distributed deep reinforcement learning without major modifications of their non-distributed versions. By analyzing their strengths and weaknesses, we develop and release a multi-player, multi-agent distributed deep reinforcement learning toolbox, which we further validate on Wargame, a complex environment, demonstrating the usability of the proposed toolbox for multiple-player, multiple-agent distributed deep reinforcement learning under complex games. Finally, we point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers interested in distributed deep reinforcement learning.
This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in distributed learning systems, due to various system disturbances such as slow-downs or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework that speeds up the training of MARL algorithms in the presence of stragglers, while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed and evaluated. Different coding schemes, including maximum distance separable (MDS) codes, random sparse codes, replication-based codes, and regular low-density parity-check (LDPC) codes, are also investigated. Simulations on several multi-robot problems demonstrate the promising performance of the proposed framework.
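To illustrate how an MDS code mitigates stragglers in such a framework, here is a sketch of coded gradient computation with a Vandermonde encoding (an illustrative $(n, k)$ construction, not the paper's full pipeline): $k$ gradient blocks are encoded into $n$ coded tasks, and the results from any $k$ workers suffice to decode, so the $n - k$ slowest workers can simply be ignored.

```python
import numpy as np

def encode(grad_blocks, n):
    """Encode k gradient blocks (k x d array) into n coded worker tasks."""
    k = grad_blocks.shape[0]
    # Vandermonde matrix with distinct nodes: any k rows are invertible,
    # which is exactly the MDS property
    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)
    return G @ grad_blocks, G

def decode(results, worker_ids, G):
    """Recover the k gradient blocks from the first k workers that responded."""
    k = G.shape[1]
    ids = list(worker_ids[:k])        # stragglers never make it into ids
    return np.linalg.solve(G[ids], np.asarray(results[:k]))
```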