
Federated averaging (FedAvg) is a communication-efficient algorithm for distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection; a central parameter server is used to communicate between clients. This central server distributes the parameters to each client and collects the updated parameters from clients. FedAvg is mostly studied in a centralized fashion, which requires massive communication between the server and clients in each communication round. Moreover, attacking the central server can break the whole system's privacy. In this paper, we study the decentralized FedAvg with momentum (DFedAvgM), which is implemented on clients that are connected by an undirected graph. In DFedAvgM, all clients perform stochastic gradient descent with momentum and communicate with their neighbors only. To further reduce the communication cost, we also consider the quantized DFedAvgM. We prove convergence of the (quantized) DFedAvgM under mild assumptions; the convergence rate can be improved when the loss function satisfies the Polyak-Łojasiewicz (PŁ) property. Finally, we numerically verify the efficacy of DFedAvgM.
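To make the communication pattern concrete, here is a minimal sketch of one round of local momentum updates followed by neighbor averaging. It is an illustration under stated assumptions, not the paper's algorithm: the function name `dfedavgm_round`, the scalar models, the quadratic client losses, the exact (noise-free) gradients, the uniform averaging weights, and the momentum buffer being reset every round are all hypothetical choices made for the example.

```python
def dfedavgm_round(x, neighbors, grads, lr=0.05, beta=0.9, local_steps=2):
    """One round: each client runs local momentum steps, then averages
    its result with its graph neighbors (no central server involved)."""
    local = []
    for i, xi in enumerate(x):
        v = 0.0  # momentum buffer; resetting it each round is an assumption
        z = xi
        for _ in range(local_steps):
            v = beta * v + grads[i](z)  # heavy-ball momentum update
            z = z - lr * v
        local.append(z)

    # Communication step: uniform average over the client and its neighbors.
    mixed = []
    for i in range(len(x)):
        group = [i] + neighbors[i]
        mixed.append(sum(local[j] for j in group) / len(group))
    return mixed


# Hypothetical setup: 4 clients on a ring, client i has loss 0.5 * (z - a_i)^2.
targets = [0.0, 1.0, 2.0, 3.0]
grads = [lambda z, a=a: z - a for a in targets]  # exact gradients, no noise
neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]     # undirected ring

x = [0.0] * 4
for _ in range(200):
    x = dfedavgm_round(x, neighbors, grads)

print("client models:", x)
print("network average:", sum(x) / len(x))
```

In this toy setup the network average approaches the minimizer of the average loss (1.5), while the individual clients stay only approximately in agreement because each one drifts toward its own target during the local steps.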

Related content


Saddle point · Optimizer · Learning · Networking · Scenario

June 14, 2021

Decentralized Personalized Federated Min-Max Problems

Aleksandr Beznosikov, Vadim Sushko, Abdurakhmon Sadiev, Alexander Gasnikov

Personalized Federated Learning (PFL) has recently seen tremendous progress, allowing the design of novel machine learning applications that preserve the privacy of the data used for training. Existing theoretical results in this field mainly focus on distributed optimization for minimization problems. This paper is the first to study PFL for saddle point problems, which cover a broader class of optimization tasks and are thus of more relevance for applications than minimization. In this work, we consider a recently proposed PFL setting with the mixing objective function, an approach combining the learning of a global model together with local distributed learners. Unlike most of the previous papers, which considered only the centralized setting, we work in a more general and decentralized setup. This allows us to design and analyze more practical and federated ways to connect devices to the network. We present two new algorithms for our problem. A theoretical analysis of the methods is presented for smooth (strongly-)convex-(strongly-)concave saddle point problems. We also demonstrate the effectiveness of our problem formulation and the proposed algorithms on experiments with neural networks with adversarial noise.

Optimizer · Networking · Directed · Reducible · Linear

June 14, 2021

Compressed Gradient Tracking for Decentralized Optimization Over General Directed Networks

Zhuoqing Song, Lei Shi, Shi Pu, Ming Yan

from arxiv, working paper

In this paper, we propose two communication-efficient algorithms for decentralized optimization over a multi-agent network with general directed network topology. In the first part, we consider a novel communication-efficient gradient-tracking-based method, termed Compressed Push-Pull (CPP), which combines the Push-Pull method with communication compression. We show that CPP is applicable to a general class of unbiased compression operators and achieves linear convergence for strongly convex and smooth objective functions. In the second part, we propose a broadcast-like version of CPP (B-CPP), which also achieves a linear convergence rate under the same conditions on the objective functions. B-CPP can be applied in an asynchronous broadcast setting and further reduces communication costs compared to CPP. Numerical experiments complement the theoretical analysis and confirm the effectiveness of the proposed methods.
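As an illustration of what an unbiased compression operator can look like, the sketch below implements stochastic rounding to a grid, a standard textbook example satisfying E[Q(x)] = x. It is not taken from the paper; the function name `stochastic_round` and the `step` parameter are hypothetical, and the operators actually analyzed for CPP may differ.

```python
import math
import random


def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`, up or down at random, with
    probabilities chosen so that the expected output equals x."""
    scaled = x / step
    low = math.floor(scaled)
    p_up = scaled - low  # fractional part = probability of rounding up
    q = low + (1 if rng.random() < p_up else 0)
    return q * step


# Empirical check of unbiasedness with a seeded generator.
rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(20000)]
print("sample mean:", sum(samples) / len(samples))
```

Each transmitted value then needs only the integer grid index instead of a full-precision float, and the rounding error averages out across iterations because it has zero mean.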

INFORMS · Commutative · DFP · Networking · Continuity

June 13, 2021

Decentralized Inertial Best-Response with Voluntary and Limited Communication in Random Communication Networks

Sarper Aydın, Ceyhun Eksin

from arxiv, 10 pages

Multiple autonomous agents interact over a random communication network to maximize their individual utility functions which depend on the actions of other agents. We consider decentralized best-response with inertia type algorithms in which agents form beliefs about the future actions of other players based on local information, and take an action that maximizes their expected utility computed with respect to these beliefs or continue to take their previous action. We show convergence of these types of algorithms to a Nash equilibrium in weakly acyclic games under the condition that the belief update and information exchange protocols successfully learn the actions of other players with positive probability in finite time given a static environment, i.e., when other agents' actions do not change. We design a decentralized fictitious play algorithm with voluntary and limited communication (DFP-VL) protocols that satisfy this condition. In the voluntary communication protocol, each agent decides whom to exchange information with by assessing the novelty of its information and the potential effect of its information on others' assessments of their utility functions. The limited communication protocol entails agents sending only their most frequent action to agents that they decide to communicate with. Numerical experiments on a target assignment game demonstrate that the voluntary and limited communication protocol can more than halve the number of communication attempts while retaining the same convergence rate as DFP in which agents constantly attempt to communicate.

Federated learning · Learning · Less · Networking · Reducible

June 11, 2021

Efficient and Less Centralized Federated Learning

Li Chou, Zichang Liu, Zhuang Wang, Anshumali Shrivastava

With the rapid growth in mobile computing, massive amounts of data and computing resources are now located at the edge. To this end, federated learning (FL) is becoming a widely adopted distributed machine learning (ML) paradigm, which aims to harness this expanding skewed data locally in order to develop rich and informative models. In centralized FL, a collection of devices collaboratively solve an ML task under the coordination of a central server. However, existing FL frameworks make an over-simplistic assumption about network connectivity and ignore the communication bandwidth of the different links in the network. In this paper, we present and study a novel FL algorithm, in which devices mostly collaborate with other devices in a pairwise manner. Our nonparametric approach is able to exploit network topology to reduce communication bottlenecks. We evaluate our approach on various FL benchmarks and demonstrate that our method achieves 10x better communication efficiency and an increase of around 8% in accuracy compared to the centralized approach.

Optimizer · Image classification · Machine Learning · Networks · Scaling

June 11, 2021

Optimal Complexity in Decentralized Training

Yucheng Lu, Christopher De Sa

Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction that this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound with only a logarithmic gap. Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks, and we show that DeTAG enjoys faster convergence compared to baselines, especially on unshuffled data and in sparse networks.

Optimizer · Cost · Hyperparameter · Controller · Black box

June 10, 2021

A Nonmyopic Approach to Cost-Constrained Bayesian Optimization

Eric Hans Lee, David Eriksson, Valerio Perrone, Matthias Seeger

from arxiv, To appear in UAI 2021

Bayesian optimization (BO) is a popular method for optimizing expensive-to-evaluate black-box functions. BO budgets are typically given in iterations, which implicitly assumes each evaluation has the same cost. In fact, in many BO applications, evaluation costs vary significantly in different regions of the search space. In hyperparameter optimization, the time spent on neural network training increases with layer size; in clinical trials, the monetary cost of drug compounds varies; and in optimal control, control actions have differing complexities. Cost-constrained BO measures convergence with alternative cost metrics such as time, money, or energy, for which the sample efficiency of standard BO methods is ill-suited. For cost-constrained BO, cost efficiency is far more important than sample efficiency. In this paper, we formulate cost-constrained BO as a constrained Markov decision process (CMDP), and develop an efficient rollout approximation to the optimal CMDP policy that takes both the cost and future iterations into account. We validate our method on a collection of hyperparameter optimization problems as well as a sensor set selection application.

Momentum method · Optimizer · Performer · Momentum · CASE

June 10, 2021

A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems

Babak Barazandeh, Tianjian Huang, George Michailidis

Min-max saddle point games have recently been intensely studied, due to their wide range of applications, including training Generative Adversarial Networks (GANs). However, most of the recent efforts for solving them are limited to special regimes such as convex-concave games. Further, it is customarily assumed that the underlying optimization problem is solved either by a single machine or, in the case of multiple machines, by machines connected in a centralized fashion, wherein each one communicates with a central node. The latter approach becomes challenging when the underlying communications network has low bandwidth. In addition, privacy considerations may dictate that certain nodes can communicate only with a subset of other nodes. Hence, it is of interest to develop methods that solve min-max games in a decentralized manner. To that end, we develop a decentralized adaptive momentum (ADAM)-type algorithm for solving min-max optimization problems under the condition that the objective function satisfies a Minty Variational Inequality condition, which is a generalization of the convex-concave case. The proposed method overcomes shortcomings of recent non-adaptive gradient-based decentralized algorithms for min-max optimization problems that do not perform well in practice and require careful tuning. In this paper, we obtain non-asymptotic rates of convergence of the proposed algorithm (coined DADAM$^3$) for finding a (stochastic) first-order Nash equilibrium point and subsequently evaluate its performance on training GANs. The extensive empirical evaluation shows that DADAM$^3$ outperforms recently developed methods, including decentralized optimistic stochastic gradient, for solving such min-max problems.

Federated learning · CASE · Learning · FAST · Optimizer

June 8, 2021

Fast Federated Learning in the Presence of Arbitrary Device Unavailability

Xinran Gu, Kaixuan Huang, Jingzhao Zhang, Longbo Huang

Federated Learning (FL) coordinates with numerous heterogeneous devices to collaboratively train a shared model while preserving user privacy. Despite its multiple advantages, FL faces new challenges. One challenge arises when devices drop out of the training process beyond the control of the central server. In this case, the convergence of popular FL algorithms such as FedAvg is severely influenced by the straggling devices. To tackle this challenge, we study federated learning algorithms under arbitrary device unavailability and propose an algorithm named Memory-augmented Impatient Federated Averaging (MIFA). Our algorithm efficiently avoids excessive latency induced by inactive devices, and corrects the gradient bias using the memorized latest updates from the devices. We prove that MIFA achieves minimax optimal convergence rates on non-i.i.d. data for both strongly convex and non-convex smooth functions. We also provide an explicit characterization of the improvement over baseline algorithms through a case study, and validate the results by numerical experiments on real-world datasets.

Facebook AI Research · Federated learning · Extensibility · Learning · Identifiable

April 30, 2021

Federated Learning with Fair Averaging

Zheng Wang, Xiaoliang Fan, Jianzhong Qi, Chenglu Wen, Cheng Wang, Rongshan Yu

from arxiv, to be published in IJCAI2021

Fairness has emerged as a critical problem in federated learning (FL). In this work, we identify a cause of unfairness in FL: conflicting gradients with large differences in their magnitudes. To address this issue, we propose the federated fair averaging (FedFV) algorithm to mitigate potential conflicts among clients before averaging their gradients. We first use the cosine similarity to detect gradient conflicts, and then iteratively eliminate such conflicts by modifying both the direction and the magnitude of the gradients. We further show the theoretical foundation of FedFV to mitigate the issue of conflicting gradients and converge to Pareto stationary solutions. Extensive experiments on a suite of federated datasets confirm that FedFV compares favorably against state-of-the-art methods in terms of fairness, accuracy and efficiency.
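The conflict-detection step can be sketched as follows. The abstract states only that cosine similarity detects conflicts and that direction and magnitude are then modified; the specific repair used here (removing the component of one gradient along the other, a projection familiar from multi-task learning) is an assumption for illustration and may not match FedFV's exact rule. The helper names `dot`, `cosine`, and `resolve_conflict` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is zero."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def resolve_conflict(g, h):
    """If g conflicts with h (negative cosine similarity), return g with
    its component along h removed; otherwise return g unchanged."""
    if cosine(g, h) >= 0.0:
        return list(g)
    scale = dot(g, h) / dot(h, h)
    return [a - scale * b for a, b in zip(g, h)]


g = [1.0, 0.0]
h = [-1.0, 1.0]
fixed = resolve_conflict(g, h)
print("cosine before:", cosine(g, h))
print("cosine after:", cosine(fixed, h))
```

After the projection the repaired gradient is orthogonal to the one it conflicted with, so averaging the two no longer lets one client's update cancel the other's.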

Optimizer · MoDELS · Distributed machine learning · Performer · CIFAR-10

February 18, 2020

Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability

Yikai Yan, Chaoyue Niu, Yucheng Ding, Zhenzhe Zheng, Fan Wu, Guihai Chen, Shaojie Tang, Zhihua Wu

from arxiv, ICML 2020 Submission

Federated learning is a new distributed machine learning framework, where a bunch of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training process. Such an intermittent client availability model would significantly deteriorate the performance of the classical Federated Averaging algorithm (FedAvg for short). We propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains the convergence rate of $O(1/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement and evaluate FedLaAvg with the CIFAR-10 dataset. The evaluation results demonstrate that FedLaAvg indeed reaches a sublinear speedup and achieves 4.23% higher test accuracy than FedAvg.
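The idea of reusing the latest gradient of every client, including currently unavailable ones, can be sketched with a small server-side memory. This is a toy illustration under assumptions, not the paper's implementation: the function name `latest_average_step`, the scalar model, the quadratic client losses, the exact gradients, and the round-robin availability schedule are all hypothetical.

```python
def latest_average_step(model, latest, available, grad_fns, lr=0.1):
    """Refresh stored gradients for the available clients (in place), then
    update the model with the average over ALL stored gradients."""
    for i in available:
        latest[i] = grad_fns[i](model)
    avg = sum(latest) / len(latest)
    return model - lr * avg


# Hypothetical setup: client i has loss 0.5 * (m - a_i)^2.
targets = [0.0, 2.0, 4.0]
grad_fns = [lambda m, a=a: m - a for a in targets]

latest = [0.0] * len(targets)
# Assume every client participates once so the memory is initialized.
model = latest_average_step(10.0, latest, [0, 1, 2], grad_fns)

# Afterwards only one client is reachable per round (round-robin).
for t in range(300):
    model = latest_average_step(model, latest, [t % 3], grad_fns)

print("final model:", model)
```

Because stale entries stay in the average, every client keeps contributing to each update, which avoids biasing the model toward whichever clients happen to be online; in this toy problem the model approaches the minimizer of the average loss (2.0).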