亚洲AV午夜成人片精品网站听书-全网最新黄色网站

Deploying federated learning (FL) over wireless networks with resource-constrained devices requires balancing between accuracy, energy efficiency, and precision. Prior art on FL often requires devices to train deep neural networks (DNNs) using a 32-bit precision level for data representation to improve accuracy. However, such algorithms are impractical for resource-constrained devices since DNNs could require execution of millions of operations. Thus, training DNNs with a high precision level incurs a high energy cost for FL. In this paper, a quantized FL framework, that represents data with a finite level of precision in both local training and uplink transmission, is proposed. Here, the finite level of precision is captured through the use of quantized neural networks (QNNs) that quantize weights and activations in fixed-precision format. In the considered FL model, each device trains its QNN and transmits a quantized training result to the base station. Energy models for the local training and the transmission with the quantization are rigorously derived. An energy minimization problem is formulated with respect to the level of precision while ensuring convergence. To solve the problem, we first analytically derive the FL convergence rate and use a line search method. Simulation results show that our FL framework can reduce energy consumption by up to 53% compared to a standard FL model. The results also shed light on the tradeoff between precision, energy, and accuracy in FL over wireless networks.

相關內容

查準率/準確率

關注 0

系統設計 · Networks · 清華大學智能產業研究院 · Better · 回合 ·

2022 年 1 月 21 日

Paving the Way for Distributed Artificial Intelligence over the Air

Guoqing Ma,Shuping Dang,Chuanting Zhang,Basem Shihada

Distributed Artificial Intelligence (DAI) is regarded as one of the most promising techniques to provide intelligent services under strict privacy protection regulations for multiple clients. By applying DAI, training on raw data is carried out locally, while the trained outputs, e.g., model parameters, from multiple local clients, are sent back to a central server for aggregation. Recently, for achieving better practicality, DAI is studied in conjunction with wireless communication networks, incorporating various random effects brought by wireless channels. However, because of the complex and case-dependent nature of wireless channels, a generic simulator for applying DAI in wireless communication networks is still lacking. To accelerate the development of DAI applied in wireless communication networks, we propose a generic system design in this paper as well as an associated simulator that can be set according to wireless channels and system-level configurations. Details of the system design and analysis of the impacts of wireless environments are provided to facilitate further implementations and updates. We employ a series of experiments to verify the effectiveness and efficiency of the proposed system design and reveal its superior scalability.

聯邦學習 · 優化器 · 學成 · 損失函數（機器學習） · 泛函 ·

2022 年 1 月 19 日

Communication-Efficient Device Scheduling for Federated Learning Using Stochastic Optimization

Jake Perazzone,Shiqiang Wang,Mingyue Ji,Kevin Chan

from arxiv, To be included in Proceedings of INFOCOM 2022, 10 Pages, 5 Figures

Federated learning (FL) is a useful tool in distributed machine learning that utilizes users' local datasets in a privacy-preserving manner. When deploying FL in a constrained wireless environment; however, training models in a time-efficient manner can be a challenging task due to intermittent connectivity of devices, heterogeneous connection quality, and non-i.i.d. data. In this paper, we provide a novel convergence analysis of non-convex loss functions using FL on both i.i.d. and non-i.i.d. datasets with arbitrary device selection probabilities for each round. Then, using the derived convergence bound, we use stochastic optimization to develop a new client selection and power allocation algorithm that minimizes a function of the convergence bound and the average communication time under a transmit power constraint. We find an analytical solution to the minimization problem. One key feature of the algorithm is that knowledge of the channel statistics is not required and only the instantaneous channel state information needs to be known. Using the FEMNIST and CIFAR-10 datasets, we show through simulations that the communication time can be significantly decreased using our algorithm, compared to uniformly random participation.

非凸 · 優化器 · 增廣拉格朗日法 · INFORMS · 約束 ·

2022 年 1 月 19 日

Multiblock ADMM for nonsmooth nonconvex optimization with nonlinear coupling constraints

Le Thi Khanh Hien,Dimitri Papadimitriou

This paper considers a multiblock nonsmooth nonconvex optimization problem with nonlinear coupling constraints. By developing the idea of using the information zone and adaptive regime proposed in [J. Bolte, S. Sabach and M. Teboulle, Nonconvex Lagrangian-based optimization: Monitoring schemes and global convergence, Mathematics of Operations Research, 43: 1210--1232, 2018], we propose a multiblock alternating direction method of multipliers for solving this problem. We specify the update of the primal variables by employing a majorization minimization procedure in each block update. An independent convergence analysis is conducted to prove subsequential as well as global convergence of the generated sequence to a critical point of the augmented Lagrangian. We also establish iteration complexity and provide preliminary numerical results for the proposed algorithm.

聯邦學習 · 學成 · 基 · 提議分布 · 可約的 ·

2022 年 1 月 19 日

Towards Energy Efficient Distributed Federated Learning for 6G Networks

Sunder Ali Khowaja,Kapal Dev,Parus Khuwaja,Paolo Bellavista

from arxiv, 8 pages, 4 figures, 1 table, Magazine Article

The provision of communication services via portable and mobile devices, such as aerial base stations, is a crucial concept to be realized in 5G/6G networks. Conventionally, IoT/edge devices need to transmit the data directly to the base station for training the model using machine learning techniques. The data transmission introduces privacy issues that might lead to security concerns and monetary losses. Recently, Federated learning was proposed to partially solve privacy issues via model-sharing with base station. However, the centralized nature of federated learning only allow the devices within the vicinity of base stations to share the trained models. Furthermore, the long-range communication compels the devices to increase transmission power, which raises the energy efficiency concerns. In this work, we propose distributed federated learning (DBFL) framework that overcomes the connectivity and energy efficiency issues for distant devices. The DBFL framework is compatible with mobile edge computing architecture that connects the devices in a distributed manner using clustering protocols. Experimental results show that the framework increases the classification performance by 7.4\% in comparison to conventional federated learning while reducing the energy consumption.

模型并行 · Extensibility · MoDELS · 學成 · 深度學習 ·

2022 年 1 月 19 日

On the Acceleration of Deep Learning Model Parallelism with Staleness

An Xu,Zhouyuan Huo,Heng Huang

Training the deep convolutional neural network for computer vision problems is slow and inefficient, especially when it is large and distributed across multiple devices. The inefficiency is caused by the backpropagation algorithm's forward locking, backward locking, and update locking problems. Existing solutions for acceleration either can only handle one locking problem or lead to severe accuracy loss or memory inefficiency. Moreover, none of them consider the straggler problem among devices. In this paper, we propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges. We also analyze the convergence of DSP with two popular gradient-based methods and prove that both of them are guaranteed to converge to critical points for non-convex problems. Finally, extensive experimental results on training deep learning models demonstrate that our proposed DSP algorithm can achieve significant training speedup with stronger robustness than compared methods.

SGD · Performer · 隨機梯度下降 · CIFAR-10 · MNIST (數據集) ·

2022 年 1 月 19 日

Decentralized Federated Learning: Balancing Communication and Computing Costs

Wei Liu,Li Chen,Wenyi Zhang

Decentralized stochastic gradient descent (SGD) is a driving engine for decentralized federated learning (DFL). The performance of decentralized SGD is jointly influenced by inter-node communications and local updates. In this paper, we propose a general DFL framework, which implements both multiple local updates and multiple inter-node communications periodically, to strike a balance between communication efficiency and model consensus. It can provide a general decentralized SGD analytical framework. We establish strong convergence guarantees for the proposed DFL algorithm without the assumption of convex objectives. The convergence rate of DFL can be optimized to achieve the balance of communication and computing costs under constrained resources. For improving communication efficiency of DFL, compressed communication is further introduced to the proposed DFL as a new scheme, named DFL with compressed communication (C-DFL). The proposed C-DFL exhibits linear convergence for strongly convex objectives. Experiment results based on MNIST and CIFAR-10 datasets illustrate the superiority of DFL over traditional decentralized SGD methods and show that C-DFL further enhances communication efficiency.

FPL · 學成 · MoDELS · 代價 · 邊 ·

2022 年 1 月 19 日

Flexible Parallel Learning in Edge Scenarios: Communication, Computational and Energy Cost

Francesco Malandrino,Carla Fabiana Chiasserini

Traditionally, distributed machine learning takes the guise of (i) different nodes training the same model (as in federated learning), or (ii) one model being split among multiple nodes (as in distributed stochastic gradient descent). In this work, we highlight how fog- and IoT-based scenarios often require combining both approaches, and we present a framework for flexible parallel learning (FPL), achieving both data and model parallelism. Further, we investigate how different ways of distributing and parallelizing learning tasks across the participating nodes result in different computation, communication, and energy costs. Our experiments, carried out using state-of-the-art deep-network architectures and large-scale datasets, confirm that FPL allows for an excellent trade-off among computational (hence energy) cost, communication overhead, and learning performance.

聯邦學習 · 學成 · Extensibility · MoDELS · Performer ·

2021 年 3 月 30 日

Model-Contrastive Federated Learning

Qinbin Li,Bingsheng He,Dawn Song

from arxiv, Accepted by CVPR 2021

Federated learning enables multiple parties to collaboratively train a machine learning model without communicating their local data. A key challenge in federated learning is to handle the heterogeneity of local data distribution across parties. Although many studies have been proposed to address this challenge, we find that they fail to achieve high performance in image datasets with deep learning models. In this paper, we propose MOON: model-contrastive federated learning. MOON is a simple and effective federated learning framework. The key idea of MOON is to utilize the similarity between model representations to correct the local training of individual parties, i.e., conducting contrastive learning in model-level. Our extensive experiments show that MOON significantly outperforms the other state-of-the-art federated learning algorithms on various image classification tasks.

聯邦學習 · Extensibility · 學成 · MoDELS · 邊 ·

2019 年 12 月 17 日

Asynchronous Federated Learning with Differential Privacy for Edge Intelligence

Yanan Li,Shusen Yang,Xuebin Ren,Cong Zhao

Federated learning has been showing as a promising approach in paving the last mile of artificial intelligence, due to its great potential of solving the data isolation problem in large scale machine learning. Particularly, with consideration of the heterogeneity in practical edge computing systems, asynchronous edge-cloud collaboration based federated learning can further improve the learning efficiency by significantly reducing the straggler effect. Despite no raw data sharing, the open architecture and extensive collaborations of asynchronous federated learning (AFL) still give some malicious participants great opportunities to infer other parties' training data, thus leading to serious concerns of privacy. To achieve a rigorous privacy guarantee with high utility, we investigate to secure asynchronous edge-cloud collaborative federated learning with differential privacy, focusing on the impacts of differential privacy on model convergence of AFL. Formally, we give the first analysis on the model convergence of AFL under DP and propose a multi-stage adjustable private algorithm (MAPA) to improve the trade-off between model utility and privacy by dynamically adjusting both the noise scale and the learning rate. Through extensive simulations and real-world experiments with an edge-could testbed, we demonstrate that MAPA significantly improves both the model accuracy and convergence speed with sufficient privacy guarantee.

優化器 · Lipschitz連續 · 正則化項 · Continuity · Lipschitz ·

2018 年 6 月 1 日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Kevin Scaman,Francis Bach,Sébastien Bubeck,Yin Tat Lee,Laurent Massoulié

from arxiv, 17 pages

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.