Federated edge learning (FEEL) has attracted much attention as a privacy-preserving paradigm to effectively incorporate the distributed data at the network edge for training deep learning models. Nevertheless, the limited coverage of a single edge server results in an insufficient number of participated client nodes, which may impair the learning performance. In this paper, we investigate a novel framework of FEEL, namely semi-decentralized federated edge learning (SD-FEEL), where multiple edge servers are employed to collectively coordinate a large number of client nodes. By exploiting the low-latency communication among edge servers for efficient model sharing, SD-FEEL can incorporate more training data, while enjoying much lower latency compared with conventional federated learning. We detail the training algorithm for SD-FEEL with three main steps, including local model update, intra-cluster, and inter-cluster model aggregations. The convergence of this algorithm is proved on non-independent and identically distributed (non-IID) data, which also helps to reveal the effects of key parameters on the training efficiency and provides practical design guidelines. Meanwhile, the heterogeneity of edge devices may cause the straggler effect and deteriorate the convergence speed of SD-FEEL. To resolve this issue, we propose an asynchronous training algorithm with a staleness-aware aggregation scheme for SD-FEEL, of which, the convergence performance is also analyzed. The simulation results demonstrate the effectiveness and efficiency of the proposed algorithms for SD-FEEL and corroborate our analysis.
We propose a federated learning (FL) in stratosphere (FLSTRA) system, where a high altitude platform station (HAPS) facilitates a large number of terrestrial clients to collaboratively learn a global model without sharing the training data. FLSTRA overcomes the challenges faced by FL in terrestrial networks, such as slow convergence and high communication delay due to limited client participation and multi-hop communications. HAPS leverages its altitude and size to allow the participation of more clients with line-of-sight (LOS) links and the placement of a powerful server. However, handling many clients at once introduces computing and transmission delays. Thus, we aim to obtain a delay-accuracy trade-off for FLSTRA. Specifically, we first develop a joint client selection and resource allocation algorithm for uplink and downlink to minimize the FL delay subject to the energy and quality-of-service (QoS) constraints. Second, we propose a communication and computation resource-aware (CCRA-FL) algorithm to achieve the target FL accuracy while deriving an upper bound for its convergence rate. The formulated problem is non-convex; thus, we propose an iterative algorithm to solve it. Simulation results demonstrate the effectiveness of the proposed FLSTRA system, compared to terrestrial benchmarks, in terms of FL delay and accuracy.
To mitigate the privacy leakages and communication burdens of Federated Learning (FL), decentralized FL (DFL) discards the central server and each client only communicates with its neighbors in a decentralized communication network. However, existing DFL suffers from high inconsistency among local clients, which results in severe distribution shift and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or sparse communication topology. To alleviate this issue, we propose two DFL algorithms named DFedSAM and DFedSAM-MGS to improve the performance of DFL. Specifically, DFedSAM leverages gradient perturbation to generate local flat models via Sharpness Aware Minimization (SAM), which searches for models with uniformly low loss values. DFedSAM-MGS further boosts DFedSAM by adopting Multiple Gossip Steps (MGS) for better model consistency, which accelerates the aggregation of local flat models and better balances communication complexity and generalization. Theoretically, we present improved convergence rates $\small \mathcal{O}\big(\frac{1}{\sqrt{KT}}+\frac{1}{T}+\frac{1}{K^{1/2}T^{3/2}(1-\lambda)^2}\big)$ and $\small \mathcal{O}\big(\frac{1}{\sqrt{KT}}+\frac{1}{T}+\frac{\lambda^Q+1}{K^{1/2}T^{3/2}(1-\lambda^Q)^2}\big)$ in non-convex setting for DFedSAM and DFedSAM-MGS, respectively, where $1-\lambda$ is the spectral gap of gossip matrix and $Q$ is the number of MGS. Empirically, our methods can achieve competitive performance compared with CFL methods and outperform existing DFL methods.
Data heterogeneity across clients is a key challenge in federated learning. Prior works address this by either aligning client and server models or using control variates to correct client model drift. Although these methods achieve fast convergence in convex or simple non-convex problems, the performance in over-parameterized models such as deep neural networks is lacking. In this paper, we first revisit the widely used FedAvg algorithm in a deep neural network to understand how data heterogeneity influences the gradient updates across the neural network layers. We observe that while the feature extraction layers are learned efficiently by FedAvg, the substantial diversity of the final classification layers across clients impedes the performance. Motivated by this, we propose to correct model drift by variance reduction only on the final layers. We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost. We furthermore provide proof for the convergence rate of our algorithm.
With the arising concerns of privacy within machine learning, federated learning (FL) was invented in 2017, in which the clients, such as mobile devices, train a model and send the update to the centralized server. Choosing clients randomly for FL can harm learning performance due to different reasons. Many studies have proposed approaches to address the challenges of client selection of FL. However, no systematic literature review (SLR) on this topic existed. This SLR investigates the state of the art of client selection in FL and answers the challenges, solutions, and metrics to evaluate the solutions. We systematically reviewed 47 primary studies. The main challenges found in client selection are heterogeneity, resource allocation, communication costs, and fairness. The client selection schemes aim to improve the original random selection algorithm by focusing on one or several of the aforementioned challenges. The most common metric used is testing accuracy versus communication rounds, as testing accuracy measures the successfulness of the learning and preferably in as few communication rounds as possible, as they are very expensive. Although several possible improvements can be made with the current state of client selection, the most beneficial ones are evaluating the impact of unsuccessful clients and gaining a more theoretical understanding of the impact of fairness in FL.
Federated Learning is a training framework that enables multiple participants to collaboratively train a shared model while preserving data privacy and minimizing communication overhead. The heterogeneity of devices and networking resources of the participants delay the training and aggregation in federated learning. This paper proposes a federated learning approach to manoeuvre the heterogeneity among the participants using resource aware clustering. The approach begins with the server gathering information about the devices and networking resources of participants, after which resource aware clustering is performed to determine the optimal number of clusters using Dunn Indices. The mechanism of participant assignment is then introduced, and the expression of communication rounds required for model convergence in each cluster is mathematically derived. Furthermore, a master-slave technique is introduced to improve the performance of the lightweight models in the clusters using knowledge distillation. Finally, experimental evaluations are conducted to verify the feasibility and effectiveness of the approach and to compare it with state-of-the-art techniques.
Recent years have witnessed a huge demand for artificial intelligence and machine learning applications in wireless edge networks to assist individuals with real-time services. Owing to the practical setting and privacy preservation of federated learning (FL), it is a suitable and appealing distributed learning paradigm to deploy these applications at the network edge. Despite the many successful efforts made to apply FL to wireless edge networks, the adopted algorithms mostly follow the same spirit as FedAvg, thereby heavily suffering from the practical challenges of label deficiency and device heterogeneity. These challenges not only decelerate the model training in FL but also downgrade the application performance. In this paper, we focus on the algorithm design and address these challenges by investigating the personalized semi-supervised FL problem and proposing an effective algorithm, namely FedCPSL. In particular, the techniques of pseudo-labeling, and interpolation-based model personalization are judiciously combined to provide a new problem formulation for personalized semi-supervised FL. The proposed FedCPSL algorithm adopts novel strategies, including adaptive client variance reduction, local momentum, and normalized global aggregation, to combat the challenge of device heterogeneity and boost algorithm convergence. Moreover, the convergence property of FedCPSL is thoroughly analyzed and shows that FedCPSL is resilient to both statistical and system heterogeneity, obtaining a sublinear convergence rate. Experimental results on image classification tasks are also presented to demonstrate that the proposed approach outperforms its counterparts in terms of both convergence speed and application performance.
Federated learning (FL) has been proposed to protect data privacy and virtually assemble the isolated data silos by cooperatively training models among organizations without breaching privacy and security. However, FL faces heterogeneity from various aspects, including data space, statistical, and system heterogeneity. For example, collaborative organizations without conflict of interest often come from different areas and have heterogeneous data from different feature spaces. Participants may also want to train heterogeneous personalized local models due to non-IID and imbalanced data distribution and various resource-constrained devices. Therefore, heterogeneous FL is proposed to address the problem of heterogeneity in FL. In this survey, we comprehensively investigate the domain of heterogeneous FL in terms of data space, statistical, system, and model heterogeneity. We first give an overview of FL, including its definition and categorization. Then, We propose a precise taxonomy of heterogeneous FL settings for each type of heterogeneity according to the problem setting and learning objective. We also investigate the transfer learning methodologies to tackle the heterogeneity in FL. We further present the applications of heterogeneous FL. Finally, we highlight the challenges and opportunities and envision promising future research directions toward new framework design and trustworthy approaches.
Federated Learning (FL) is a decentralized machine-learning paradigm, in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, other than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation} approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcasted to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that, our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.
Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.
Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm, drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the various hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., causing a device unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that can faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol but with heterogeneity in consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to 9.2% accuracy drop, 2.32x lengthened training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two potential factors for performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers consider necessary heterogeneity during the evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate the impacts of heterogeneity.