Classical federated learning approaches incur significant performance degradation in the presence of non-IID client data. A possible direction to address this issue is forming clusters of clients with roughly IID data. Most solutions following this direction are iterative and relatively slow, also prone to convergence issues in discovering underlying cluster formations. We introduce federated learning with taskonomy (FLT) that generalizes this direction by learning the task-relatedness between clients for more efficient federated aggregation of heterogeneous data. In a one-off process, the server provides the clients with a pretrained (and fine-tunable) encoder to compress their data into a latent representation, and transmit the signature of their data back to the server. The server then learns the task-relatedness among clients via manifold learning, and performs a generalization of federated averaging. FLT can flexibly handle a generic client relatedness graph, when there are no explicit clusters of clients, as well as efficiently decompose it into (disjoint) clusters for clustered federated learning. We demonstrate that FLT not only outperforms the existing state-of-the-art baselines in non-IID scenarios but also offers improved fairness across clients.
Federated Learning (FL) has been considered as an appealing framework to tackle data privacy issues of mobile devices compared to conventional Machine Learning (ML). Using Edge Servers (ESs) as intermediaries to perform model aggregation in proximity can reduce the transmission overhead, and it enables great potentials in low-latency FL, where the hierarchical architecture of FL (HFL) has been attracted more attention. Designing a proper client selection policy can significantly improve training performance, and it has been extensively used in FL studies. However, to the best of our knowledge, there are no studies focusing on HFL. In addition, client selection for HFL faces more challenges than conventional FL, e.g., the time-varying connection of client-ES pairs and the limited budget of the Network Operator (NO). In this paper, we investigate a client selection problem for HFL, where the NO learns the number of successful participating clients to improve the training performance (i.e., select as many clients in each round) as well as under the limited budget on each ES. An online policy, called Context-aware Online Client Selection (COCS), is developed based on Contextual Combinatorial Multi-Armed Bandit (CC-MAB). COCS observes the side-information (context) of local computing and transmission of client-ES pairs and makes client selection decisions to maximize NO's utility given a limited budget. Theoretically, COCS achieves a sublinear regret compared to an Oracle policy on both strongly convex and non-convex HFL. Simulation results also support the efficiency of the proposed COCS policy on real-world datasets.
Federated learning (FL) has emerged to jointly train a model with distributed data sets in IoT while avoiding the need for central data collection. Due to limited observation range, such data sets can only reflect local information, which limits the quality of trained models. In practical network, the global information and local observations always coexist, which requires joint consideration for learning to make reasonable policy. However, in horizontal FL among distributed clients, the central agency only acts as a model aggregator without utilizing its global features to further improve the model. This could largely degrade the performance in some missions such as flow prediction, where the global information could obviously enhance the accuracy. Meanwhile, such global feature may not be directly transmitted to agents for data security. Then how to utilize the global observation residing in the central agency while protecting its safety rises up as an important problem in FL. In this paper, we developed the vertical-horizontal federated learning (VHFL) process, where the global feature is shared with the agents in a procedure similar to vertical FL without extra communication rounds. Considering the delay and packet loss, we analyzed its convergence in the network system and validated its performance by experiments. The proposed VHFL could enhance the accuracy compared with the horizontal FL while protecting the security of global data.
In this paper, we present a communication-efficient federated learning framework inspired by quantized compressed sensing. The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server (PS). Our strategy for gradient compression is to sequentially perform block sparsification, dimensional reduction, and quantization. Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression. For accurate aggregation of the local gradients from the compressed signals at the PS, we put forth an approximate minimum mean square error (MMSE) approach for gradient reconstruction using the expectation-maximization generalized-approximate-message-passing (EM-GAMP) algorithm. Assuming Bernoulli Gaussian-mixture prior, this algorithm iteratively updates the posterior mean and variance of local gradients from the compressed signals. We also present a low-complexity approach for the gradient reconstruction. In this approach, we use the Bussgang theorem to aggregate local gradients from the compressed signals, then compute an approximate MMSE estimate of the aggregated gradient using the EM-GAMP algorithm. We also provide a convergence rate analysis of the presented framework. Using the MNIST dataset, we demonstrate that the presented framework achieves almost identical performance with the case that performs no compression, while significantly reducing communication overhead for federated learning.
Federated Learning (FL) is a decentralized machine-learning paradigm, in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, other than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation} approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcasted to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that, our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.
Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.
Non-IID data present a tough challenge for federated learning. In this paper, we explore a novel idea of facilitating pairwise collaborations between clients with similar data. We propose FedAMP, a new method employing federated attentive message passing to facilitate similar clients to collaborate more. We establish the convergence of FedAMP for both convex and non-convex models, and propose a heuristic method to further improve the performance of FedAMP when clients adopt deep neural networks as personalized models. Our extensive experiments on benchmark data sets demonstrate the superior performance of the proposed methods.
There has been a surge of interest in continual learning and federated learning, both of which are important in deep neural networks in real-world scenarios. Yet little research has been done regarding the scenario where each client learns on a sequence of tasks from a private local data stream. This problem of federated continual learning poses new challenges to continual learning, such as utilizing knowledge from other clients, while preventing interference from irrelevant knowledge. To resolve these issues, we propose a novel federated continual learning framework, Federated Weighted Inter-client Transfer (FedWeIT), which decomposes the network weights into global federated parameters and sparse task-specific parameters, and each client receives selective knowledge from other clients by taking a weighted combination of their task-specific parameters. FedWeIT minimizes interference between incompatible tasks, and also allows positive knowledge transfer across clients during learning. We validate our \emph{FedWeIT}~against existing federated learning and continual learning methods under varying degrees of task similarity across clients, and our model significantly outperforms them with a large reduction in the communication cost.
Federated learning has been showing as a promising approach in paving the last mile of artificial intelligence, due to its great potential of solving the data isolation problem in large scale machine learning. Particularly, with consideration of the heterogeneity in practical edge computing systems, asynchronous edge-cloud collaboration based federated learning can further improve the learning efficiency by significantly reducing the straggler effect. Despite no raw data sharing, the open architecture and extensive collaborations of asynchronous federated learning (AFL) still give some malicious participants great opportunities to infer other parties' training data, thus leading to serious concerns of privacy. To achieve a rigorous privacy guarantee with high utility, we investigate to secure asynchronous edge-cloud collaborative federated learning with differential privacy, focusing on the impacts of differential privacy on model convergence of AFL. Formally, we give the first analysis on the model convergence of AFL under DP and propose a multi-stage adjustable private algorithm (MAPA) to improve the trade-off between model utility and privacy by dynamically adjusting both the noise scale and the learning rate. Through extensive simulations and real-world experiments with an edge-could testbed, we demonstrate that MAPA significantly improves both the model accuracy and convergence speed with sufficient privacy guarantee.
We present one-shot federated learning, where a central server learns a global model over a network of federated devices in a single round of communication. Our approach - drawing on ensemble learning and knowledge aggregation - achieves an average relative gain of 51.5% in AUC over local baselines and comes within 90.1% of the (unattainable) global ideal. We discuss these methods and identify several promising directions of future work.
We study object recognition under the constraint that each object class is only represented by very few observations. In such cases, naive supervised learning would lead to severe over-fitting in deep neural networks due to limited training data. We tackle this problem by creating much more training data through label propagation from the few labeled examples to a vast collection of unannotated images. Our main insight is that such a label propagation scheme can be highly effective when the similarity metric used for propagation is learned and transferred from other related domains with lots of data. We test our approach on semi-supervised learning, transfer learning and few-shot recognition, where we learn our similarity metric using various supervised/unsupervised pretraining methods, and transfer it to unlabeled data across different data distributions. By taking advantage of unlabeled data in this way, we achieve significant improvements on all three tasks. Notably, our approach outperforms current state-of-the-art techniques by an absolute $20\%$ for semi-supervised learning on CIFAR10, $10\%$ for transfer learning from ImageNet to CIFAR10, and $6\%$ for few-shot recognition on mini-ImageNet, when labeled examples are limited.