This paper studies the problem of federated learning (FL) in the absence of a trustworthy server/clients. In this setting, each client needs to ensure the privacy of its own data without relying on the server or other clients. We study local differential privacy (LDP) and provide tight upper and lower bounds that establish the minimax optimal rates (up to logarithms) for LDP convex/strongly convex federated stochastic optimization. Our rates match the optimal statistical rates in certain practical parameter regimes ("privacy for free"). Second, we develop a novel time-varying noisy SGD algorithm, leading to the first non-trivial LDP risk bounds for FL with non-i.i.d. clients. Third, we consider the special case where each client's loss function is empirical and develop an accelerated LDP FL algorithm to improve communication complexity compared to existing works. We also provide matching lower bounds, establishing the optimality of our algorithm for convex/strongly convex settings. Fourth, with a secure shuffler to anonymize client reports (but without a trusted server), our algorithm attains the optimal central DP rates for stochastic convex/strongly convex optimization, thereby achieving optimality in the local and central models simultaneously. Our upper bounds quantify the role of network communication reliability in performance.
Federated Learning (FL) is a collaborative machine learning technique to train a global model without obtaining clients' private data. The main challenges in FL are statistical diversity among clients, limited computing capability among clients' equipments, and the excessive communication overhead between servers and clients. To address these challenges, we propose a novel sparse personalized federated learning scheme via maximizing correlation FedMac. By incorporating an approximated L1-norm and the correlation between client models and global model into standard FL loss function, the performance on statistical diversity data is improved and the communicational and computational loads required in the network are reduced compared with non-sparse FL. Convergence analysis shows that the sparse constraints in FedMac do not affect the convergence rate of the global model, and theoretical results show that FedMac can achieve good sparse personalization, which is better than the personalized methods based on L2-norm. Experimentally, we demonstrate the benefits of this sparse personalization architecture compared with the state-of-the-art personalization methods (e.g. FedMac respectively achieves 98.95%, 99.37%, 90.90% and 89.06% accuracy on the MNIST, FMNIST, CIFAR-100 and Synthetic datasets under non-i.i.d. variants).
The distributed convex optimization problem over the multi-agent system is considered in this paper, and it is assumed that each agent possesses its own cost function and communicates with its neighbours over a sequence of time-varying directed graphs. However, due to some reasons there exist communication delays while agents receive information from other agents, and we are going to seek the optimal value of the sum of agents' loss functions in this case. We desire to handle this problem with the push-sum distributed dual averaging (PS-DDA) algorithm which is introduced in \cite{Tsianos2012}. It is proved that this algorithm converges and the error decays at a rate $\mathcal{O}\left(T^{-0.5}\right)$ with proper step size, where $T$ is iteration span. The main result presented in this paper also illustrates the convergence of the proposed algorithm is related to the maximum value of the communication delay on one edge. We finally apply the theoretical results to numerical simulations to show the PS-DDA algorithm's performance.
We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples. In contrast, when increasing the number of users $n$, the privacy cost decreases at a faster $O(1/n)$ rate. We complement these results with lower bounds showing the minimax optimality of our algorithms for mean estimation and stochastic convex optimization. Our algorithms rely on novel techniques for private mean estimation in arbitrary dimension with error scaling as the concentration radius $\tau$ of the distribution rather than the entire range.
News recommendation aims to display news articles to users based on their personal interest. Existing news recommendation methods rely on centralized storage of user behavior data for model training, which may lead to privacy concerns and risks due to the privacy-sensitive nature of user behaviors. In this paper, we propose a privacy-preserving method for news recommendation model training based on federated learning, where the user behavior data is locally stored on user devices. Our method can leverage the useful information in the behaviors of massive number users to train accurate news recommendation models and meanwhile remove the need of centralized storage of them. More specifically, on each user device we keep a local copy of the news recommendation model, and compute gradients of the local model based on the user behaviors in this device. The local gradients from a group of randomly selected users are uploaded to server, which are further aggregated to update the global model in the server. Since the model gradients may contain some implicit private information, we apply local differential privacy (LDP) to them before uploading for better privacy protection. The updated global model is then distributed to each user device for local model update. We repeat this process for multiple rounds. Extensive experiments on a real-world dataset show the effectiveness of our method in news recommendation model training with privacy protection.
Train machine learning models on sensitive user data has raised increasing privacy concerns in many areas. Federated learning is a popular approach for privacy protection that collects the local gradient information instead of real data. One way to achieve a strict privacy guarantee is to apply local differential privacy into federated learning. However, previous works do not give a practical solution due to three issues. First, the noisy data is close to its original value with high probability, increasing the risk of information exposure. Second, a large variance is introduced to the estimated average, causing poor accuracy. Last, the privacy budget explodes due to the high dimensionality of weights in deep learning models. In this paper, we proposed a novel design of local differential privacy mechanism for federated learning to address the abovementioned issues. It is capable of making the data more distinct from its original value and introducing lower variance. Moreover, the proposed mechanism bypasses the curse of dimensionality by splitting and shuffling model updates. A series of empirical evaluations on three commonly used datasets, MNIST, Fashion-MNIST and CIFAR-10, demonstrate that our solution can not only achieve superior deep learning performance but also provide a strong privacy guarantee at the same time.
Federated learning has been showing as a promising approach in paving the last mile of artificial intelligence, due to its great potential of solving the data isolation problem in large scale machine learning. Particularly, with consideration of the heterogeneity in practical edge computing systems, asynchronous edge-cloud collaboration based federated learning can further improve the learning efficiency by significantly reducing the straggler effect. Despite no raw data sharing, the open architecture and extensive collaborations of asynchronous federated learning (AFL) still give some malicious participants great opportunities to infer other parties' training data, thus leading to serious concerns of privacy. To achieve a rigorous privacy guarantee with high utility, we investigate to secure asynchronous edge-cloud collaborative federated learning with differential privacy, focusing on the impacts of differential privacy on model convergence of AFL. Formally, we give the first analysis on the model convergence of AFL under DP and propose a multi-stage adjustable private algorithm (MAPA) to improve the trade-off between model utility and privacy by dynamically adjusting both the noise scale and the learning rate. Through extensive simulations and real-world experiments with an edge-could testbed, we demonstrate that MAPA significantly improves both the model accuracy and convergence speed with sufficient privacy guarantee.
Alternating Direction Method of Multipliers (ADMM) is a widely used tool for machine learning in distributed settings, where a machine learning model is trained over distributed data sources through an interactive process of local computation and message passing. Such an iterative process could cause privacy concerns of data owners. The goal of this paper is to provide differential privacy for ADMM-based distributed machine learning. Prior approaches on differentially private ADMM exhibit low utility under high privacy guarantee and often assume the objective functions of the learning problems to be smooth and strongly convex. To address these concerns, we propose a novel differentially private ADMM-based distributed learning algorithm called DP-ADMM, which combines an approximate augmented Lagrangian function with time-varying Gaussian noise addition in the iterative process to achieve higher utility for general objective functions under the same differential privacy guarantee. We also apply the moments accountant method to bound the end-to-end privacy loss. The theoretical analysis shows that DP-ADMM can be applied to a wider class of distributed learning problems, is provably convergent, and offers an explicit utility-privacy tradeoff. To our knowledge, this is the first paper to provide explicit convergence and utility properties for differentially private ADMM-based distributed learning algorithms. The evaluation results demonstrate that our approach can achieve good convergence and model accuracy under high end-to-end differential privacy guarantee.
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
Machine Learning is a widely-used method for prediction generation. These predictions are more accurate when the model is trained on a larger dataset. On the other hand, the data is usually divided amongst different entities. For privacy reasons, the training can be done locally and then the model can be safely aggregated amongst the participants. However, if there are only two participants in \textit{Collaborative Learning}, the safe aggregation loses its power since the output of the training already contains much information about the participants. To resolve this issue, they must employ privacy-preserving mechanisms, which inevitably affect the accuracy of the model. In this paper, we model the training process as a two-player game where each player aims to achieve a higher accuracy while preserving its privacy. We introduce the notion of \textit{Price of Privacy}, a novel approach to measure the effect of privacy protection on the accuracy of the model. We develop a theoretical model for different player types, and we either find or prove the existence of a Nash Equilibrium with some assumptions. Moreover, we confirm these assumptions via a Recommendation Systems use case: for a specific learning algorithm, we apply three privacy-preserving mechanisms on two real-world datasets. Finally, as a complementary work for the designed game, we interpolate the relationship between privacy and accuracy for this use case and present three other methods to approximate it in a real-world scenario.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.