
There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold), which bounds the sensitivity of the average update to each client's update, introduces a bias that depends on the clipping threshold and the number of local steps in FL, and this bias is hard to analyze. For Lipschitz functions, the Lipschitz constant serves as a trivial clipping threshold with zero bias. However, Lipschitzness does not hold in many practical settings; moreover, verifying it and computing the Lipschitz constant are hard. Thus, the choice of the clipping threshold is non-trivial and requires extensive tuning in practice. In this paper, we provide the first convergence result for private FL on smooth \textit{convex} objectives \textit{for a general clipping threshold} -- \textit{without assuming Lipschitzness}. We also look at a simpler alternative to clipping (for bounding sensitivity), namely \textit{normalization}, where we use only a scaled version of the unit vector along the client updates, completely discarding the magnitude information. The resulting normalization-based private FL algorithm is theoretically shown to have better convergence than its clipping-based counterpart on smooth convex functions. We corroborate our theory with synthetic experiments as well as experiments on benchmark datasets.
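
The difference between clipping and normalization described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the `noise_std` parameter, and the plain Gaussian noise added to the average are assumptions made for illustration, and no privacy calibration is implied.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def clip_update(update, threshold):
    """Project the update onto the l2 ball of radius `threshold`.

    Updates already inside the ball are returned unchanged, so the
    magnitude information survives whenever the norm is small enough.
    """
    norm = l2_norm(update)
    if norm <= threshold:
        return list(update)
    scale = threshold / norm
    return [x * scale for x in update]


def normalize_update(update, scale):
    """Keep only the direction: return scale * update / ||update||.

    The magnitude is discarded for every nonzero update.
    """
    norm = l2_norm(update)
    if norm == 0.0:
        return list(update)
    return [scale * x / norm for x in update]


def noisy_average(updates, bound_fn, noise_std, rng):
    """Average the bounded client updates and add Gaussian noise.

    `noise_std` is a hypothetical parameter chosen by the caller; how to
    calibrate it to a privacy budget is outside this sketch.
    """
    bounded = [bound_fn(u) for u in updates]
    n, dim = len(bounded), len(bounded[0])
    avg = [sum(u[i] for u in bounded) / n for i in range(dim)]
    return [a + rng.gauss(0.0, noise_std) for a in avg]


if __name__ == "__main__":
    rng = random.Random(0)
    updates = [[3.0, 4.0], [0.3, 0.4]]
    print(noisy_average(updates, lambda u: clip_update(u, 1.0), 0.1, rng))
    print(noisy_average(updates, lambda u: normalize_update(u, 1.0), 0.1, rng))
```

In both cases every bounded update has norm at most the chosen radius, so one client can change the average by an amount proportional to that radius divided by the number of clients. Clipping leaves small updates untouched, while normalization rescales all of them to the same length.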

Related content

Learning · Federated learning · Dataset · Lecture notes · Spanned subspace

June 7, 2022

Federated Hetero-Task Learning

Liuyi Yao,Dawei Gao,Zhen Wang,Yuexiang Xie,Weirui Kuang,Daoyuan Chen,Haohui Wang,Chenhe Dong,Bolin Ding,Yaliang Li

To investigate the heterogeneity of federated learning in real-world scenarios, we generalize classical federated learning to federated hetero-task learning, which emphasizes the inconsistency across the participants in federated learning in terms of both data distribution and learning tasks. We also present B-FHTL, a federated hetero-task learning benchmark consisting of a simulation dataset, FL protocols, and a unified evaluation mechanism. The B-FHTL dataset contains three well-designed federated learning tasks with increasing heterogeneity. Each task simulates clients with different data distributions and learning tasks. To ensure fair comparison among different FL algorithms, B-FHTL builds in a full suite of FL protocols by providing high-level APIs to avoid privacy leakage, and presets the most common evaluation metrics spanning different learning tasks, such as regression, classification, text generation, etc. Furthermore, we compare FL algorithms from the fields of federated multi-task learning, federated personalization, and federated meta-learning within B-FHTL, and highlight the influence of heterogeneity and the difficulties of federated hetero-task learning. Our benchmark, including the federated dataset, protocols, the evaluation mechanism, and the preliminary experiment, is open-sourced at //github.com/alibaba/FederatedScope/tree/contest/v1.0.

Analysis · Biased · Learning · Rank · Reducible

June 7, 2022

An Analysis of Selection Bias Issue for Online Advertising

Shinya Suzumura,Hitoshi Abe

In online advertising, a set of candidate advertisements can be ranked by an auction system, where usually the top-1 advertisement is selected and displayed in an advertising space. In this paper, we show a selection bias issue that is present in such an auction system. Our analysis shows that the selection bias destroys the truthfulness of the auction, which implies that the buyers (advertisers) in the auction cannot maximize their profits. Although selection bias is well known in the field of statistics and there are many studies of it, our main contribution is to combine the theoretical analysis of the bias with the auction mechanism. In our experiment using online A/B testing, we evaluate the selection bias on an auction system whose ranking score is a function of the predicted CTR (click-through rate) of an advertisement. The experiment showed that the selection bias is drastically reduced by using multi-task learning that learns from the data for all advertisements.

Learning · Cluster · Analysis · Federated learning · MoDELS

June 7, 2022

On the Convergence of Clustered Federated Learning

Jie Ma,Guodong Long,Tianyi Zhou,Jing Jiang,Chengqi Zhang

from arxiv, draft

Knowledge sharing and model personalization are essential components for tackling the non-IID challenge in federated learning (FL). Most existing FL methods focus on two extremes: 1) learning a shared model to serve all clients with non-IID data, and 2) learning personalized models for each client, namely personalized FL. There is a trade-off solution, namely clustered FL or cluster-wise personalized FL, which aims to group similar clients into one cluster and then learn a shared model for all clients within a cluster. This paper revisits the research on clustered FL by formulating existing methods in a bi-level optimization framework that unifies them. We propose a new theoretical analysis framework to prove convergence by considering the clusterability among clients. In addition, we embody this framework in an algorithm named Weighted Clustered Federated Learning (WeCFL). Empirical analysis verifies the theoretical results and demonstrates the effectiveness of the proposed WeCFL under the proposed cluster-wise non-IID settings.
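
The two alternating steps of clustered FL described above (group similar clients, then learn one shared model per cluster) can be sketched in a k-means style. This is an illustrative sketch only and is not WeCFL: representing each client by a flat parameter vector, using squared Euclidean distance as the similarity measure, and weighting by a per-client sample count are all assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(client_models, centers):
    """Assign each client to the cluster whose center is nearest."""
    return [
        min(range(len(centers)), key=lambda k: sq_dist(model, centers[k]))
        for model in client_models
    ]


def update_centers(client_models, weights, assignment, centers):
    """Recompute each center as the weighted average of its members.

    `weights` is a hypothetical per-client weight (e.g. sample count).
    A cluster with no members keeps its previous center.
    """
    new_centers = []
    for k, old in enumerate(centers):
        members = [i for i, a in enumerate(assignment) if a == k]
        total = sum(weights[i] for i in members)
        if total == 0:
            new_centers.append(list(old))
            continue
        new_centers.append([
            sum(weights[i] * client_models[i][d] for i in members) / total
            for d in range(len(old))
        ])
    return new_centers


def cluster_clients(client_models, weights, centers, rounds=10):
    """Alternate between assignment and center updates for `rounds` rounds."""
    for _ in range(rounds):
        assignment = assign_clusters(client_models, centers)
        centers = update_centers(client_models, weights, assignment, centers)
    return assign_clusters(client_models, centers), centers


if __name__ == "__main__":
    models = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
    assignment, centers = cluster_clients(
        models, [1, 3, 1, 1], [[0.0, 1.0], [4.0, 4.0]]
    )
    print(assignment, centers)
```

In a real FL system the client vectors would change between rounds because of local training, so the sketch only shows the clustering half of the loop.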

Shuffle · Learning · Mutually independent · MoDELS · Subsampling

June 7, 2022

Shuffled Check-in: Privacy Amplification towards Practical Distributed Learning

Seng Pei Liew,Satoshi Hasegawa,Tsubasa Takahashi

from arxiv, 16 pages, 4 figures

Recent studies of distributed computation with formal privacy guarantees, such as differentially private (DP) federated learning, leverage random sampling of clients in each round (privacy amplification by subsampling) to achieve satisfactory levels of privacy. Achieving this, however, requires strong assumptions that may not hold in practice, including precise and uniform subsampling of clients, and a highly trusted aggregator to process clients' data. In this paper, we explore a more practical protocol, shuffled check-in, to resolve the aforementioned issues. The protocol relies on each client making an independent and random decision to participate in the computation, which removes the requirement of server-initiated subsampling and enables robust modelling of client dropouts. Moreover, a weaker trust model known as the shuffle model is employed instead of a trusted aggregator. To this end, we introduce new tools to characterize the R\'enyi differential privacy (RDP) of shuffled check-in. We show that our new techniques improve the privacy guarantee by at least a factor of three over those using approximate DP's strong composition in various parameter regimes. Furthermore, we provide a numerical approach to track the privacy of generic shuffled check-in mechanisms, including distributed stochastic gradient descent (SGD) with the Gaussian mechanism. To the best of our knowledge, this is also the first evaluation of the Gaussian mechanism within the local/shuffle model under the distributed setting in the literature, which may be of independent interest.
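
The participation step of the protocol described above can be illustrated with a minimal sketch. It assumes a single check-in probability shared by all clients (the hypothetical `check_in_prob`) and omits the local randomization of each report as well as any privacy accounting; the function name is illustrative and not taken from the paper.

```python
import random


def shuffled_check_in(client_reports, check_in_prob, rng):
    """Simulate one round of client-side check-in followed by shuffling.

    Each client decides independently, with probability `check_in_prob`,
    whether to take part, so the server does no sampling itself. The
    reports of the participating clients are then randomly permuted,
    which removes the link between a report and its position in the
    client list.
    """
    checked_in = [r for r in client_reports if rng.random() < check_in_prob]
    rng.shuffle(checked_in)
    return checked_in


if __name__ == "__main__":
    rng = random.Random(42)
    reports = [f"report-{i}" for i in range(10)]
    print(shuffled_check_in(reports, 0.3, rng))
```

The number of participants is itself random in this scheme, unlike server-side subsampling with a fixed batch size.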

Stochastic gradient descent · Example · Loss · Learning · Class

June 7, 2022

Per-Instance Privacy Accounting for Differentially Private Stochastic Gradient Descent

Da Yu,Gautam Kamath,Janardhan Kulkarni,Tie-Yan Liu,Jian Yin,Huishuai Zhang

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose an efficient algorithm to compute per-instance privacy guarantees for individual examples when running DP-SGD. We use our algorithm to investigate per-instance privacy losses across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bounds. We further discover that the loss and the privacy loss on an example are well-correlated. This implies groups that are underserved in terms of model utility are simultaneously underserved in terms of privacy loss. For example, on CIFAR-10, the average $\epsilon$ of the class with the highest loss (Cat) is 32% higher than that of the class with the lowest loss (Ship). We also run membership inference attacks to show this reflects disparate empirical privacy risks.
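
For context, a single DP-SGD update in its commonly described form (clip each per-example gradient, sum, add Gaussian noise, average) can be sketched as follows. This is not the per-instance accounting algorithm proposed in the abstract; the parameter names `clip_norm` and `noise_multiplier` are assumptions, and gradients are plain lists of floats for simplicity.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """Apply one DP-SGD update to `params` and return the new parameters.

    Each per-example gradient is rescaled so its l2 norm is at most
    `clip_norm`; Gaussian noise with standard deviation
    `noise_multiplier * clip_norm` is added to the sum before averaging.
    """
    dim = len(params)
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Gradients with norm below the threshold are left unchanged.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            total[i] += grad[i] * scale
    batch_size = len(per_example_grads)
    noisy_mean = [
        (t + rng.gauss(0.0, noise_multiplier * clip_norm)) / batch_size
        for t in total
    ]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.1, -0.2]]
    print(dp_sgd_step([1.0, 1.0], grads, lr=0.5, clip_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

The noise scale is the same for every example in the batch, which matches the abstract's remark that standard DP-SGD provides a single guarantee for all datapoints.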

Learning · Generalization theory · Generalization error · Decomposed · Support vector machine

June 6, 2022

Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning

Milad Sefidgaran,Romain Chor,Abdellatif Zaidi

from arxiv, 24 pages

In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms. Specifically, there are $K$ clients whose individually chosen models are aggregated by a central server. The bounds depend on the compressibility of each client's algorithm while keeping other clients' algorithms uncompressed, and leverage the fact that small changes in each local model change the aggregated model by a factor of only $1/K$. Adopting a recently proposed approach by Sefidgaran et al., and extending it suitably to the distributed setting, this enables smaller rate-distortion terms, which are shown to translate into tighter generalization bounds. The bounds are then applied to distributed support vector machines (SVM), suggesting that the generalization error of the distributed setting decays faster than that of the centralized one by a factor of $\mathcal{O}(\log(K)/\sqrt{K})$. This finding is also validated experimentally. A similar conclusion is obtained for a multiple-round federated learning setup where each client uses stochastic gradient Langevin dynamics (SGLD).

Analysis · Nonconvex · SGD · Sample · Smooth

June 5, 2022

Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling

Alexander Tyurin,Lukang Sun,Konstantin Burlachenko,Peter Richtárik

from arxiv, 25 pages, 6 figures

We revisit the classical problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions. The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained by the optimal SGD methods $\small\sf\color{green}{SPIDER}$ (arXiv:1807.01695) and $\small\sf\color{green}{PAGE}$ (arXiv:2008.10898), for example, where $\varepsilon$ is the error tolerance. However, i) the big-$\mathcal{O}$ notation hides crucial dependencies on the smoothness constants associated with the functions, and ii) the rates and theory in these methods assume simplistic sampling mechanisms that do not offer any flexibility. In this work we remedy the situation. First, we generalize the $\small\sf\color{green}{PAGE}$ algorithm so that it can provably work with virtually any (unbiased) sampling mechanism. This is particularly useful in federated learning, as it allows us to construct and better understand the impact of various combinations of client and data sampling strategies. Second, our analysis is sharper as we make explicit use of certain novel inequalities that capture the intricate interplay between the smoothness constants and the sampling procedure. Indeed, our analysis is better even for the simple sampling procedure analyzed in the $\small\sf\color{green}{PAGE}$ paper. However, this already improved bound can be further sharpened by a different sampling scheme which we propose. In summary, we provide the most general and most accurate analysis of optimal SGD in the smooth nonconvex regime. Finally, our theoretical findings are supported by carefully designed experiments.

Learning · Regularization term · Identically distributed · Performer · Nonconvex

June 4, 2022

A New Look and Convergence Rate of Federated Multi-Task Learning with Laplacian Regularization

Canh T. Dinh,Tung T. Vu,Nguyen H. Tran,Minh N. Dao,Hongyu Zhang

Non-Independent and Identically Distributed (non-IID) data distribution among clients is considered the key factor that degrades the performance of federated learning (FL). Several approaches to handling non-IID data, such as personalized FL and federated multi-task learning (FMTL), are of great interest to research communities. In this work, first, we formulate the FMTL problem using Laplacian regularization to explicitly leverage the relationships among the models of clients for multi-task learning. Then, we introduce a new view of the FMTL problem, which for the first time shows that the formulated FMTL problem can be used for conventional FL and personalized FL. We also propose two algorithms, FedU and dFedU, to solve the formulated FMTL problem in communication-centralized and decentralized schemes, respectively. Theoretically, we prove that the convergence rates of both algorithms achieve linear speedup for strongly convex objectives and sublinear speedup of order 1/2 for nonconvex objectives. Experimentally, we show that our algorithms outperform FedAvg, FedProx, SCAFFOLD, and AFL in FL settings, MOCHA in FMTL settings, as well as pFedMe and Per-FedAvg in personalized FL settings.

Learning · Robustness · Generalization theory · Federated learning · Optimizer

June 3, 2022

On the Generalization of Wasserstein Robust Federated Learning

Tung-Anh Nguyen,Tuan Dung Nguyen,Long Tan Le,Canh T. Dinh,Nguyen H. Tran

In federated learning, participating clients typically possess non-i.i.d. data, posing a significant challenge to generalization to unseen distributions. To address this, we propose a Wasserstein distributionally robust optimization scheme called WAFL. Leveraging its duality, we frame WAFL as an empirical surrogate risk minimization problem, and solve it using a local SGD-based algorithm with convergence guarantees. We show that the robustness of WAFL is more general than related approaches, and the generalization bound is robust to all adversarial distributions inside the Wasserstein ball (ambiguity set). Since the center location and radius of the Wasserstein ball can be suitably modified, WAFL shows its applicability not only in robustness but also in domain adaptation. Through empirical evaluation, we demonstrate that WAFL generalizes better than the vanilla FedAvg in non-i.i.d. settings, and is more robust than other related methods in distribution shift settings. Further, using benchmark datasets we show that WAFL is capable of generalizing to unseen target domains.

Nonconvex · CC · Optimizer · Regularization term · Learning

June 3, 2022

A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization

Ziyi Chen,Bhavya Kailkhura,Yi Zhou

from arxiv, 20 pages, 1 figure, 1 table

Many important machine learning applications involve regularized nonconvex bi-level optimization. However, existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from high computation complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm by identifying its intrinsic potential function. In particular, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computation complexity $\mathcal{O}(\kappa^{3.5}\epsilon^{-2})$ over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a {\L}ojasiewicz-type gradient inequality. An experiment on hyper-parameter optimization demonstrates the effectiveness of our algorithm.