
Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model-parallelism configurations; neither approach suffices to scale complex DL models across distributed compute devices. Alpa distributes the training of large DL models by viewing parallelism at two hierarchical levels: inter-operator and intra-operator parallelism. Based on this view, Alpa constructs a new hierarchical space of massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive efficient parallel execution plans at each parallelism level, and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows that Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on the models they are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and to models without manually designed plans. Alpa's source code is publicly available at //github.com/alpa-projects/alpa
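A rough way to picture the two-level plan is sketched below (a conceptual toy in plain NumPy, not Alpa's actual API or compiler passes): the layer graph is first cut into pipeline stages (inter-operator parallelism), and each stage's matrix multiplications are then sharded across the devices of its mesh (intra-operator parallelism), with the concatenation standing in for an all-gather.

```python
import numpy as np

# Toy illustration of a two-level parallel plan (conceptual sketch, not Alpa's API):
# inter-operator parallelism cuts the layer graph into pipeline stages,
# intra-operator parallelism shards each stage's matmuls across a device mesh.

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))                               # a micro-batch of activations
weights = [rng.normal(size=(32, 32)) * 0.1 for _ in range(4)]

# Inter-operator level: layers 0-1 form stage 0, layers 2-3 form stage 1.
stages = [weights[:2], weights[2:]]

def run_stage(h, stage_weights, n_devices=2):
    """Intra-operator level: shard each weight matrix column-wise across `n_devices`
    (simulated by splitting the array), compute the partial outputs "per device",
    then concatenate (an all-gather in a real system)."""
    for w in stage_weights:
        shards = np.split(w, n_devices, axis=1)            # one shard per device
        partial = [h @ s for s in shards]                  # each device computes its slice
        h = np.maximum(np.concatenate(partial, axis=1), 0.0)  # gather + ReLU
    return h

h = x
for stage in stages:   # in Alpa, stages run on different device meshes in a pipeline
    h = run_stage(h, stage)
print(h.shape)         # (8, 32)
```

In Alpa itself, both the stage cut and the per-stage sharding are chosen automatically by the compilation passes rather than fixed by hand as above.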

Related content

Federated learning (FL) has recently attracted increasing attention from academia and industry, with the ultimate goal of achieving collaborative training under privacy and communication constraints. Existing FL algorithms based on iterative model averaging require a large number of communication rounds to obtain a well-performing model, owing to extremely unbalanced and non-i.i.d. data partitioning among clients. We therefore propose FedDM, which builds the global training objective from multiple local surrogate functions, enabling the server to gain a more global view of the loss landscape. Specifically, we construct a synthetic set of data on each client that locally matches the loss landscape of the original data through distribution matching. FedDM reduces the number of communication rounds and improves model quality by transmitting more informative and smaller synthesized data instead of unwieldy model weights. We conduct extensive experiments on three image classification datasets, and the results show that our method outperforms other FL counterparts in both efficiency and model performance. Moreover, we demonstrate that FedDM can be adapted to preserve differential privacy with the Gaussian mechanism and trains a better model under the same privacy budget.
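As a very rough illustration of uploading small synthetic sets instead of model weights (my own simplification: per-class moment matching stands in for FedDM's loss-landscape distribution matching, and all names below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_client(shift, n=200):
    """Non-IID client: a client-specific shift on feature 0, label signal on feature 1."""
    X0 = rng.normal(size=(n, 5)); X0[:, 0] += shift; X0[:, 1] -= 1.5
    X1 = rng.normal(size=(n, 5)); X1[:, 0] += shift; X1[:, 1] += 1.5
    return X0, X1

def synthesize(X, m=10):
    """Crude stand-in for distribution matching: m synthetic points rescaled so their
    per-dimension mean and standard deviation match the real local data."""
    Z = rng.normal(size=(m, X.shape[1]))
    Z = (Z - Z.mean(0)) / (Z.std(0) + 1e-8)
    return Z * X.std(0) + X.mean(0)

clients = [make_client(s) for s in (-2.0, 0.0, 2.0)]

# Each client uploads only its two small synthetic sets (not raw data, not model weights).
synth = [(synthesize(X0), synthesize(X1)) for X0, X1 in clients]

# Server: fit a simple least-squares classifier on the pooled synthetic data.
Xs = np.vstack([np.vstack(pair) for pair in synth])
ys = np.concatenate([np.r_[-np.ones(len(a)), np.ones(len(b))] for a, b in synth])
w = np.linalg.lstsq(np.c_[Xs, np.ones(len(Xs))], ys, rcond=None)[0]

# Evaluate the server's model on the real (never transmitted) client data.
Xt = np.vstack([np.vstack(pair) for pair in clients])
yt = np.concatenate([np.r_[-np.ones(len(a)), np.ones(len(b))] for a, b in clients])
pred = np.sign(np.c_[Xt, np.ones(len(Xt))] @ w)
print("accuracy of the server model on real client data:", (pred == yt).mean())
```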

Deep operator learning has emerged as a promising tool for reduced-order modelling and PDE model discovery. Leveraging the expressive power of deep neural networks, especially in high dimensions, such methods learn the mapping between functional state variables. While proposed methods have assumed noise only in the dependent variables, experimental and numerical data for operator learning typically exhibit noise in the independent variables as well, since both variables represent signals that are subject to measurement error. In regression on scalar data, failure to account for noisy independent variables can lead to biased parameter estimates. With noisy independent variables, linear models fitted via ordinary least squares (OLS) will show attenuation bias, wherein the slope will be underestimated. In this work, we derive an analogue of attenuation bias for linear operator regression with white noise in both the independent and dependent variables. In the nonlinear setting, we computationally demonstrate underprediction of the action of the Burgers operator in the presence of noise in the independent variable. We propose error-in-variables (EiV) models for two operator regression methods, MOR-Physics and DeepONet, and demonstrate that these new models reduce bias in the presence of noisy independent variables for a variety of operator learning problems. Considering the Burgers operator in 1D and 2D, we demonstrate that EiV operator learning robustly recovers operators in high-noise regimes that defeat OLS operator learning. We also introduce an EiV model for time-evolving PDE discovery and show that OLS and EiV perform similarly in learning the Kuramoto-Sivashinsky evolution operator from corrupted data, suggesting that the effect of bias in OLS operator learning depends on the regularity of the target operator.
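For the scalar regression case the abstract refers to, the classical attenuation result says the OLS slope shrinks toward zero by the factor sigma_x^2 / (sigma_x^2 + sigma_eta^2) when the regressor is observed with noise of variance sigma_eta^2. The short simulation below (a self-contained check, not the paper's operator-learning code) reproduces that bias numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0
sigma_x, sigma_eta, sigma_eps = 1.0, 0.5, 0.1

x_true = rng.normal(scale=sigma_x, size=n)
y = beta * x_true + rng.normal(scale=sigma_eps, size=n)   # noise in the dependent variable
x_obs = x_true + rng.normal(scale=sigma_eta, size=n)      # noise in the independent variable

beta_ols = np.polyfit(x_obs, y, 1)[0]                     # OLS ignores the input noise
beta_pred = beta * sigma_x**2 / (sigma_x**2 + sigma_eta**2)  # classical attenuation factor

print("true slope:                ", beta)
print("OLS slope (attenuated):    ", beta_ols)            # underestimates the true slope
print("predicted attenuated slope:", beta_pred)
```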

We examine federated learning (FL) with over-the-air (OTA) aggregation, where mobile users (MUs) aim to reach a consensus on a global model with the help of a parameter server (PS) that aggregates the local gradients. In OTA FL, MUs train their models on local data at every training round and transmit their gradients simultaneously, in an uncoded fashion, over the same frequency band. Based on the received signal of the superposed gradients, the PS performs a global model update. While OTA FL significantly reduces the communication cost, it is susceptible to adverse channel effects and noise. Employing multiple antennas at the receiver can reduce these effects, yet path loss remains a limiting factor for users located far from the PS. To ameliorate this issue, in this paper we propose a wireless hierarchical FL scheme that uses intermediate servers (ISs) to form clusters in areas where the MUs are densely located. Our scheme uses OTA cluster aggregations for the communication of the MUs with their corresponding IS, and OTA global aggregations from the ISs to the PS. We present a convergence analysis for the proposed algorithm, and show through numerical evaluations of the derived analytical expressions and experimental results that utilizing ISs yields faster convergence and better performance than OTA FL alone, while using less transmit power. We also validate the performance results using different numbers of cluster iterations with different datasets and data distributions. We conclude that the best choice of the number of cluster aggregations depends on the data distribution among the MUs and the clusters.
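A toy numerical sketch of the two-level over-the-air aggregation (my own simplification with made-up noise levels; fading, power control, and path loss are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, noise_std = 10, 0.05

# 3 clusters of mobile users; each user holds a local gradient (stand-in for real training).
clusters = [[rng.normal(size=dim) for _ in range(4)] for _ in range(3)]

def ota_sum(signals, noise_std):
    """Over-the-air aggregation: uncoded simultaneous transmission means the receiver
    observes the superposition (sum) of the signals plus channel noise."""
    return np.sum(signals, axis=0) + rng.normal(scale=noise_std, size=signals[0].shape)

# Level 1: each intermediate server (IS) receives an OTA sum from its own cluster.
cluster_sums = [ota_sum(grads, noise_std) for grads in clusters]

# Level 2: the parameter server (PS) receives an OTA sum of the IS signals.
global_sum = ota_sum(cluster_sums, noise_std)
n_users = sum(len(c) for c in clusters)
global_grad = global_sum / n_users            # PS averages and updates the global model

ideal = np.mean([g for c in clusters for g in c], axis=0)
print("aggregation error:", np.linalg.norm(global_grad - ideal))
```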

Federated learning (FL) has been introduced as a new machine learning paradigm enhancing the use of local devices. At the server level, FL regularly aggregates models learned locally on distributed clients to obtain a more general model. In this way, no private data is sent over the network, and the communication cost is reduced. However, current solutions rely on the availability of large amounts of stored data at the client side in order to fine-tune the models sent by the server. Such a setting is not realistic in mobile pervasive computing, where data storage must be kept low and data characteristics (distributions) can change dramatically. To account for this variability, one solution is to use the data regularly collected by the client to progressively adapt the received model. But such a naive approach exposes clients to the well-known problem of catastrophic forgetting. The purpose of this paper is to demonstrate this problem in the context of mobile human activity recognition on smartphones.

Numerically solving ordinary differential equations (ODEs) is a naturally serial process, and as a result the vast majority of ODE solver software is serial. In this manuscript we develop a set of parallelized ODE solvers using extrapolation methods that exploit "parallelism within the method", so that arbitrary user ODEs can be parallelized. We describe the specific implementation choices for the explicit and implicit extrapolation methods that allow for generating low-overhead static schedules, which are then exploited by optimized multi-threaded implementations. We demonstrate that while multi-threading gives a noticeable acceleration on both explicit and implicit problems, the explicit parallel extrapolation methods give no significant improvement over the state of the art, even with a multi-threading advantage over current optimized high-order Runge-Kutta tableaus. However, we demonstrate that the implicit parallel extrapolation methods achieve state-of-the-art performance (a 2x-4x speedup) on standard multicore x86 CPUs for systems of $<200$ stiff ODEs solved at low tolerance, a typical setup for the vast majority of users of high-level-language equation solver suites. The resulting method is distributed as the first widely available open-source software for within-method parallel acceleration targeting typical modest compute architectures.
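The "parallelism within the method" can be sketched as follows (a minimal illustration using Richardson extrapolation of explicit Euler, not the authors' optimized Gragg or implicit extrapolation implementations): the entries of the first extrapolation column are independent integrations with different step counts, so they can be dispatched to separate threads before the serial Neville recurrence combines them. In pure Python the thread pool only illustrates the schedule; the speedups reported in the paper come from compiled, low-overhead static schedules.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def euler(f, y0, t0, t1, n):
    """Low-order integration with n steps; each extrapolation row uses a different n."""
    y, t, h = np.array(y0, dtype=float), t0, (t1 - t0) / n
    for _ in range(n):
        y = y + h * f(t, y)
        t += h
    return y

def extrapolated_step(f, y0, t0, t1, order=4):
    """Richardson extrapolation of explicit Euler. The sub-integrations are independent,
    so they are submitted to a thread pool (the 'within-method' parallelism); the cheap
    Aitken-Neville recurrence that combines them stays serial."""
    ns = [2 ** (k + 1) for k in range(order)]         # 2, 4, 8, 16 substeps
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(euler, f, y0, t0, t1, n) for n in ns]
    T = [fut.result() for fut in futures]
    for k in range(1, order):                         # step-size ratio is 2, so the
        for i in range(order - 1, k - 1, -1):         # denominators are 2**k - 1
            T[i] = T[i] + (T[i] - T[i - 1]) / (2 ** k - 1)
    return T[-1]

# Example: y' = -y with y(0) = 1; compare against exp(-1) at t = 1.
y1 = extrapolated_step(lambda t, y: -y, [1.0], 0.0, 1.0)
print(y1, np.exp(-1.0))
```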

Large-scale machine learning systems often involve data distributed across a collection of users. Federated learning algorithms leverage this structure by communicating model updates to a central server, rather than entire datasets. In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint) differential privacy. While learning a private global model induces a cost of privacy, local learning is perfectly private. We provide generalization guarantees showing that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy. We illustrate our theoretical results with experiments on synthetic and real-world datasets.
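A toy sketch of the local/global tradeoff described here (hypothetical parameters, and mean estimation standing in for full model training): local estimates never leave the device and are perfectly private, the global estimate is released with user-level privacy via clipping plus Gaussian noise, and each user's personalized model interpolates the two.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_samples, dim = 50, 4, 5

# Heterogeneous users: each user's data is centred on a user-specific mean.
user_means = rng.normal(scale=0.5, size=(n_users, dim))
data = user_means[:, None, :] + rng.normal(size=(n_users, n_samples, dim))

# Local models: per-user sample means that never leave the device (perfectly private).
local = data.mean(axis=1)

# Global model with user-level DP: clip each user's contribution, average, add Gaussian noise.
clip, noise_mult = 5.0, 1.0            # noise_mult is a stand-in for the privacy budget
contrib = np.array([u * min(1.0, clip / (np.linalg.norm(u) + 1e-12)) for u in local])
global_model = contrib.mean(axis=0) + rng.normal(scale=noise_mult * clip / n_users, size=dim)

# Personalization: interpolate the (noisy, private) global model with the local one.
alpha = 0.5
personalized = alpha * local + (1 - alpha) * global_model

mse = lambda est: float(np.mean((est - user_means) ** 2))
print("local-only MSE:  ", mse(local))
print("global-only MSE: ", mse(np.broadcast_to(global_model, local.shape)))
print("personalized MSE:", mse(personalized))
```

With few samples per user and moderate heterogeneity, the interpolated model typically beats both extremes, which is the flavour of tradeoff the guarantees above formalize.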

Federated learning (FL) is a novel learning paradigm that addresses the privacy leakage challenge of centralized learning. However, in FL, users with non-independent and identically distributed (non-IID) data can deteriorate the performance of the global model. Specifically, the global model suffers from the weight divergence challenge owing to non-IID data. To address this challenge, we propose a novel diffusion strategy for the machine learning (ML) model (FedDif) to maximize FL performance with non-IID data. In FedDif, users spread local models to neighboring users over device-to-device (D2D) communications. FedDif enables each local model to experience different data distributions before parameter aggregation. Furthermore, we theoretically demonstrate that FedDif can circumvent the weight divergence challenge. On this theoretical basis, we propose a communication-efficient diffusion strategy for the ML model, which can determine the trade-off between learning performance and communication cost based on auction theory. The performance evaluation results show that FedDif improves the test accuracy of the global model by 11% compared to baseline FL with non-IID settings. Moreover, FedDif improves communication efficiency, in terms of the number of transmitted sub-frames and models, by a factor of 2.77 over the latest methods.
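A rough sketch of the diffusion idea (my own simplification, which ignores the auction-based scheduling and the wireless model): before each aggregation, every model is relayed around a ring of neighboring users over D2D links and trained on each user's data in turn, so it experiences several local distributions before the server averages the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, dim = 6, 8

# Non-IID users: each holds data with a user-specific feature shift.
true_w = rng.normal(size=dim)
def user_data(i, n=40):
    X = rng.normal(size=(n, dim))
    X[:, i % dim] += 2.0
    return X, X @ true_w + 0.1 * rng.normal(size=n)

users = [user_data(i) for i in range(n_users)]

def local_sgd(w, X, y, lr=0.05, epochs=5):
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(dim)
for rnd in range(10):
    models = [w_global.copy() for _ in range(n_users)]
    # Diffusion phase: each model is handed to the next user over D2D and trained there,
    # so every model sees several different local distributions before aggregation.
    for hop in range(3):
        models = [local_sgd(models[(i - 1) % n_users], *users[i]) for i in range(n_users)]
    w_global = np.mean(models, axis=0)        # parameter aggregation at the server

print("distance to true model:", np.linalg.norm(w_global - true_w))
```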

Federated learning (FL) is a privacy-preserving distributed machine learning technique that trains models while keeping all the original data generated on devices local. Since devices may be resource constrained, offloading can be used to improve FL performance by transferring computational workload from devices to edge servers. However, due to mobility, devices participating in FL may leave the network during training and need to connect to a different edge server. This is challenging because the computations offloaded to an edge server need to be migrated. In line with this, we present FedFly, which is, to the best of our knowledge, the first work to migrate a deep neural network (DNN) when devices move between edge servers during FL training. Our empirical results on the CIFAR10 dataset, with both balanced and imbalanced data distributions, support our claim that FedFly can reduce training time by up to 33% when a device moves after 50% of the training is completed, and by up to 45% when 90% of the training is completed, compared to the state-of-the-art offloading approach in FL. FedFly has a negligible overhead of up to two seconds and does not compromise accuracy. Finally, we highlight a number of open research issues for further investigation. FedFly can be downloaded from //github.com/qub-blesson/FedFly.
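The migration step itself can be pictured with a small state-transfer sketch (hypothetical structure, not FedFly's implementation): in offloading-based FL the edge server holds the device's server-side model partition and optimizer state, so when the device roams, that state is serialized and shipped to the destination edge server instead of restarting training.

```python
import pickle
import numpy as np

rng = np.random.default_rng(0)

class EdgeServer:
    """Holds the server-side DNN partition and optimizer state for each attached device
    (a hypothetical stand-in for the edge-side training state in offloaded FL)."""
    def __init__(self, name):
        self.name = name
        self.state = {}   # device_id -> {"weights": ..., "momentum": ..., "step": ...}

    def attach(self, device_id):
        self.state[device_id] = {"weights": rng.normal(size=(64, 10)),
                                 "momentum": np.zeros((64, 10)),
                                 "step": 0}

    def export_state(self, device_id):
        # Serialize the partial model so it can be moved instead of retrained from scratch.
        return pickle.dumps(self.state.pop(device_id))

    def import_state(self, device_id, blob):
        self.state[device_id] = pickle.loads(blob)

src, dst = EdgeServer("edge-A"), EdgeServer("edge-B")
src.attach("phone-7")
src.state["phone-7"]["step"] = 1200        # pretend training has progressed to step 1200

# Device moves out of edge-A's coverage mid-training: migrate instead of restarting.
blob = src.export_state("phone-7")
dst.import_state("phone-7", blob)
print(dst.name, "resumes phone-7 at step", dst.state["phone-7"]["step"])
```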

Recently, neural networks have been widely applied to solving partial differential equations (PDEs). However, with current training algorithms, numerical convergence of neural networks for solving PDEs has not been empirically observed. The primary difficulty lies in solving the highly non-convex optimization problems resulting from the neural network discretization: theoretically analyzing the optimization process presents significant difficulties, and empirical experiments require extensive hyperparameter tuning to achieve acceptable results. To overcome this challenge, we develop a novel greedy training algorithm for shallow neural networks in this paper. We also analyze the resulting method and obtain a priori error bounds when solving PDEs from the function class defined by shallow networks, which rigorously establishes the convergence of the method as the network size increases. Finally, we test the algorithm on several benchmark examples, including high-dimensional PDEs, to confirm the theoretical convergence rate and to establish its efficiency and robustness. An advantage of this method is its straightforward applicability to high-order equations on general domains.
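A minimal sketch of a greedy shallow-network fit in the spirit of the algorithm described here (1D function approximation rather than a PDE, with a randomly sampled candidate dictionary as a hypothetical stand-in for the paper's greedy subproblem): each step adds the candidate neuron most correlated with the current residual and then refits all output-layer weights by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target to approximate (a 1D stand-in for a PDE solution).
x = np.linspace(-1, 1, 200)
f = np.sin(np.pi * x) + 0.3 * np.cos(3 * np.pi * x)

def neuron(w, b):
    return np.maximum(w * x + b, 0.0)               # a single ReLU dictionary element

basis = np.ones((len(x), 1))                        # start with a constant function
residual = f.copy()
for step in range(30):
    # Greedy step: sample candidate neurons, keep the one most correlated with the residual.
    cands = [(rng.uniform(-4, 4), rng.uniform(-4, 4)) for _ in range(500)]
    scores = [abs(neuron(w, b) @ residual) / (np.linalg.norm(neuron(w, b)) + 1e-12)
              for w, b in cands]
    w, b = cands[int(np.argmax(scores))]
    basis = np.hstack([basis, neuron(w, b)[:, None]])
    # Refit all output weights by least squares (an orthogonal-greedy-style update).
    coef = np.linalg.lstsq(basis, f, rcond=None)[0]
    residual = f - basis @ coef

print("L2 error with", basis.shape[1] - 1, "neurons:",
      np.linalg.norm(residual) / np.sqrt(len(x)))
```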

Computer architecture and systems have long been optimized to enable the efficient execution of machine learning (ML) algorithms and models. Now it is time to reconsider the relationship between ML and systems and to let ML transform the way computer architecture and systems are designed. This has a twofold meaning: improving designers' productivity, and completing the virtuous cycle. In this paper, we present a comprehensive review of work that applies ML to system design, which can be grouped into two major categories: ML-based modelling, which involves prediction of performance metrics or some other criterion of interest, and ML-based design methodology, which directly leverages ML as the design tool. For ML-based modelling, we discuss existing studies based on their target level of the system, ranging from the circuit level to the architecture/system level. For ML-based design methodology, we follow a bottom-up path to review current work, covering (micro-)architecture design (memory, branch prediction, NoC), coordination between architecture/system and workload (resource allocation and management, data center management, and security), compilers, and design automation. We further provide a future vision of opportunities and potential directions, and envision that applying ML to computer architecture and systems will thrive in the community.
