Serverless computing automates fine-grained resource scaling and simplifies the development and deployment of online services with stateless functions. However, it is still non-trivial for users to allocate appropriate resources because function types, dependencies, and input sizes vary. Misconfigured resource allocations leave functions either under-provisioned or over-provisioned and lead to persistently low resource utilization. This paper presents Freyr, a new resource manager (RM) for serverless platforms that maximizes resource efficiency by dynamically harvesting idle resources from over-provisioned functions and giving them to under-provisioned functions. Freyr monitors each function's resource utilization in real time, detects over-provisioning and under-provisioning, and learns to harvest idle resources safely and accelerate functions efficiently by applying deep reinforcement learning algorithms along with a safeguard mechanism. We have implemented and deployed a Freyr prototype in a 13-node Apache OpenWhisk cluster. Experimental results show that 38.8% of function invocations have idle resources harvested by Freyr, and 39.2% of invocations are accelerated by the harvested resources. Freyr reduces the 99th-percentile function response latency by 32.1% compared to the baseline RMs.
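To make the harvesting idea concrete, here is a minimal sketch that moves idle CPU from over-provisioned invocations to under-provisioned ones. It is not Freyr's method: the abstract describes a deep reinforcement learning policy, whereas this sketch swaps in a fixed threshold rule. The names `Invocation`, `harvest`, and the `headroom` parameter are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    name: str
    allocated: float   # CPU cores configured by the user (assumed unit)
    peak_usage: float  # peak CPU cores observed by monitoring


def harvest(invocations, headroom=0.2):
    """Return {name: new_allocation} after redistributing idle cores.

    Assumption: a fixed headroom above observed peak usage stands in for
    the safeguard mechanism; a learned policy would replace this rule.
    """
    pool = 0.0
    plan = {}
    needy = []
    for inv in invocations:
        safe_floor = inv.peak_usage * (1 + headroom)
        if inv.allocated > safe_floor:
            # Over-provisioned: harvest everything above the safe floor.
            pool += inv.allocated - safe_floor
            plan[inv.name] = safe_floor
        else:
            plan[inv.name] = inv.allocated
            if inv.peak_usage >= inv.allocated:
                # Usage saturates the allocation: treat as under-provisioned.
                needy.append(inv)
    if needy:
        share = pool / len(needy)  # split harvested cores evenly
        for inv in needy:
            plan[inv.name] += share
    return plan


example_plan = harvest([
    Invocation("thumbnail", allocated=4.0, peak_usage=1.0),
    Invocation("video-encode", allocated=1.0, peak_usage=1.0),
])
```

In this toy run the idle cores of `thumbnail` are handed to `video-encode`, and the total number of allocated cores stays the same.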

Related content

Networking · Performer · Normalized · Similarity · Dataset

April 20, 2022

A Comprehensive Study of Accelerating IPv6 Deployment

Tianyu Cui,Chang Liu,Gaopeng Gou,Junzheng Shi,Gang Xiong

from arxiv, The paper has been accepted at the IEEE International Performance Computing and Communications Conference (IPCCC 2019)

Because its IPv6 network development has lagged, China is currently accelerating IPv6 deployment. In this scenario, traffic and network structure show a huge shift. However, due to the long-term prosperity, little is known about the problems behind such outbreaks of traffic and performance improvement events during accelerated deployment. IPv6 development in some regions will still face similar challenges in the future. To contribute to solving this problem, in this paper we propose a new measurement framework and conduct a 5-month passive measurement of the IPv6 network during the accelerating deployment in China. We combine 6 global-scale datasets to characterize the normal status of the IPv6 network, in contrast to the accelerating status characterized by the passive traffic. Moreover, we compare with the traffic during World IPv6 Day 2011 and World IPv6 Launch 2012 to discuss the common nature of accelerating deployment. Finally, the results indicate that accelerating IPv6 deployment is often accompanied by an unbalanced network status. It exposes unresolved security issues, including challenges to user privacy and inappropriate access methods. Based on this investigation, we point out directions for future IPv6 development after accelerating deployment.

Data parallelism · MoDELS · Deep learning framework · Learning · Extensibility

April 19, 2022

OneFlow: Redesign the Distributed Deep Learning Framework from Scratch

Jinhui Yuan,Xinqi Li,Cheng Cheng,Juncheng Liu,Ran Guo,Shenghang Cai,Chi Yao,Fei Yang,Xiaodong Yi,Chuan Wu,Haoran Zhang,Jie Zhao

Deep learning frameworks such as TensorFlow and PyTorch provide a productive interface for expressing and training a deep neural network (DNN) model on a single device or using data parallelism. Still, they may not be flexible or efficient enough in training emerging large models on distributed devices, which require more sophisticated parallelism beyond data parallelism. Plugins or wrappers have been developed to strengthen these frameworks for model or pipeline parallelism, but they complicate the usage and implementation of distributed deep learning. Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model. SBP enables much easier programming of data parallelism and model parallelism than existing frameworks, and the actor model provides a succinct runtime mechanism to manage the complex dependencies imposed by resource constraints, data movement and computation in distributed deep learning. We demonstrate the general applicability and efficiency of OneFlow for training various large DNN models with case studies and extensive experiments. The results show that OneFlow outperforms many well-known customized libraries built on top of the state-of-the-art frameworks. The code of OneFlow is available at: https://github.com/Oneflow-Inc/oneflow.
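The split, broadcast and partial-value idea can be illustrated with a toy matrix product on two simulated devices. This is a plain-Python sketch of the arithmetic only, not OneFlow's API: the helpers `matmul` and `add` and all variable names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


X = [[1, 2], [3, 4]]
W = [[5, 6], [7, 8]]
reference = matmul(X, W)  # single-device result

# Split + broadcast (data parallelism): each device holds one row block
# of X and a full copy of W; the outputs are row blocks of the result.
device0 = matmul(X[:1], W)
device1 = matmul(X[1:], W)
split_result = device0 + device1  # list concatenation stacks the row blocks

# Partial-value: X is split by columns and W by rows; each device holds
# a full-shaped but incomplete result, and the true value is their sum.
X_col0 = [[row[0]] for row in X]
X_col1 = [[row[1]] for row in X]
partial0 = matmul(X_col0, W[:1])
partial1 = matmul(X_col1, W[1:])
partial_result = add(partial0, partial1)
```

Both placements reproduce the single-device product; they differ only in how the pieces must be combined (concatenation versus summation).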

Mutually independent · IST · Networking · Learning · CASES

April 18, 2022

Distributed Learning of Deep Neural Networks using Independent Subnet Training

Binhang Yuan,Cameron R. Wolfe,Chen Dun,Yuxin Tang,Anastasios Kyrillidis,Christopher M. Jermaine

Distributed machine learning (ML) can bring more computational resources to bear than single-machine learning, thus enabling reductions in training time. Distributed learning partitions models and data over many machines, allowing model and dataset sizes beyond the available compute power and memory of a single machine. In practice though, distributed ML is challenging when distribution is mandatory, rather than chosen by the practitioner. In such scenarios, data could unavoidably be separated among workers due to limited memory capacity per worker or even because of data privacy issues. There, existing distributed methods will utterly fail due to dominant transfer costs across workers, or do not even apply. We propose a new approach to distributed fully connected neural network learning, called independent subnet training (IST), to handle these cases. In IST, the original network is decomposed into a set of narrow subnetworks with the same depth. These subnetworks are then trained locally before parameters are exchanged to produce new subnets and the training cycle repeats. Such a naturally "model parallel" approach limits memory usage by storing only a portion of network parameters on each device. Additionally, no requirements exist for sharing data between workers (i.e., subnet training is local and independent) and communication volume and frequency are reduced by decomposing the original network into independent subnets. These properties of IST can cope with issues due to distributed data, slow interconnects, or limited device memory, making IST a suitable approach for cases of mandatory distribution. We show experimentally that IST results in training times that are much lower than common distributed learning approaches.
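The decomposition step can be sketched for a network with one hidden layer: the hidden units are partitioned into disjoint groups, and each worker receives only the weights attached to its group. This is a simplified illustration under stated assumptions, not the authors' implementation; `partition_hidden_units`, `extract_subnet`, and the random shuffling scheme are hypothetical.

```python
import random


def partition_hidden_units(num_hidden, num_workers, seed=0):
    """Randomly assign each hidden unit to exactly one worker."""
    rng = random.Random(seed)
    units = list(range(num_hidden))
    rng.shuffle(units)
    # Strided slicing yields disjoint groups that cover every unit.
    return [sorted(units[w::num_workers]) for w in range(num_workers)]


def extract_subnet(w_in, w_out, units):
    """Keep only the weights that touch the given hidden units.

    w_in:  one row per hidden unit (hidden x input)
    w_out: one column per hidden unit (output x hidden)
    """
    sub_in = [w_in[u] for u in units]
    sub_out = [[row[u] for u in units] for row in w_out]
    return sub_in, sub_out


# Toy network: 3 inputs, 6 hidden units, 2 outputs, 3 workers.
w_in = [[float(h * 10 + i) for i in range(3)] for h in range(6)]
w_out = [[float(o * 100 + h) for h in range(6)] for o in range(2)]
groups = partition_hidden_units(num_hidden=6, num_workers=3)
subnets = [extract_subnet(w_in, w_out, g) for g in groups]
```

Each subnet has the same depth as the original network but is narrower, so a worker stores only a fraction of the parameters. After local training, the groups would be redrawn (for example with a new seed) to form the next round of subnets.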

Edge · Edge computing · Optimizer · Performer · Storage

April 18, 2022

Actions at the Edge: Jointly Optimizing the Resources in Multi-access Edge Computing

Yiqin Deng,Xianhao Chen,Guangyu Zhu,Yuguang Fang,Zhigang Chen,Xiaoheng Deng

from arxiv, 7 pages, 2 figures, accepted by IEEE Wireless Communications

Multi-access edge computing (MEC) is an emerging paradigm that pushes resources for sensing, communications, computing, storage and intelligence (SCCSI) to premises closer to the end users, i.e., the edge, so that users can leverage the rich nearby resources to improve their quality of experience (QoE). Owing to the growing number of emerging applications aimed at making life-sustaining cyber-physical systems intelligent, this paradigm has become a hot research topic, particularly when MEC is used to provide edge intelligence and real-time processing and control. This article elaborates on the research issues along this line, including basic concepts and performance metrics, killer applications, architectural design, modeling approaches and solutions, and future research directions. It is hoped that this article provides a quick introduction to this fruitful research area, particularly for beginning researchers.

Reducible · Wireless Networks · MoDELS · Layer · Resource management

April 18, 2022

Split Learning over Wireless Networks: Parallel Design and Resource Management

Wen Wu,Mushu Li,Kaige Qu,Conghao Zhou,Xuemin Shen,Weihua Zhuang,Xu Li,Weisen Shi

from arxiv, The paper has been submitted to IEEE Journal on Selected Areas in Communications

Split learning (SL) is a collaborative learning framework that can train an artificial intelligence (AI) model between a device and an edge server by splitting the AI model into a device-side model and a server-side model at a cut layer. The existing SL approach conducts the training process sequentially across devices, which incurs significant training latency, especially when the number of devices is large. In this paper, we design a novel SL scheme to reduce the training latency, named Cluster-based Parallel SL (CPSL), which conducts model training in a "first-parallel-then-sequential" manner. Specifically, CPSL partitions devices into several clusters, trains the device-side models in each cluster in parallel and aggregates them, and then sequentially trains the whole AI model across clusters, thereby parallelizing the training process and reducing training latency. Furthermore, we propose a resource management algorithm to minimize the training latency of CPSL considering device heterogeneity and network dynamics in wireless networks. This is achieved by stochastically optimizing the cut layer selection, real-time device clustering, and radio spectrum allocation. The proposed two-timescale algorithm jointly makes the cut layer selection decision on a large timescale and the device clustering and radio spectrum allocation decisions on a small timescale. Extensive simulation results on non-independent and identically distributed data demonstrate that the proposed solutions can greatly reduce the training latency compared with the existing SL benchmarks, while adapting to network dynamics.
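The latency benefit of the "first-parallel-then-sequential" schedule can be sketched with a toy model. The assumptions here are strong and are not taken from the paper: each device has a single fixed per-round training time, devices in a cluster finish when the slowest one does, and aggregation and communication costs are ignored. The function names are hypothetical.

```python
def sequential_latency(device_times):
    """Vanilla SL: devices train one after another."""
    return sum(device_times)


def clustered_latency(clusters):
    """Cluster-based schedule: parallel inside a cluster, sequential across.

    Each cluster costs as much as its slowest device (toy assumption).
    """
    return sum(max(cluster) for cluster in clusters if cluster)


device_times = [3.0, 1.0, 2.0, 4.0]          # seconds per device (made up)
clusters = [[3.0, 1.0], [2.0, 4.0]]          # same devices in two clusters
baseline = sequential_latency(device_times)  # 10.0
clustered = clustered_latency(clusters)      # 3.0 + 4.0 = 7.0
```

Under this model the result depends on how devices are grouped: pairing slow devices together (for example `[[3.0, 4.0], [1.0, 2.0]]`) gives 6.0 instead of 7.0, which hints at why clustering is treated as an optimization variable.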

Identifiable · Model complexity · INTERACT · Round · Reducible

April 17, 2022

Modeling Complex Interactions in a Disrupted Environment: Relational Events in the WTC Response

Scott Leo Renshaw,Selena M. Livas,Miruna G. Petrescu-Prahova,Carter T. Butts

When subjected to a sudden, unanticipated threat, human groups characteristically self-organize to identify the threat, determine potential responses, and act to reduce its impact. Central to this process is the challenge of coordinating information sharing and response activity within a disrupted environment. In this paper, we consider coordination in the context of responses to the 2001 World Trade Center disaster. Using records of communications among 17 organizational units, we examine the mechanisms driving communication dynamics, with an emphasis on the emergence of coordinating roles. We employ relational event models (REMs) to identify the mechanisms shaping communications in each unit, finding a consistent pattern of behavior across units with very different characteristics. Using a simulation-based "knock-out" study, we also probe the importance of different mechanisms for hub formation. Our results suggest that, while preferential attachment and pre-disaster role structure generally contribute to the emergence of hub structure, temporally local conversational norms play a much larger role. We discuss broader implications for the role of microdynamics in driving macroscopic outcomes, and for the emergence of coordination in other settings.

Federated learning · Learning · Extensibility · MoDELS · Node

April 16, 2022

A Distributed and Elastic Aggregation Service for Scalable Federated Learning Systems

Ahmad Khan,Yuze Li,Ali Anwar,Yue Cheng,Thang Hoang,Nathalie Baracaldo,Ali Butt

from arxiv, 10 pages, 14 figures, 1 table

Federated Learning has promised a new approach to resolve the challenges in machine learning by bringing computation to the data. The popularity of the approach has led to rapid progress in the algorithmic aspects and the emergence of systems capable of simulating Federated Learning. State-of-the-art systems in Federated Learning support a single-node aggregator, which is insufficient for training across a large number of devices or for training larger models. As the model size or the number of devices increases, the single-node aggregator incurs a memory and computation burden while performing fusion tasks. It also faces communication bottlenecks when a large number of model updates are sent to a single node. We classify the workload for the aggregator into categories and propose a new aggregation service for handling each load. Our aggregation service is based on a holistic approach that chooses the best solution depending on the model update size and the number of clients. Our system provides a fault-tolerant, robust and efficient aggregation solution utilizing existing parallel and distributed frameworks. Through evaluation, we show the shortcomings of the state-of-the-art approaches and how a single solution is not suitable for all aggregation requirements. We also provide a comparison of current frameworks with our system through extensive experiments.
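One reason fusion can be spread over several aggregators is that weighted averaging decomposes into partial sums. The sketch below assumes sample-weighted averaging as the fusion rule (the abstract does not name one) and uses hypothetical helpers `partial_fuse` and `merge`; it is not the aggregation service described above.

```python
def partial_fuse(updates):
    """Fuse a shard of client updates into (weighted_sum, total_samples).

    updates: list of (parameter_vector, num_samples) pairs.
    """
    dim = len(updates[0][0])
    acc = [0.0] * dim
    total = 0
    for vector, num_samples in updates:
        for i, value in enumerate(vector):
            acc[i] += value * num_samples
        total += num_samples
    return acc, total


def merge(partials):
    """Combine partial results from several aggregators into one model."""
    dim = len(partials[0][0])
    acc = [0.0] * dim
    total = 0
    for weighted_sum, num_samples in partials:
        for i, value in enumerate(weighted_sum):
            acc[i] += value
        total += num_samples
    return [value / total for value in acc]


client_updates = [([1.0, 2.0], 1), ([3.0, 4.0], 3)]

# Single aggregator: all updates land on one node.
single = merge([partial_fuse(client_updates)])

# Two aggregators: each fuses one shard, then the partials are merged.
sharded = merge([partial_fuse(client_updates[:1]),
                 partial_fuse(client_updates[1:])])
```

Because each partial carries only a weighted sum and a sample count, the per-node memory and inbound traffic shrink with the number of aggregators while the fused model stays the same.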

Reducible · Server · Edge · Continuity · Performer

April 15, 2022

Server Free Wireless Federated Learning: Architecture, Algorithm, and Analysis

Howard H. Yang,Zihan Chen,Tony Q. S. Quek

We demonstrate that analog transmissions and matched filtering alone can realize the function of an edge server in federated learning (FL). Therefore, a network with massively distributed user equipments (UEs) can achieve large-scale FL without an edge server. We also develop a training algorithm that allows UEs to continuously perform local computing without being interrupted by the global parameter uploading, which exploits the full potential of UEs' processing power. We derive convergence rates for the proposed schemes to quantify their training efficiency. The analyses reveal that when the interference obeys a Gaussian distribution, the proposed algorithm retrieves the convergence rate of a server-based FL. But if the interference distribution is heavy-tailed, then the heavier the tail, the slower the algorithm converges. Nonetheless, the system run time can be largely reduced by enabling computation in parallel with communication, and the gain is particularly pronounced when communication latency is high. These findings are corroborated via extensive simulations.

Edge · Edge computing · TOOLS · Networking · Integration

November 7, 2019

A Survey on Edge Computing Systems and Tools

Fang Liu,Guoming Tang,Youhuizi Li,Zhiping Cai,Xingzhou Zhang,Tongqing Zhou

from arxiv, 24 pages, 21 figures, 4 tables, 87 references

Driven by the visions of Internet of Things and 5G communications, the edge computing systems integrate computing, storage and network resources at the edge of the network to provide computing infrastructure, enabling developers to quickly develop and deploy edge applications. Nowadays the edge computing systems have received widespread attention in both industry and academia. To explore new research opportunities and assist users in selecting suitable edge computing systems for specific applications, this survey paper provides a comprehensive overview of the existing edge computing systems and introduces representative projects. A comparison of open source tools is presented according to their applicability. Finally, we highlight energy efficiency and deep learning optimization of edge computing systems. Open issues for analyzing and designing an edge computing system are also studied in this survey.

Networking · Neural Networks · MoDELS · Performer · Model performance

September 8, 2019

A Survey of Model Compression and Acceleration for Deep Neural Networks

Yu Cheng,Duo Wang,Pan Zhou,Tao Zhang

from arxiv, Published in IEEE Signal Processing Magazine, arXiv version including some recent works

Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey recent advanced techniques developed for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing are described first, followed by the other techniques. For each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks. Then we go through a few very recent additional successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating the model performance, and recent benchmarking efforts. Finally, we conclude this paper and discuss remaining challenges and possible directions on this topic.
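As a small illustration of the first scheme, parameter pruning, the sketch below zeroes the smallest-magnitude weights of a layer. Magnitude-based pruning is one common variant chosen here as an assumption; the abstract does not single out a specific method, and `magnitude_prune` is a hypothetical helper.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights.

    weights:  matrix as a list of rows
    sparsity: fraction in [0, 1]; ties at the threshold are all pruned,
              so slightly more weights may be removed than requested.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    num_to_prune = int(len(magnitudes) * sparsity)
    if num_to_prune == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[num_to_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row]
            for row in weights]


layer = [[0.1, -2.0], [0.5, -0.05]]
pruned = magnitude_prune(layer, sparsity=0.5)
```

The pruned matrix can be stored in a sparse format to save memory; in practice pruning is usually followed by fine-tuning to recover accuracy.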