
Reducible · Performer · Edge · Inference · Edge computing

August 17, 2021

Rateless Codes for Low-Latency Distributed Inference in Mobile Edge Computing

Anton Frigård, Siddhartha Kumar, Eirik Rosnes, Alexandre Graell i Amat

We consider a mobile edge computing scenario where users want to perform a linear inference operation $\boldsymbol{W} \boldsymbol{x}$ on local data $\boldsymbol{x}$ for some network-side matrix $\boldsymbol{W}$. The inference is performed in a distributed fashion over multiple servers at the network edge. For this scenario, we propose a coding scheme that combines a rateless code to provide resiliency against straggling servers--hence reducing the computation latency--and an irregular-repetition code to provide spatial diversity--hence reducing the communication latency. We further derive a lower bound on the total latency--comprising computation latency, communication latency, and decoding latency. The proposed scheme performs remarkably close to the bound and yields significantly lower latency than the scheme based on maximum distance separable codes recently proposed by Zhang and Simeone.
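The straggler-resilience idea in this abstract can be illustrated with a small sketch. It is not the authors' construction: it swaps in a plain random linear fountain code over the rationals, leaves out the irregular-repetition component, and simulates stragglers with a coin flip. The names `encode_row`, `solve`, `coded_inference`, and `straggler_prob` are hypothetical. The point is only that the user keeps collecting coded results until they suffice to recover $\boldsymbol{W}\boldsymbol{x}$, regardless of which servers were slow.

```python
import random
from fractions import Fraction


def encode_row(W, rng):
    """Return (coeffs, coded_row): a random linear combination of the rows of W."""
    coeffs = [Fraction(rng.randint(-3, 3)) for _ in W]
    coded = [sum(c * row[j] for c, row in zip(coeffs, W))
             for j in range(len(W[0]))]
    return coeffs, coded


def solve(A, b):
    """Solve A y = b exactly by Gauss-Jordan elimination.

    Returns None while A does not yet have full column rank.
    """
    k = len(A[0])
    M = [list(r) + [v] for r, v in zip(A, b)]
    row = 0
    for col in range(k):
        pivot = next((i for i in range(row, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            return None
        M[row], M[pivot] = M[pivot], M[row]
        M[row] = [v / M[row][col] for v in M[row]]
        for i in range(len(M)):
            if i != row and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * p for a, p in zip(M[i], M[row])]
        row += 1
    return [M[i][k] for i in range(k)]


def coded_inference(W, x, straggler_prob=0.3, seed=0):
    """Collect coded inner products until W x can be decoded.

    Assumption: each coded row is handled by one simulated server, which
    straggles (never answers) with probability straggler_prob.
    """
    rng = random.Random(seed)
    k = len(W)
    A, b, sent = [], [], 0
    while True:
        coeffs, coded = encode_row(W, rng)
        sent += 1
        if rng.random() < straggler_prob:
            continue  # straggling server: its result is simply ignored
        A.append(coeffs)
        b.append(sum(c * xi for c, xi in zip(coded, x)))
        if len(A) >= k:
            y = solve(A, b)
            if y is not None:
                return y, sent


W = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
y, sent = coded_inference(W, x)
print([int(v) for v in y], "decoded after", sent, "coded rows")
```

Exact rational arithmetic keeps the decoding check free of rounding issues; a practical scheme would use a structured rateless code with a cheaper decoder.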

Related content

Reducible

Clique · Graph · MoDELS · FAST · Scenario

October 13, 2021

Beyond Distributed Subgraph Detection: Induced Subgraphs, Multicolored Problems and Graph Parameters

Janne H. Korhonen,Amir Nikabadi

Subgraph detection has recently been one of the most studied problems in the CONGEST model of distributed computing. In this work, we study the distributed complexity of problems closely related to subgraph detection, mainly focusing on induced subgraph detection. The main line of this work presents lower bounds and parameterized algorithms w.r.t structural parameters of the input graph: -- On general graphs, we give unconditional lower bounds for induced detection of cycles and patterns of treewidth 2 in CONGEST. Moreover, by adapting reductions from centralized parameterized complexity, we prove lower bounds in CONGEST for detecting patterns with a 4-clique, and for induced path detection conditional on the hardness of triangle detection in the congested clique. -- On graphs of bounded degeneracy, we show that induced paths can be detected fast in CONGEST using techniques from parameterized algorithms, while detecting cycles and patterns of treewidth 2 is hard. -- On graphs of bounded vertex cover number, we show that induced subgraph detection is easy in CONGEST for any pattern graph. More specifically, we adapt a centralized parameterized algorithm for a more general maximum common induced subgraph detection problem to the distributed setting. In addition to these induced subgraph detection results, we study various related problems in the CONGEST and congested clique models, including for multicolored versions of subgraph-detection-like problems.

Estimation/Estimator · SC · Weight · Inference · Controller

October 12, 2021

Theory for Identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework

Xu Shi,Wang Miao,Mengtong Hu,Eric Tchetgen Tchetgen

Synthetic control methods are commonly used to estimate the treatment effect on a single treated unit in panel data settings. A synthetic control (SC) is a weighted average of control units built to match the treated unit's pre-treatment outcome trajectory, with weights typically estimated by regressing pre-treatment outcomes of the treated unit to those of the control units. However, it has been established that such regression estimators can fail to be consistent. In this paper, we introduce a proximal causal inference framework to formalize identification and inference for both the SC weights and the treatment effect on the treated. We show that control units previously perceived as unusable can be repurposed to consistently estimate the SC weights. We also propose to view the difference in the post-treatment outcomes between the treated unit and the SC as a time series, which opens the door to a rich literature on time-series analysis for treatment effect estimation. We further extend the traditional linear model to accommodate general nonlinear models allowing for binary and count outcomes which are understudied in the SC literature. We illustrate our proposed methods with simulation studies and an application to evaluation of the 1990 German Reunification.

Controller · Aperiodic · Continuity · Tolerance · Regularization term

October 12, 2021

Computing semigroups with error control

Matthew J. Colbrook

We develop an algorithm that computes strongly continuous semigroups on infinite-dimensional Hilbert spaces with explicit error control. Given a generator $A$, a time $t>0$, an arbitrary initial vector $u_0$ and an error tolerance $\epsilon>0$, the algorithm computes $\exp(tA)u_0$ with error bounded by $\epsilon$. The algorithm is based on a combination of a regularized functional calculus, suitable contour quadrature rules, and the adaptive computation of resolvents in infinite dimensions. As a particular case, we show that it is possible, even when only allowing pointwise evaluation of coefficients, to compute, with error control, semigroups on the unbounded domain $L^2(\mathbb{R}^d)$ that are generated by partial differential operators with polynomially bounded coefficients of locally bounded total variation. For analytic semigroups (and more general Laplace transform inversion), we provide a quadrature rule whose error decreases like $\exp(-cN/\log(N))$ for $N$ quadrature points, that remains stable as $N\rightarrow\infty$, and which is also suitable for infinite-dimensional operators. Numerical examples are given, including: Schrödinger and wave equations on the aperiodic Ammann--Beenker tiling, complex perturbed fractional diffusion equations on $L^2(\mathbb{R})$, and damped Euler--Bernoulli beam equations.

FC · Kalman filtering · Controller · MoDELS · Identifiable

October 12, 2021

Distributed Kalman Filters for Relative Formation Control of Mobile Agents

Martijn van der Marel,Raj Thilak Rajan

from arxiv, In submission

Formation control (FC) of multi-agent systems plays a critical role in a wide variety of fields. In the absence of absolute positioning, agents in FC systems rely on relative position measurements with respect to their neighbors. In distributed filter design literature, relative observation models are comparatively unexplored, and in FC literature, uncertainty models are rarely considered. In this article, we aim to bridge the gap between these domains, by exploring distributed filters tailored for relative FC of swarms. We propose statistically robust data models for tracking relative positions of agents in an FC network, and subsequently propose optimal Kalman filters for both centralized and distributed scenarios. Our simulations highlight the benefits of these estimators, and we identify future research directions based on our proposed framework.

Inference · Performer · FAST · tuning · Optimizer

October 12, 2021

SoftNeuro: Fast Deep Inference using Multi-platform Optimization

Masaki Hilaga,Yasuhiro Kuroda,Hitoshi Matsuo,Tatsuya Kawaguchi,Gabriel Ogawa,Hiroshi Miyake,Yusuke Kozawa

from arxiv, 8 pages, 3 figures

Faster inference of deep learning models is highly demanded on edge devices and even servers, for both financial and environmental reasons. To address this issue, we propose SoftNeuro, a novel, high-performance inference framework with efficient performance tuning. The key idea is to separate algorithmic routines from network layers. Our framework maximizes the inference performance by profiling various routines for each layer and selecting the fastest path. To efficiently find the best path, we propose a routine-selection algorithm based on dynamic programming. Experiments show that the proposed framework achieves both fast inference and efficient tuning.
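A routine-selection step based on dynamic programming, as mentioned in this abstract, might look roughly like the sketch below. The cost model is an assumption, not taken from the paper: each layer has a profiled time per routine, and a hypothetical `switch_cost` charges for converting data between the routines of adjacent layers. The function name `select_routines` and the routine labels are likewise illustrative.

```python
def select_routines(layer_costs, switch_cost):
    """Pick one routine per layer so that total time is minimal.

    layer_costs[l][r]: profiled time of routine r on layer l (assumed input).
    switch_cost(prev_r, r): assumed conversion cost between adjacent layers.
    Returns (total_time, [routine chosen for each layer]).
    """
    # best[r] = (cheapest time of a path ending in routine r, that path)
    best = {r: (t, [r]) for r, t in layer_costs[0].items()}
    for costs in layer_costs[1:]:
        new_best = {}
        for r, t in costs.items():
            prev_r, (prev_t, path) = min(
                best.items(),
                key=lambda kv: kv[1][0] + switch_cost(kv[0], r),
            )
            new_best[r] = (prev_t + switch_cost(prev_r, r) + t, path + [r])
        best = new_best
    return min(best.values(), key=lambda v: v[0])


layers = [
    {"float": 5, "int8": 3},
    {"float": 2, "int8": 4},
    {"float": 6, "int8": 1},
]
total, path = select_routines(layers, lambda a, b: 0 if a == b else 2)
print(total, path)
```

In this toy input, picking the fastest routine for each layer independently gives int8, float, int8 at a total cost of 10 once the two conversions are counted, whereas the dynamic program stays on int8 throughout for a total of 8.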

Neural Networks · Networking · Reducible · Continuity · Inference

June 21, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference

Amir Gholami,Sehoon Kim,Zhen Dong,Zhewei Yao,Michael W. Mahoney,Kurt Keutzer

from arxiv, Book Chapter: Low-Power Computer Vision: Improving the Efficiency of Artificial Intelligence

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
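To make the quantization question in this abstract concrete, here is a minimal sketch of one common baseline, symmetric uniform quantization to a signed `bits`-bit integer grid. It is only an illustration and not a method attributed to the survey. The helper names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, bits=4):
    """Map real values onto signed integers in [-qmax, qmax].

    Symmetric uniform scheme: a single scale factor for the whole list.
    """
    qmax = 2 ** (bits - 1) - 1
    largest = max(abs(v) for v in values)
    scale = largest / qmax if largest > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate real values from the integer codes."""
    return [qi * scale for qi in q]


weights = [-1.0, -0.42, 0.0, 0.27, 0.9]
codes, scale = quantize(weights, bits=4)
approx = dequantize(codes, scale)
print(codes)
print([round(a, 3) for a in approx])
```

Each reconstructed value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy side of the bits-versus-accuracy trade-off the abstract describes.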

Graphics processing unit · MoDELS · Networking · Neural Networks · Graph

June 9, 2021

Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling

Chuizheng Meng,Sirisha Rambhatla,Yan Liu

from arxiv, To be published in the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 21)

Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.

GPU · Neural Networks · Scaling · Extensibility · Learning

March 12, 2020

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Weijie Zhao,Deping Xie,Ronglai Jia,Yulei Qian,Ruiquan Ding,Mingming Sun,Ping Li

Neural networks of ads systems usually take input from multiple resources, e.g., query-ad relevance, ad features and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in online advertising industries can have terabyte-scale parameters that do not fit in the GPU memory nor the CPU main memory on a computing node. For example, a sponsored online advertising system can contain more than $10^{11}$ sparse features, making the neural network a massive model with around 10 TB parameters. In this paper, we introduce a distributed GPU hierarchical parameter server for massive scale deep learning ads systems. We propose a hierarchical workflow that utilizes GPU High-Bandwidth Memory, CPU main memory and SSD as 3-layer hierarchical storage. All the neural network training computations are contained in GPUs. Extensive experiments on real-world data confirm the effectiveness and the scalability of the proposed system. A 4-node hierarchical GPU parameter server can train a model more than 2X faster than a 150-node in-memory distributed parameter server in an MPI cluster. In addition, the price-performance ratio of our proposed system is 4-9 times better than an MPI-cluster solution.

Distributed machine learning · Machine Learning · Learning · Storage · Optimizer

September 18, 2019

Distributed Machine Learning on Mobile Devices: A Survey

Renjie Gu,Shuo Yang,Fan Wu

In recent years, mobile devices have developed rapidly, with stronger computation capability and larger storage. Some computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning has been proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and uploads only computation results instead of original data to contribute to the optimization of the global model. This architecture can not only relieve the computation and storage burden on servers, but also protect users' sensitive information. Another benefit is the bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey of recent studies of mobile distributed machine learning. We survey a number of widely used mobile distributed machine learning methods. We also present an in-depth discussion of the challenges and future directions in this area. We believe that this survey provides a clear overview of mobile distributed machine learning and guidelines on applying mobile distributed machine learning to real applications.

Optimizer · Lipschitz continuity · Regularization term · Continuity · Lipschitz

June 1, 2018

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Kevin Scaman,Francis Bach,Sébastien Bubeck,Yin Tat Lee,Laurent Massoulié

from arxiv, 17 pages

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
