GPU · Extensibility · Learning · Reducible · Deep learning

December 16, 2021

TENSILE: A Tensor granularity dynamic GPU memory scheduling method towards multiple dynamic workloads system

Kaixin Zhang, Hongzhi Wang, Tongxin Li, Han Hu, Songling Zou, Jiye Qiu

Recently, deep learning has been an area of intense research. However, as a computing-intensive task, deep learning relies heavily on the scale of GPU memory, which is usually prohibitively expensive and scarce. Although extensive work has been proposed for dynamic GPU memory management, it is hard to apply to systems with multiple dynamic workloads, such as in-database machine learning systems. In this paper, we demonstrate TENSILE, a method of managing GPU memory at tensor granularity to reduce the GPU memory peak while taking multiple dynamic workloads into account. TENSILE tackles the cold-start and across-iteration scheduling problems that exist in previous works. We implemented TENSILE on a deep learning framework that we built ourselves and evaluated its performance. The experimental results show that TENSILE saves more GPU memory with less extra time overhead than prior works in both single and multiple dynamic workload scenarios.
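To make the idea of a "GPU memory peak" at tensor granularity concrete, here is a minimal sketch. It is not TENSILE's algorithm: the tensor sizes, step numbers and the `peak_memory` helper are all hypothetical, and the cost of moving data between GPU and host (which the abstract says TENSILE keeps low) is ignored. It only shows how evicting one tensor while it is idle can lower the peak.

```python
def peak_memory(tensors, num_steps):
    """Return the peak GPU usage over a schedule.

    tensors: list of (size_mb, intervals); each interval is an inclusive
    (start_step, end_step) range during which the tensor is resident on the GPU.
    """
    usage = [0] * num_steps
    for size_mb, intervals in tensors:
        for start, end in intervals:
            for step in range(start, end + 1):
                usage[step] += size_mb
    return max(usage)


# Hypothetical workload with three tensors over six steps.
baseline = [
    (400, [(0, 5)]),  # tensor a: kept on the GPU the whole time
    (300, [(1, 4)]),  # tensor b
    (300, [(2, 3)]),  # tensor c
]

# Same workload, but tensor a is swapped out to host memory while idle
# (steps 2-3) and brought back before it is needed again at step 4.
scheduled = [
    (400, [(0, 1), (4, 5)]),
    (300, [(1, 4)]),
    (300, [(2, 3)]),
]

print(peak_memory(baseline, 6), peak_memory(scheduled, 6))
```

In this toy schedule the peak drops because tensor a is absent exactly when tensors b and c are both live. A real scheduler would also have to decide when the transfers can overlap with computation.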

Related content

GPU

Singular · Learning · Performance · Deep learning · Scenario

February 21, 2022

Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, Mark Russinovich

from arxiv, Revision: Fixed some typos

Lowering costs by driving high utilization across deep learning workloads is a crucial lever for cloud providers. We present Singularity, Microsoft's globally distributed scheduling service for highly-efficient and reliable execution of deep learning training and inference workloads. At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of AI accelerators (e.g., GPUs, FPGAs). All jobs in Singularity are preemptable, migratable, and dynamically resizable (elastic) by default: a live job can be dynamically and transparently (a) preempted and migrated to a different set of nodes, cluster, data center or a region and resumed exactly from the point where the execution was preempted, and (b) resized (i.e., elastically scaled-up/down) on a varying set of accelerators of a given type. Our mechanisms are transparent in that they do not require the user to make any changes to their code or require using any custom libraries that may limit flexibility. Additionally, our approach significantly improves the reliability of deep learning workloads. We show that the resulting efficiency and reliability gains with Singularity are achieved with negligible impact on the steady-state performance. Finally, our design approach is agnostic of DNN architectures and handles a variety of parallelism strategies (e.g., data/pipeline/model parallelism).

Performer · Networking · 6G · Networks · Learning

February 18, 2022

Toward a Smart Resource Allocation Policy via Artificial Intelligence in 6G Networks: Centralized or Decentralized?

Ali Nouruzi, Atefeh Rezaei, Ata Khalili, Nader Mokari, Mohammad Reza Javan, Eduard A. Jorswieck, Halim Yanikomeroglu

from arxiv, Submitted to IEEE for possible publications

In this paper, we design a new smart software-defined radio access network (RAN) architecture with important properties like flexibility and traffic awareness for sixth generation (6G) wireless networks. In particular, we consider a hierarchical resource allocation framework for the proposed smart soft-RAN model, where the software-defined network (SDN) controller is the first and foremost layer of the framework. This unit dynamically monitors the network to select a network operation type on the basis of distributed or centralized resource allocation architectures to perform decision-making intelligently. In this paper, our aim is to make the network more scalable and more flexible in terms of achievable data rate, overhead, and complexity indicators. To this end, we introduce a new metric, throughput overhead complexity (TOC), for the proposed machine learning-based algorithm, which makes a trade-off between these performance indicators. In particular, the decision-making based on TOC is solved via deep reinforcement learning (DRL), which determines an appropriate resource allocation policy. Furthermore, for the selected algorithm, we employ the soft actor-critic method, which is more accurate, scalable, and robust than other learning methods. Simulation results demonstrate that the proposed smart network achieves better performance in terms of TOC compared to fixed centralized or distributed resource management schemes that lack dynamism. Moreover, our proposed algorithm outperforms conventional learning methods employed in other state-of-the-art network designs.

Cluster · Scaling · Inference · GPU · Reducible

February 16, 2022

Aryl: An Elastic Cluster Scheduler for Deep Learning

Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang

Companies build separate training and inference GPU clusters for deep learning, and use separate schedulers to manage them. This leads to problems for both training and inference: inference clusters have low GPU utilization when the traffic load is low; training jobs often experience long queueing time due to lack of resources. We introduce Aryl, a new cluster scheduler to address these problems. Aryl introduces capacity loaning to loan idle inference GPU servers for training jobs. It further exploits elastic scaling that scales a training job's GPU allocation to better utilize loaned resources. Capacity loaning and elastic scaling create new challenges to cluster management. When the loaned servers need to be returned, we need to minimize the number of job preemptions; when more GPUs become available, we need to allocate them to elastic jobs and minimize the job completion time (JCT). Aryl addresses these combinatorial problems using principled heuristics. It introduces the notion of server preemption cost which it greedily reduces during server reclaiming. It further relies on the JCT reduction value defined for each additional worker for an elastic job to solve the scheduling problem as a multiple-choice knapsack problem. Prototype implementation on a 64-GPU testbed and large-scale simulation with 15-day traces of over 50,000 production jobs show that Aryl brings 1.53x and 1.50x reductions in average queuing time and JCT, and improves cluster usage by up to 26.9% over the cluster scheduler without capacity loaning or elastic scaling.
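The abstract frames the allocation of newly available GPUs to elastic jobs as a multiple-choice knapsack problem. The sketch below is a generic dynamic-programming solver for that problem class, not Aryl's implementation. The function name `mckp`, the option lists and the gain values are hypothetical: each job offers a few "extra GPUs, JCT reduction value" options, at most one option is chosen per job, and the total must fit within the free GPUs.

```python
def mckp(jobs, capacity):
    """Multiple-choice knapsack by dynamic programming.

    jobs: list of option lists; each option is (extra_gpus, gain).
    capacity: number of free GPUs.
    Returns (best_total_gain, choices), where choices[j] is the index of the
    option picked for job j, or None if the job gets no extra GPUs.
    """
    # best[c] = (gain, choices) for the jobs seen so far using at most c GPUs.
    best = [(0.0, [])] * (capacity + 1)
    for options in jobs:
        new_best = []
        for c in range(capacity + 1):
            gain, picks = best[c]
            candidate = (gain, picks + [None])  # default: no extra GPUs
            for idx, (gpus, option_gain) in enumerate(options):
                if gpus <= c:
                    prev_gain, prev_picks = best[c - gpus]
                    if prev_gain + option_gain > candidate[0]:
                        candidate = (prev_gain + option_gain, prev_picks + [idx])
            new_best.append(candidate)
        best = new_best
    return best[capacity]


# Hypothetical example: two elastic jobs, three free GPUs.
jobs = [
    [(1, 4.0), (2, 6.0)],  # job 0: +1 GPU gains 4.0, +2 GPUs gains 6.0
    [(1, 3.0), (3, 8.0)],  # job 1: +1 GPU gains 3.0, +3 GPUs gains 8.0
]
print(mckp(jobs, 3))
```

The table has one entry per GPU count, so the running time grows with the number of jobs, the number of options per job and the number of free GPUs.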

Singular · Learning · Performance · Deep learning · Scenario

February 16, 2022

Singularity: Planet-Scale, Preemptible, Elastic Scheduling of AI Workloads

Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, Mark Russinovich

Lowering costs by driving high utilization across deep learning workloads is a crucial lever for cloud providers. We present Singularity, Microsoft's globally distributed scheduling service for highly-efficient and reliable execution of deep learning training and inference workloads. At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of AI accelerators (e.g., GPUs, FPGAs). All jobs in Singularity are preemptable, migratable, and dynamically resizable (elastic) by default: a live job can be dynamically and transparently (a) preempted and migrated to a different set of nodes, cluster, data center or a region and resumed exactly from the point where the execution was preempted, and (b) resized (i.e., elastically scaled-up/down) on a varying set of accelerators of a given type. Our mechanisms are transparent in that they do not require the user to make any changes to their code or require using any custom libraries that may limit flexibility. Additionally, our approach significantly improves the reliability of deep learning workloads. We show that the resulting efficiency and reliability gains with Singularity are achieved with negligible impact on the steady-state performance. Finally, our design approach is agnostic of DNN architectures and handles a variety of parallelism strategies (e.g., data/pipeline/model parallelism).

Networking · Graph convolutional neural network/Graph convolutional network · Graph · Graph convolution · MoDELS

February 15, 2021

Network of Tensor Time Series

Baoyu Jing, Hanghang Tong, Yada Zhu

from arxiv, Accepted by WWW'2021

Co-evolving time series appear in a multitude of applications such as environmental monitoring, financial analysis, and smart transportation. This paper aims to address two challenges: (C1) how to incorporate explicit relationship networks of the time series; and (C2) how to model the implicit relationship of the temporal dynamics. We propose a novel model called Network of Tensor Time Series, which comprises two modules: Tensor Graph Convolutional Network (TGCN) and Tensor Recurrent Neural Network (TRNN). TGCN tackles the first challenge by generalizing Graph Convolutional Network (GCN) for flat graphs to tensor graphs, which captures the synergy between multiple graphs associated with the tensors. TRNN leverages tensor decomposition to model the implicit relationships among co-evolving time series. The experimental results on five real-world datasets demonstrate the efficacy of the proposed method.

Extensibility · Multi-task learning · Learning · Optimizer · Networking

September 16, 2020

Multi-Task Learning for Dense Prediction Tasks: A Survey

Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai, Luc Van Gool

from arxiv, Code is available: //github.com/SimonVandenhende/Multi-Task-Learning-PyTorch

With the advent of deep learning, many dense prediction tasks, i.e. tasks that produce pixel-level predictions, have seen significant performance improvements. The typical approach is to learn these tasks in isolation, that is, a separate neural network is trained for each individual task. Yet, recent multi-task learning (MTL) techniques have shown promising results w.r.t. performance, computations and/or memory footprint, by jointly tackling multiple tasks through a learned shared representation. In this survey, we provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision, explicitly emphasizing dense prediction tasks. Our contributions concern the following. First, we consider MTL from a network architecture point of view. We include an extensive overview and discuss the advantages/disadvantages of recent popular MTL models. Second, we examine various optimization methods to tackle the joint learning of multiple tasks. We summarize the qualitative elements of these works and explore their commonalities and differences. Finally, we provide an extensive experimental evaluation across a variety of dense prediction benchmarks to examine the pros and cons of the different methods, including both architectural and optimization-based strategies.

AdderNet · Neural Networks · Networking · Convolution · Model evaluation

December 31, 2019

AdderNet: Do We Really Need Multiplications in Deep Learning?

Hanting Chen, Yunhe Wang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu

Compared with the cheap addition operation, the multiplication operation has much higher computational complexity. The widely used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between input features and convolution filters, which involves massive multiplications between float values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and input features as the output response. The influence of this new similarity measure on the optimization of neural networks has been thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolution layers.
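A minimal sketch of the core idea, assuming a 1-D signal, stride 1 and no padding: the filter response is computed from the $\ell_1$ distance between the filter and each input window, using only subtraction, absolute value and addition. The function names are hypothetical, and negating the distance (so that a closer match gives a larger response) is an assumption of this sketch rather than something stated in the abstract. The special back-propagation and adaptive learning rate are not covered.

```python
def l1_response(window, filt):
    """Negated L1 distance between an input window and a filter of equal length."""
    if len(window) != len(filt):
        raise ValueError("window and filter must have the same length")
    # Only subtraction, absolute value and addition are used here.
    return -sum(abs(x - f) for x, f in zip(window, filt))


def adder_conv1d(signal, filt):
    """Slide the filter over a 1-D signal (stride 1, no padding)."""
    k = len(filt)
    return [l1_response(signal[i:i + k], filt) for i in range(len(signal) - k + 1)]


signal = [1.0, 2.0, 3.0, 2.0, 1.0]
filt = [2.0, 3.0, 2.0]
print(adder_conv1d(signal, filt))
```

The response is zero, its maximum, at the window that matches the filter exactly, and becomes more negative as the window moves away from the filter.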

entity · Graph · Knowledge graph · Question answering · MoDELS

October 15, 2019

Efficiently Embedding Dynamic Knowledge Graphs

Tianxing Wu, Arijit Khan, Huan Gao, Cheng Li

from arxiv, 14 pages

Knowledge graph (KG) embedding encodes the entities and relations from a KG into low-dimensional vector spaces to support various applications such as KG completion, question answering, and recommender systems. In the real world, knowledge graphs (KGs) are dynamic and evolve over time with the addition or deletion of triples. However, most existing models focus on embedding static KGs while neglecting dynamics. To adapt to the changes in a KG, these models need to be re-trained on the whole KG at a high time cost. In this paper, to tackle the aforementioned problem, we propose a new context-aware Dynamic Knowledge Graph Embedding (DKGE) method which supports embedding learning in an online fashion. DKGE introduces two different representations (i.e., knowledge embedding and contextual element embedding) for each entity and each relation, in the joint modeling of entities and relations as well as their contexts, by employing two attentive graph convolutional networks, a gate strategy, and translation operations. This effectively helps limit the impact of a KG update to certain regions, rather than the entire graph, so that DKGE can rapidly acquire the updated KG embedding by a proposed online learning algorithm. Furthermore, DKGE can also learn KG embedding from scratch. Experiments on the tasks of link prediction and question answering in a dynamic environment demonstrate the effectiveness and efficiency of DKGE.

Learning · Reinforcement learning · Central processing unit (CPU) · GPU · Training samples

October 24, 2018

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

Jacky Liang, Viktor Makoviychuk, Ankur Handa, Nuttapong Chentanez, Miles Macklin, Dieter Fox

from arxiv, Accepted and to appear at the Conference on Robot Learning (CoRL) 2018

Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups of learning various continuous-control, locomotion tasks. With one GPU and CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10-1000x fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks.

contrastive · Optimizer · Processing (programming language) · Unsupervised · Homogeneous

August 2, 2018

Geometry-Based Multiple Camera Head Detection in Dense Crowds

Nicola Pellicanò, Emanuel Aldea, Sylvie Le Hégarat-Mascle

from arxiv, Proceedings of the 28th British Machine Vision Conference (BMVC) - 5th Activity Monitoring by Multiple Distributed Sensing Workshop, 2017

This paper addresses the problem of head detection in crowded environments. Our detection is based entirely on the geometric consistency across cameras with overlapping fields of view, and no additional learning process is required. We propose a fully unsupervised method for inferring scene and camera geometry, in contrast to existing algorithms which require specific calibration procedures. Moreover, we avoid relying on the presence of body parts other than heads or on background subtraction, which have limited effectiveness under heavy clutter. We cast the head detection problem as a stereo MRF-based optimization of a dense pedestrian height map, and we introduce a constraint which aligns the height gradient according to the vertical vanishing point direction. We validate the method in an outdoor setting with varying pedestrian density levels. With only three views, our approach is able to detect simultaneously tens of heavily occluded pedestrians across a large, homogeneous area.
