
The use of large-scale machine learning methods is becoming ubiquitous in many applications ranging from business intelligence to self-driving cars. These methods require a complex computation pipeline consisting of various types of operations, e.g., relational operations for pre-processing or post-processing the dataset, and matrix operations for core model computations. Many existing systems focus on efficiently processing matrix-only operations, and assume that the inputs to the relational operators are already pre-computed and materialized as intermediate matrices. However, the input to a relational operator may be complex in machine learning pipelines, and may involve various combinations of matrix operators. Hence, it is critical to realize scalable and efficient relational query processors that directly operate on big matrix data. This paper presents new efficient and scalable relational query processing techniques on big matrix data for in-memory distributed clusters. The proposed techniques leverage algebraic transformation rules to rewrite query execution plans into ones with lower computation costs. A distributed query plan optimizer exploits the sparsity-inducing property of merge functions as well as Bloom join strategies to efficiently evaluate various flavors of the join operation. Furthermore, optimized partitioning schemes for the input matrices are developed to facilitate the performance of join operations, based on a cost model that minimizes the communication overhead. The proposed relational query processing techniques are prototyped in Apache Spark. Experiments on both real and synthetic data demonstrate that the proposed techniques achieve up to two orders of magnitude performance improvement over state-of-the-art systems on a wide range of applications.
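The Bloom join strategy mentioned above can be illustrated with a small, hypothetical sketch: a Bloom filter built over the non-empty row indices of one matrix prunes rows of the other matrix before a join on row index. This is only an illustrative single-machine sketch in Python with scipy (the BloomFilter class and matrix sizes are invented for the example), not the paper's distributed Spark implementation.

```python
# Illustrative Bloom-filter-guided join on row indices of two sparse matrices
# (hypothetical sizes; not the paper's distributed implementation).
import hashlib
import numpy as np
from scipy.sparse import random as sparse_random

class BloomFilter:
    """Tiny Bloom filter over integer keys, for illustration only."""
    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.bits = np.zeros(n_bits, dtype=bool)
        self.n_bits, self.n_hashes = n_bits, n_hashes

    def _positions(self, key):
        for i in range(self.n_hashes):
            h = hashlib.blake2b(f"{key}:{i}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "little") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))

# Two sparse matrices whose rows are joined on the row index.
A = sparse_random(1000, 64, density=0.01, format="csr", random_state=0)
B = sparse_random(1000, 64, density=0.01, format="csr", random_state=1)

# Build side: Bloom filter over the non-empty rows of A.
a_rows = set(A.nonzero()[0])
bf = BloomFilter()
for r in a_rows:
    bf.add(r)

# Probe side: the filter admits all true matches (no false negatives) and
# prunes most non-matching rows before any data would need to be shuffled.
candidates = [r for r in set(B.nonzero()[0]) if bf.might_contain(r)]
matches = [(r, A.getrow(r), B.getrow(r)) for r in candidates if r in a_rows]
print(len(candidates), "candidate rows,", len(matches), "actual matches")
```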

Related content

Processing is the name of an open-source programming language and its accompanying integrated development environment (IDE). Processing is used in the electronic arts and visual design communities to teach the fundamentals of programming, and it is employed in a large number of new media and interactive art works.

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth, and low memory access latency, making them a good fit to accelerate the widely-used, memory-bound Sparse Matrix Vector Multiplication (SpMV) kernel. This paper provides the first comprehensive analysis of SpMV on a real-world PIM architecture, and presents SparseP, the first SpMV library for real PIM architectures. We make three key contributions. First, we implement a wide variety of software strategies for SpMV on a multithreaded PIM core and characterize the computational limits of a single multithreaded PIM core. Second, we design various load balancing schemes across multiple PIM cores, and two types of data partitioning techniques to execute SpMV on thousands of PIM cores: (1) 1D-partitioned kernels to perform the complete SpMV computation using only PIM cores, and (2) 2D-partitioned kernels to strike a balance between computation and data transfer costs to PIM-enabled memory. Third, we compare SpMV execution on a real-world PIM system with 2528 PIM cores to state-of-the-art CPU and GPU systems to study the performance and energy efficiency of various devices. The SparseP software package provides 25 SpMV kernels for real PIM systems, supporting the four most widely used compressed matrix formats and a wide range of data types. Our extensive evaluation provides new insights and recommendations for software designers and hardware architects to efficiently accelerate SpMV on real PIM systems.
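As a rough illustration of 1D partitioning for SpMV, the following Python sketch splits a CSR matrix into row blocks with roughly equal nonzero counts, one block per (hypothetical) core; it is a simplified single-process stand-in for the load-balancing schemes studied in the paper, not SparseP code.

```python
# Illustrative 1D row partitioning for SpMV, balancing nonzeros per partition.
import numpy as np
from scipy.sparse import random as sparse_random

def partition_rows_by_nnz(A_csr, n_parts):
    """Split row indices into n_parts chunks with roughly equal nonzero counts."""
    nnz_per_row = np.diff(A_csr.indptr)
    target = nnz_per_row.sum() / n_parts
    parts, current, acc = [], [], 0
    for row, nnz in enumerate(nnz_per_row):
        current.append(row)
        acc += nnz
        if acc >= target and len(parts) < n_parts - 1:
            parts.append(current)
            current, acc = [], 0
    parts.append(current)
    return parts

A = sparse_random(4096, 4096, density=0.002, format="csr", random_state=0)
x = np.random.rand(4096)

parts = partition_rows_by_nnz(A, n_parts=8)
y = np.zeros(A.shape[0])
for rows in parts:            # each chunk would run on a separate PIM core
    y[rows] = A[rows, :] @ x  # local SpMV on the assigned row block

assert np.allclose(y, A @ x)  # partitioned result matches the full SpMV
```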

It is commonly acknowledged that the availability of huge amounts of (training) data is one of the most important factors behind many recent advances in Artificial Intelligence (AI). However, datasets are often designed for specific tasks in narrow AI subareas, and there is no unified way to manage and access them. This not only creates unnecessary overhead when training or deploying Machine Learning models but also limits the understanding of the data, which is very important for data-centric AI. In this paper, we present our vision of a unified framework for different datasets so that they can be integrated and queried easily, e.g., using standard query languages. We demonstrate this in our ongoing work to create a framework for datasets in Computer Vision and show its advantages in different scenarios. Our demonstration is available at //vision.semkg.org.

Sparse-Matrix Dense-Matrix multiplication (SpMM) is a key operator for a wide range of applications, including scientific computing, graph processing, and deep learning. Architecting accelerators for SpMM faces three challenges: (1) random memory accesses and unbalanced load caused by the random distribution of elements in sparse matrices, (2) inefficient handling of large matrices that cannot fit on-chip, and (3) non-general-purpose accelerator designs in which one accelerator can only process a fixed-size problem. In this paper, we present Sextans, an accelerator for general-purpose SpMM processing. The Sextans accelerator features (1) fast random access using on-chip memory, (2) streaming access to off-chip large matrices, (3) PE-aware non-zero scheduling for a balanced workload with an II=1 pipeline, and (4) hardware flexibility that allows the hardware to be prototyped once and then support SpMMs of different sizes as a general-purpose accelerator. We leverage high bandwidth memory (HBM) for efficient access to both the sparse and dense matrices. In the evaluation, we present an FPGA prototype of Sextans, which is executable on a Xilinx U280 HBM FPGA board, and a projected prototype, Sextans-P, with higher bandwidth comparable to a V100 and further frequency optimization. We conduct a comprehensive evaluation of 1,400 SpMMs on a wide range of sparse matrices, including 50 matrices from SNAP and 150 from SuiteSparse. We compare Sextans with NVIDIA K80 and V100 GPUs. Sextans achieves a 2.50x geomean speedup over the K80 GPU, and Sextans-P achieves a 1.14x geomean speedup over the V100 GPU (4.94x over the K80). The code is available at //github.com/linghaosong/Sextans.
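For reference, the operation the accelerator targets is C = A·B with A sparse and B dense. The following Python sketch computes it tile by tile over the columns of B, loosely mirroring how a large dense matrix can be streamed from off-chip memory; it is a software illustration, not the Sextans design.

```python
# Software reference for SpMM: C = A_sparse @ B_dense, computed in column tiles.
import numpy as np
from scipy.sparse import random as sparse_random

A = sparse_random(2048, 2048, density=0.001, format="csr", random_state=0)
B = np.random.rand(2048, 512).astype(np.float32)

tile = 128                                        # columns of B processed per pass
C = np.zeros((A.shape[0], B.shape[1]), dtype=np.float32)
for j in range(0, B.shape[1], tile):
    C[:, j:j + tile] = A @ B[:, j:j + tile]       # one streamed tile of B

assert np.allclose(C, A @ B, atol=1e-4)           # tiled result matches full SpMM
```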

Low Density Parity Check (LDPC) codes are linear error correcting codes used in communication systems for Forward Error Correction (FEC). However, encoding and decoding LDPC codes is computationally intensive, which makes them difficult to use in general-purpose, software-based signal processing systems. To accelerate the encoding and decoding of LDPC codes, distributed processing over multiple multi-core CPUs using the Message Passing Interface (MPI) is performed. The implementation uses both Stream Processing and Batch Processing mechanisms, and the execution times of the two implementations are compared with respect to the number of CPUs and the number of cores per CPU. The performance of the distributed processing is evaluated in terms of how the execution time varies as the number of processors (CPU cores) increases.
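A minimal sketch of the batch-processing idea, assuming mpi4py and a toy parity-check matrix: batches of codewords are scattered across MPI ranks, each rank computes the syndromes s = Hc mod 2, and the results are gathered. This is an illustrative stand-in, not the paper's implementation; a real LDPC code would use a sparse, structured H and iterative decoding.

```python
# Illustrative distribution of LDPC syndrome checks over MPI ranks (mpi4py).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

H = None
codewords = None
if rank == 0:
    rng = np.random.default_rng(0)
    H = (rng.random((48, 96)) < 0.1).astype(np.uint8)        # toy parity-check matrix
    codewords = rng.integers(0, 2, size=(1024, 96), dtype=np.uint8)

H = comm.bcast(H, root=0)                                     # every rank needs H
chunks = np.array_split(codewords, size) if rank == 0 else None
local = comm.scatter(chunks, root=0)                          # batch of codewords per rank

# Syndrome s = H c mod 2; an all-zero syndrome means the codeword satisfies all checks.
syndromes = (local @ H.T) % 2
local_valid = ~syndromes.any(axis=1)

valid = comm.gather(local_valid, root=0)
if rank == 0:
    print("valid codewords:", int(np.concatenate(valid).sum()))
```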

Heterogeneous information networks (HINs) represent different types of entities and the relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is computationally demanding, current approaches do not exploit interrelationships among the queries. In this paper, we present ATRAPOS, a new approach to the real-time evaluation of metapath query workloads that leverages a combination of efficient sparse matrix multiplication and intermediate result caching. ATRAPOS selects intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that ATRAPOS accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios.
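The core computation, evaluating a metapath as a chain of sparse adjacency-matrix products and reusing shared prefixes, can be sketched as follows in Python with scipy. The edge types, matrix sizes, and the simple prefix cache are invented for illustration; ATRAPOS's Overlap Tree and caching policy are more sophisticated.

```python
# Illustrative metapath evaluation as chained sparse matrix products
# with a simple prefix cache (hypothetical edge types and sizes).
from scipy.sparse import random as sparse_random

adj = {  # adjacency matrices keyed by edge type
    "AP": sparse_random(500, 300, density=0.01, format="csr", random_state=1),  # author-paper
    "PV": sparse_random(300, 20, density=0.05, format="csr", random_state=2),   # paper-venue
    "PT": sparse_random(300, 100, density=0.02, format="csr", random_state=3),  # paper-topic
}
cache = {}

def metapath(edge_types):
    """Multiply adjacency matrices along the metapath, caching every prefix."""
    key, result = (), None
    for et in edge_types:
        key = key + (et,)
        if key in cache:
            result = cache[key]
            continue
        result = adj[et] if result is None else result @ adj[et]
        cache[key] = result
    return result

apv = metapath(["AP", "PV"])   # author -> paper -> venue
apt = metapath(["AP", "PT"])   # author -> paper -> topic; reuses the cached ("AP",) prefix
print(apv.shape, apt.shape, "cached sub-metapaths:", len(cache))
```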

In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, favoring the (block) LU decomposition: the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to the concepts and mathematical tools of numerical linear algebra and matrix analysis, in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. We cannot, however, cover all of the useful and interesting results concerning matrix decomposition within this limited scope; for example, we omit separate treatments of Euclidean spaces, Hermitian spaces, Hilbert spaces, and the complex domain. We refer the reader to the linear algebra literature for a more detailed introduction to these related topics.
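As a small worked example of the LU factorization discussed above, scipy.linalg.lu factors a matrix A into a permutation matrix P and triangular factors L (unit lower) and U (upper) with A = P L U:

```python
# Worked LU factorization example: A = P @ L @ U.
import numpy as np
from scipy.linalg import lu

A = np.array([[4., 3., 2.],
              [6., 3., 1.],
              [2., 1., 5.]])

P, L, U = lu(A)                      # permutation, unit lower, upper triangular
assert np.allclose(A, P @ L @ U)     # the factors reproduce A
print("L =\n", np.round(L, 3))
print("U =\n", np.round(U, 3))
```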

Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm for this task that greatly outperforms existing methods. Experiments using hundreds of matrices from diverse domains show that it often runs 100x faster than exact matrix products and 10x faster than current approximate methods. In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds. These results suggest that a mixture of hashing, averaging, and byte shuffling (the core operations of our method) could be a more promising building block for machine learning than the sparsified, factorized, and/or scalar quantized matrix products that have recently been the focus of substantial research and hardware investment.
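A simplified sketch of the lookup-table idea behind such methods: encode the rows of A with per-subspace k-means codebooks, precompute dot products of the codebook prototypes with B (assumed known ahead of time), and approximate A·B by summing table lookups. This uses plain product-quantization-style encoding rather than the paper's learned hashing and byte shuffling, so it only illustrates the general approach; all sizes are invented for the example.

```python
# Simplified product-quantization-style approximate matrix multiplication.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N, D, M, n_subspaces, n_codes = 2000, 64, 32, 8, 16
A = rng.normal(size=(N, D))
B = rng.normal(size=(D, M))                       # assumed known ahead of time

sub = D // n_subspaces
codes = np.empty((N, n_subspaces), dtype=np.int64)
tables = np.empty((n_subspaces, n_codes, M))

for s in range(n_subspaces):
    cols = slice(s * sub, (s + 1) * sub)
    km = KMeans(n_clusters=n_codes, n_init=4, random_state=0).fit(A[:, cols])
    codes[:, s] = km.labels_                      # encode the rows of A once
    tables[s] = km.cluster_centers_ @ B[cols, :]  # precompute prototype dot products

# Approximate A @ B as sums of table lookups: no multiply-adds at query time.
approx = np.zeros((N, M))
for s in range(n_subspaces):
    approx += tables[s][codes[:, s]]

exact = A @ B
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative Frobenius error: {rel_err:.3f}")
```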

The demand for artificial intelligence has grown significantly over the last decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computational power of computing machinery, there is a need to distribute the machine learning workload across multiple machines, turning a centralized system into a distributed one. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state of the art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.
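A minimal single-process sketch of the data-parallel pattern referred to above: each simulated worker holds a shard of the data and computes a local gradient, and the gradients are averaged to keep one coherent model (here, linear regression with numpy; all names and sizes are illustrative).

```python
# Simulated data-parallel training: local gradients per shard, averaged globally.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
true_w = rng.normal(size=20)
y = X @ true_w + 0.01 * rng.normal(size=10_000)

n_workers, lr, steps = 4, 0.1, 200
shards = np.array_split(np.arange(len(X)), n_workers)
w = np.zeros(20)                                 # the single shared model

for _ in range(steps):
    grads = []
    for idx in shards:                           # would run on separate machines
        Xs, ys = X[idx], y[idx]
        grads.append(2 * Xs.T @ (Xs @ w - ys) / len(idx))
    w -= lr * np.mean(grads, axis=0)             # AllReduce-style gradient averaging

print("parameter error:", float(np.linalg.norm(w - true_w)))
```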

The area of Data Analytics on graphs promises a paradigm shift, as we approach information processing of classes of data which are typically acquired on irregular but structured domains (social networks, various ad-hoc sensor networks). Yet, despite its long history, current approaches mostly focus on the optimization of graphs themselves, rather than on directly inferring learning strategies, such as detection, estimation, statistical and probabilistic inference, clustering, and separation, from signals and data acquired on graphs. To fill this void, we first revisit graph topologies from a Data Analytics point of view, and establish a taxonomy of graph networks through a linear algebraic formalism of graph topology (vertices, connections, directivity). This serves as a basis for spectral analysis of graphs, whereby the eigenvalues and eigenvectors of the graph Laplacian and adjacency matrices are shown to convey physical meaning related to both graph topology and higher-order graph properties, such as cuts, walks, paths, and neighborhoods. Next, to illustrate estimation strategies performed on graph signals, spectral analysis of graphs is introduced in a generic way through the eigenanalysis of mathematical descriptors of graphs. Finally, a framework for vertex clustering and graph segmentation is established based on graph spectral representation (eigenanalysis), which illustrates the power of graphs in various data association tasks. The supporting examples demonstrate the promise of Graph Data Analytics in modeling structural and functional/semantic inferences. At the same time, Part I serves as a basis for Part II and Part III, which deal with the theory, methods, and applications of processing Data on Graphs and Graph Topology Learning from data.
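A small Python illustration of the Laplacian eigenanalysis and vertex clustering described above: for a toy graph of two loosely connected triangles, the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue) splits the vertices into the two natural clusters.

```python
# Graph Laplacian eigenanalysis and two-way spectral clustering on a toy graph.
import numpy as np

# Two loosely connected triangles: vertices 0-2 and 3-5.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

D = np.diag(A.sum(axis=1))          # degree matrix
L = D - A                           # combinatorial graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]             # eigenvector of the second-smallest eigenvalue
clusters = (fiedler > 0).astype(int)
print("eigenvalues:", np.round(eigvals, 3))
print("cluster assignment:", clusters)
```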

In this paper, we present BigDL, a distributed deep learning framework for Big Data platforms and workflows. It is implemented on top of Apache Spark and allows users to write their deep learning applications as standard Spark programs, running directly on large-scale big data clusters in a distributed fashion. It provides an expressive, "data-analytics integrated" deep learning programming model, so that users can easily build end-to-end analytics + AI pipelines under a unified programming paradigm; by implementing an AllReduce-like operation using existing primitives in Spark (e.g., shuffle, broadcast, and in-memory data persistence), it also provides a highly efficient "parameter server"-style architecture, so as to achieve highly scalable, data-parallel distributed training. Since its initial open source release, BigDL users have built many analytics and deep learning applications (e.g., object detection, sequence-to-sequence generation, neural recommendations, fraud detection, etc.) on Spark.
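A hedged sketch of the AllReduce-like pattern described above, written with plain PySpark primitives (broadcast, tree aggregation, in-memory caching) rather than BigDL's actual API; the linear-regression workload and all parameters are invented for illustration.

```python
# Illustrative AllReduce-like gradient aggregation with plain PySpark primitives.
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="allreduce-sketch")

rng = np.random.default_rng(0)
true_w = rng.normal(size=10)
data = [(x, float(x @ true_w)) for x in rng.normal(size=(10_000, 10))]

rdd = sc.parallelize(data, numSlices=8).cache()   # in-memory data persistence
n = rdd.count()

w = np.zeros(10)
for _ in range(20):
    bw = sc.broadcast(w)                          # ship the current weights to executors

    def seq_op(acc, xy):
        x, y = xy
        return acc + 2 * (x @ bw.value - y) * x   # accumulate the local gradient

    # Tree aggregation plays the role of AllReduce: partial sums per partition
    # are combined into a single global gradient.
    grad = rdd.treeAggregate(np.zeros(10), seq_op, lambda a, b: a + b)
    w = w - 0.2 * grad / n

sc.stop()
print("parameter error:", float(np.linalg.norm(w - true_w)))
```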
