
With the spreading prevalence of Big Data, many advances have recently been made in this field. Frameworks such as Apache Hadoop and Apache Spark have gained a lot of traction over the past decade and have become massively popular, especially in industry. It is becoming increasingly evident that effective big data analysis is key to solving artificial intelligence problems. To this end, a multi-algorithm machine learning library called MLlib was implemented in the Spark framework. While this library supports multiple machine learning algorithms, there is still scope to use the Spark setup efficiently for highly time-intensive and computationally expensive procedures like deep learning. In this paper, we propose a novel framework that combines the distributed computational abilities of Apache Spark and the advanced machine learning architecture of a deep multi-layer perceptron (MLP), using the popular concept of Cascade Learning. We conduct an empirical analysis of our framework on two real-world datasets. The results are encouraging and corroborate our proposed framework, showing that it improves over traditional big data analysis methods that use either Spark or deep learning as individual elements.
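As a rough illustration of how such stage-wise (cascade) training could be expressed on Spark, the sketch below chains a few `MultilayerPerceptronClassifier` stages, feeding each stage's prediction back in as an extra feature for the next stage. The input path, column names, layer sizes, and number of stages are assumptions for illustration only, not the paper's exact setup.

```python
# Cascade-style training sketch on Spark ML: each stage's prediction becomes an
# extra input feature for the next stage. All names and sizes are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import MultilayerPerceptronClassifier

spark = SparkSession.builder.appName("cascade-mlp-sketch").getOrCreate()
df = spark.read.parquet("train.parquet")   # hypothetical input with "features"/"label"

n_features, n_classes = 100, 2
stage_input, feature_col = df, "features"
for stage in range(3):                     # each stage sees the previous stage's output
    mlp = MultilayerPerceptronClassifier(
        layers=[n_features + stage, 64, n_classes],   # input grows by one per stage
        featuresCol=feature_col, labelCol="label",
        predictionCol=f"pred_{stage}", maxIter=50)
    scored = mlp.fit(stage_input).transform(stage_input)
    # Append this stage's prediction to the features used by the next stage,
    # keeping only the new feature vector and the label.
    next_col = f"features_{stage + 1}"
    stage_input = (VectorAssembler(inputCols=[feature_col, f"pred_{stage}"],
                                   outputCol=next_col)
                   .transform(scored)
                   .select(next_col, "label"))
    feature_col = next_col
```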

Related Content

Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing. Spark is an open-source, Hadoop-MapReduce-like general parallel framework developed at UC Berkeley's AMP Lab (the AMP Lab at the University of California, Berkeley). Spark offers the advantages of Hadoop MapReduce, but unlike MapReduce, intermediate job results can be kept in memory, eliminating the need to read from and write to HDFS between steps. Spark is therefore better suited to MapReduce-style algorithms that require iteration, such as those used in data mining and machine learning.
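The difference from disk-based MapReduce can be seen in a minimal PySpark sketch of an iterative computation: the dataset is cached in memory once and reused across iterations instead of being re-read from HDFS on every pass. The file path, record format, and toy gradient step are purely illustrative.

```python
# In-memory reuse across iterations: the points RDD is cached once and every
# subsequent pass reads it from memory rather than from HDFS.
from pyspark import SparkContext
import numpy as np

sc = SparkContext(appName="iterative-cache-sketch")
points = (sc.textFile("hdfs:///data/points.txt")              # assumed "x1 x2 y" per line
            .map(lambda line: np.array(line.split(), dtype=float))
            .cache())                                          # keep partitions in memory
n = points.count()                                             # materializes the cache

w = np.zeros(2)                                                # toy 2-feature linear model
for _ in range(10):                                            # every pass reuses cached data
    grad = points.map(lambda p: (p[:2].dot(w) - p[2]) * p[:2]) \
                 .reduce(lambda a, b: a + b)
    w -= 0.01 * grad / n
```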

Recent advances in sensor and mobile devices have enabled an unprecedented increase in the availability and collection of urban trajectory data, thus increasing the demand for more efficient ways to manage and analyze the data being produced. In this survey, we comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing and storage to common trajectory analytic tools, such as querying of spatial-only and spatial-textual trajectory data, and trajectory clustering. We also explore four closely related analytical tasks commonly used with trajectory data in interactive or real-time processing. Deep trajectory learning is also reviewed for the first time. Finally, we outline the essential qualities that a trajectory management system should possess in order to maximize flexibility.

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
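A minimal sketch of the pattern described above, in the spirit of federated averaging: clients fit a model on their private data and the server aggregates only the returned parameters, never the raw examples. The function names, the toy linear model, and the synthetic data are illustrative assumptions, not the API of any actual FL framework.

```python
# Federated-averaging sketch: the server sees model weights and sample counts,
# while each client's (X, y) stays local.
import numpy as np

def local_update(weights, local_data, lr=0.1, epochs=1):
    """Client-side step: fit on private data, return updated weights only."""
    X, y = local_data
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)    # toy linear-regression gradient
        w -= lr * grad
    return w, len(y)

def federated_round(global_w, clients):
    """Server-side step: weighted average of client models (FedAvg)."""
    updates = [local_update(global_w, data) for data in clients]
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# Usage: three clients, each holding its own (X, y); data never leaves the client.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
global_w = np.zeros(3)
for _ in range(20):
    global_w = federated_round(global_w, clients)
```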

The wide popularity of digital photography and social networks has generated a rapidly growing volume of multimedia data (i.e., images, music, and video), resulting in a great demand for managing, retrieving, and understanding these data. Affective computing (AC) of these data can help to understand human behaviors and enable wide applications. In this article, we comprehensively survey the state-of-the-art AC technologies for large-scale heterogeneous multimedia data. We begin this survey by introducing the typical emotion representation models from psychology that are widely employed in AC. We briefly describe the available datasets for evaluating AC algorithms. We then summarize and compare the representative methods on AC of different multimedia types, i.e., images, music, videos, and multimodal data, focusing on both methods based on handcrafted features and deep learning methods. Finally, we discuss some challenges and future directions for multimedia affective computing.

In recent years, mobile devices have developed rapidly, gaining stronger computation capability and larger storage. Some computation-intensive machine learning and deep learning tasks can now run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning has been proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and uploads only computation results, instead of the original data, to contribute to the optimization of the global model. This architecture not only relieves the computation and storage burden on servers, but also protects users' sensitive information. Another benefit is the reduction in bandwidth, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey of recent studies on mobile distributed machine learning. We survey a number of widely used mobile distributed machine learning methods. We also present an in-depth discussion of the challenges and future directions in this area. We believe that this survey provides a clear overview of mobile distributed machine learning and offers guidelines for applying it to real applications.
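The "upload results, not data" idea can be made concrete with a toy sketch: a device trains locally and ships only a model delta, which is far smaller than its raw samples. The class name, model, and sizes below are hypothetical.

```python
# A device computes a local update and sends only the weight delta upstream;
# the raw samples never leave the client.
import numpy as np

class MobileClient:
    def __init__(self, X, y):
        self.X, self.y = X, y            # raw data stays on the device

    def compute_update(self, global_w, lr=0.05, steps=5):
        w = global_w.copy()
        for _ in range(steps):
            w -= lr * self.X.T @ (self.X @ w - self.y) / len(self.y)
        return w - global_w              # only the delta is uploaded

rng = np.random.default_rng(1)
client = MobileClient(rng.normal(size=(10_000, 8)), rng.normal(size=10_000))
delta = client.compute_update(np.zeros(8))
print("uploaded:", delta.nbytes, "bytes vs raw data:", client.X.nbytes, "bytes")
```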

The questions of 'can machines think' and 'can machines do what humans do' drive the development of artificial intelligence. Although recent artificial intelligence succeeds in many data-intensive applications, it still lacks the ability to learn from limited examples and to generalize quickly to new tasks. To tackle this problem, one has to turn to machine learning, which supports the scientific study of artificial intelligence. In particular, a machine learning problem called Few-Shot Learning (FSL) targets this case. FSL can rapidly generalize to new tasks with limited supervised experience by turning to prior knowledge, mimicking humans' ability to acquire knowledge from a few examples through generalization and analogy. It has been seen as a test-bed for real artificial intelligence, a way to reduce laborious data gathering and computationally costly training, and an antidote for learning from rare cases. With extensive work on FSL emerging, we give a comprehensive survey of it. We first give the formal definition of FSL. Then we point out the core issues of FSL, which turns the problem from "how to solve FSL" to "how to deal with the core issues". Accordingly, existing works from the birth of FSL to the most recent publications are categorized in a unified taxonomy, with a thorough discussion of the pros and cons of different categories. Finally, we envision possible future directions for FSL in terms of problem setup, techniques, applications, and theory, hoping to provide insights to both beginners and experienced researchers.
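To give a concrete flavor of one common FSL recipe (prototype-based metric learning, in the spirit of Prototypical Networks, and not the survey's full taxonomy), the sketch below averages a few support embeddings per class and labels queries by the nearest prototype. The identity `embed` function is a stand-in for a pretrained encoder, and the data is synthetic.

```python
# Nearest-prototype few-shot classification: class prototypes are the mean of a
# handful of support embeddings; queries take the label of the closest prototype.
import numpy as np

def embed(x):
    return x                                       # placeholder for a learned encoder

def classify_episode(support_x, support_y, query_x):
    classes = np.unique(support_y)
    prototypes = np.stack([embed(support_x[support_y == c]).mean(axis=0)
                           for c in classes])      # one prototype per class
    dists = np.linalg.norm(embed(query_x)[:, None, :] - prototypes[None], axis=-1)
    return classes[dists.argmin(axis=1)]           # nearest-prototype label

# 2-way 3-shot toy episode.
rng = np.random.default_rng(0)
sx = np.concatenate([rng.normal(0, 1, (3, 4)), rng.normal(5, 1, (3, 4))])
sy = np.array([0, 0, 0, 1, 1, 1])
qx = rng.normal(5, 1, (2, 4))
print(classify_episode(sx, sy, qx))                # expected: [1 1]
```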

Since the advent of big data analysis and the Graphics Processing Unit (GPU), deep learning technology has received a great deal of attention and has been widely applied in the field of image processing. In this paper, we aim to comprehensively review and summarize the deep learning technologies for image denoising proposed in recent years. Moreover, we systematically analyze conventional machine learning methods for image denoising. Finally, we point out some research directions for deep learning technologies in image denoising.

In this paper, we present BigDL, a distributed deep learning framework for Big Data platforms and workflows. It is implemented on top of Apache Spark, and allows users to write their deep learning applications as standard Spark programs (running directly on large-scale big data clusters in a distributed fashion). It provides an expressive, "data-analytics integrated" deep learning programming model, so that users can easily build end-to-end analytics + AI pipelines under a unified programming paradigm; by implementing an AllReduce-like operation using existing primitives in Spark (e.g., shuffle, broadcast, and in-memory data persistence), it also provides a highly efficient "parameter server" style architecture, so as to achieve highly scalable, data-parallel distributed training. Since its initial open source release, BigDL users have built many analytics and deep learning applications (e.g., object detection, sequence-to-sequence generation, neural recommendations, fraud detection, etc.) on Spark.
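To illustrate the kind of data-parallel synchronization this describes, the sketch below builds a broadcast-plus-treeAggregate loop from standard Spark primitives. It shows the general AllReduce-like pattern only; it is not BigDL's implementation or API, and the toy linear model and data are assumptions.

```python
# Parameter-synchronization pattern from plain Spark primitives: broadcast the
# current weights, let each partition compute a partial gradient, and combine the
# partials back to the driver with treeAggregate.
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="param-sync-sketch")
rows = sc.parallelize(np.random.randn(10_000, 6).tolist(), numSlices=8).cache()

w = np.zeros(5)
for _ in range(5):
    w_b = sc.broadcast(w)                          # ship current weights to executors
    # treeAggregate combines partition-level partial gradients in a tree pattern,
    # similar in spirit to an AllReduce.
    grad = rows.treeAggregate(
        np.zeros(5),
        lambda acc, r: acc + (np.dot(r[:5], w_b.value) - r[5]) * np.asarray(r[:5]),
        lambda a, b: a + b,
        depth=2)
    w -= 0.01 * grad / rows.count()
```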

Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods have a strong bias towards low- or high-order interactions, or rely on expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed framework, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared raw feature input to both its "wide" and "deep" components, with no need for feature engineering besides raw features. DeepFM, as a general learning framework, can incorporate various network architectures in its deep component. In this paper, we study two instances of DeepFM whose "deep" component is a DNN and a PNN respectively, which we denote as DeepFM-D and DeepFM-P. Comprehensive experiments are conducted to demonstrate the effectiveness of DeepFM-D and DeepFM-P over the existing models for CTR prediction, on both benchmark data and commercial data. We conduct an online A/B test in Huawei App Market, which reveals that DeepFM-D leads to more than 10% improvement of click-through rate in the production environment, compared to a well-engineered LR model. We also cover related practices in deploying our framework in Huawei App Market.
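The shared-embedding idea can be sketched as a schematic forward pass in numpy: the same field embeddings feed both an FM-style second-order term and a deep MLP, with no hand-crafted cross features. The sizes, random weights, and one-id-per-field assumption are illustrative only, not the paper's exact architecture or the released implementation.

```python
# Schematic DeepFM-style scoring: shared embeddings drive both the FM part
# (first-order + pairwise interactions) and the deep MLP part.
import numpy as np

rng = np.random.default_rng(0)
n_fields, vocab, dim = 3, 100, 8                  # categorical fields, ids, emb size
E = rng.normal(0.1, 0.1, (vocab, dim))            # shared embedding table
w_first = rng.normal(0, 0.1, vocab)               # FM first-order weights
W1, W2 = rng.normal(0, 0.1, (n_fields * dim, 16)), rng.normal(0, 0.1, (16, 1))

def deepfm_score(feature_ids):
    emb = E[feature_ids]                          # (n_fields, dim), shared input
    # FM part: first-order term + pairwise interactions via the sum-square trick.
    fm = w_first[feature_ids].sum() + 0.5 * (
        (emb.sum(axis=0) ** 2 - (emb ** 2).sum(axis=0)).sum())
    # Deep part: MLP over the concatenated embeddings of the same fields.
    h = np.maximum(emb.reshape(-1) @ W1, 0.0)     # ReLU hidden layer
    deep = (h @ W2).item()
    return 1.0 / (1.0 + np.exp(-(fm + deep)))     # sigmoid CTR estimate

print(deepfm_score(np.array([3, 42, 77])))        # one sample: one id per field
```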

Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with its success in many other application domains, deep learning has also been widely applied to sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.

In this paper we propose a new parallel architecture based on Big Data technologies for real-time sentiment analysis on microblogging posts. Polypus is a modular framework that provides the following functionalities: (1) massive text extraction from Twitter, (2) distributed non-relational storage optimized for time-range queries, (3) memory-based inter-module buffering, (4) real-time sentiment classification, (5) near real-time keyword sentiment aggregation in time series, (6) an HTTP API to interact with the Polypus cluster, and (7) a web interface to analyze results visually. The whole architecture is self-deployable and based on Docker containers.
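As a concept-only illustration of item (5), the snippet below buckets keyword-matched sentiment scores into fixed time windows and averages them; the function name, post format, and window size are hypothetical and not Polypus code or its API.

```python
# Time-series keyword sentiment aggregation: fixed windows, mean score per window.
from collections import defaultdict
from datetime import datetime, timezone

def aggregate_sentiment(posts, keyword, window_s=60):
    """posts: iterable of (unix_ts, text, score in [-1, 1]); returns {window_start: mean}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, text, score in posts:
        if keyword.lower() in text.lower():
            bucket = int(ts // window_s) * window_s
            sums[bucket] += score
            counts[bucket] += 1
    return {datetime.fromtimestamp(b, timezone.utc): sums[b] / counts[b]
            for b in sorted(sums)}

posts = [(1620000000, "spark release is great", 0.8),
         (1620000030, "spark job keeps failing", -0.6),
         (1620000075, "loving the new spark UI", 0.9)]
print(aggregate_sentiment(posts, "spark"))
```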
