Data integration of heterogeneous data sources relies either on periodically transferring large amounts of data to a physical Data Warehouse or on retrieving data from the sources on request only. The latter results in what is referred to as a virtual Data Warehouse, which is preferable when using the latest data is paramount. The downside is that it adds network traffic and suffers from performance degradation when the amount of data is high. In this paper, we propose the use of a readCheck validator to ensure the timeliness of the queried data while reducing data traffic. It is further shown that the readCheck allows transactions to update data in the data sources while obeying full Atomicity, Consistency, Isolation, and Durability (ACID) properties.
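The abstract does not spell out how the readCheck validator works, so the following is only a minimal sketch of one plausible reading: the integration layer keeps previously fetched rows with a version stamp, sends just the (key, version) pairs to the source, and re-transfers only the rows the source reports as stale. The names `Source`, `CachedView`, `stale_keys`, and `rows_transferred` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical data source that stores a (value, version) pair per key."""
    rows: dict = field(default_factory=dict)

    def write(self, key, value):
        # Every write bumps the version so that readers can detect staleness.
        _, version = self.rows.get(key, (None, 0))
        self.rows[key] = (value, version + 1)

    def read(self, key):
        return self.rows[key]

    def stale_keys(self, versions):
        # Validation step: only keys and versions travel over the network.
        return [k for k, v in versions.items() if self.rows[k][1] != v]


@dataclass
class CachedView:
    """Hypothetical integration layer that validates cached rows before use."""
    source: Source
    cache: dict = field(default_factory=dict)
    rows_transferred: int = 0

    def query(self, keys):
        # Versions of the rows we already hold locally.
        known = {k: self.cache[k][1] for k in keys if k in self.cache}
        # Rows never seen before must be fetched in any case.
        to_fetch = [k for k in keys if k not in self.cache]
        # Rows whose version changed at the source must be refreshed.
        to_fetch += self.source.stale_keys(known)
        for k in to_fetch:
            self.cache[k] = self.source.read(k)
            self.rows_transferred += 1
        return {k: self.cache[k][0] for k in keys}
```

Under these assumptions, repeating a query after a single row has changed at the source transfers only that row again, while the result still reflects the latest data.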

Related content

November 29, 2021

Statistical Learning to Operationalize a Domain Agnostic Data Quality Scoring

Sezal Chug,Priya Kaushal,Ponnurangam Kumaraguru,Tavpritesh Sethi

from arxiv, 20 Pages, 8 Figures, 1 Table

Data is expanding at an unimaginable rate, and with this growth comes responsibility for the quality of that data. Data quality refers to the relevance of the information present and supports operations such as decision making and planning within an organization. Data quality is mostly measured on an ad hoc basis, and hence none of the developed concepts provides any practical application. The current empirical study was undertaken to formulate a concrete automated data quality platform that assesses the quality of an incoming dataset and generates a quality label, score, and comprehensive report. We use various datasets from healthdata.gov, opendata.nhs, and the Demographics and Health Surveys (DHS) Program to observe the variations in the quality score and formulate a label using Principal Component Analysis (PCA). The results of the study revealed a metric that encompasses nine quality ingredients, namely provenance, dataset characteristics, uniformity, metadata coupling, percentage of missing cells and duplicate rows, skewness of data, the ratio of inconsistencies of categorical columns, and correlation between these attributes. The study also provides an illustrative case study and a validation of the metric following Mutation Testing approaches. This research provides an automated platform that takes an incoming dataset and its metadata and produces the DQ score, report, and label. The results would be useful to data scientists, as the value of this quality label would instill confidence before they deploy the data in their respective practical applications.
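Two of the ingredients named in this abstract, the percentage of missing cells and of duplicate rows, are simple enough to illustrate. The sketch below is not the authors' platform: the function names, the choice to treat `None` and the empty string as missing, and the counting of every repeat of an earlier row as a duplicate are all assumptions made for illustration. How the paper combines such ingredients into a single score through PCA is not shown here.

```python
def missing_cell_percentage(rows):
    """Share of cells that are None or an empty string, in percent."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 0.0
    missing = sum(1 for cell in cells if cell is None or cell == "")
    return 100.0 * missing / len(cells)


def duplicate_row_percentage(rows):
    """Share of rows that repeat an earlier row exactly, in percent."""
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return 100.0 * duplicates / len(rows)


# Hypothetical four-row table with one repeated row and two missing cells.
table = [("a", 1), ("a", 1), ("b", None), ("c", "")]
print(missing_cell_percentage(table))   # 25.0
print(duplicate_row_percentage(table))  # 25.0
```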

November 28, 2021

Learning Wildfire Model from Incomplete State Observations

Alissa Chavalithumrong,Hyung-Jin Yoon,Petros Voulgaris

As wildfires are expected to become more frequent and severe, improved prediction models are vital to mitigating risk and allocating resources. With remote sensing data, valuable spatiotemporal statistical models can be created and used for resource management practices. In this paper, we create a dynamic model for future wildfire predictions at five locations within the western United States through a deep neural network trained on historical burned area and climate data. The proposed model has distinct features that address the characteristic needs of prediction evaluation, including dynamic online estimation and time-series modeling. Between locations, local fire event triggers are not isolated, and there are confounding factors when local data is analyzed due to incomplete state observations. Compared to existing approaches that do not account for incomplete state observation within wildfire time-series data, we achieve higher prediction performance on average.

November 26, 2021

Non-IID data and Continual Learning processes in Federated Learning: A long road ahead

Marcos F. Criado,Fernando E. Casado,Roberto Iglesias,Carlos V. Regueiro,Senén Barro

from arxiv, 18 pages, 5 figures

Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private. This decentralized approach is prone to suffer the consequences of statistical heterogeneity in the data, both across the different entities and over time, which may lead to a lack of convergence. To avoid such issues, different methods have been proposed in the past few years. However, data may be heterogeneous in many different ways, and current proposals do not always state the kind of heterogeneity they are considering. In this work, we formally classify statistical heterogeneity in data and review the most remarkable learning strategies that are able to cope with it. At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to Federated Learning settings.

November 26, 2021

Approximate Bayesian Computation for Physical Inverse Modeling

Neel Chatterjee,Somya Sharma,Sarah Swisher,Snigdhansu Chatterjee

Semiconductor device models are essential to understanding charge transport in thin film transistors (TFTs). Using these TFT models to draw inferences involves estimating the parameters used to fit the experimental data. These experimental data can involve extracted charge carrier mobility or measured current. Estimating these parameters helps us draw inferences about device performance. Fitting a TFT model to given experimental data relies on manual fine-tuning of multiple parameters by human experts. Several of these parameters may have confounding effects on the experimental data, making the extraction of their individual effects a non-intuitive process during manual tuning. To avoid this convoluted process, we propose a new method for automating the model parameter extraction process, resulting in an accurate model fit. In this work, model choice based approximate Bayesian computation (aBc) is used to generate the posterior distribution of the estimated parameters using observed mobility at various gate voltage values. Furthermore, it is shown that the extracted parameters can be accurately predicted from the mobility curves using gradient boosted trees. This work also provides a comparative analysis of the proposed framework against fine-tuned neural networks, in which the proposed framework is shown to perform better.

November 25, 2021

Efficiency of the financial markets during the COVID-19 crisis: time-varying parameters of fractional stable dynamics

Ayoub Ammy-Driss,Matthieu Garcin

This paper investigates the impact of COVID-19 on financial markets. It focuses on the evolution of market efficiency, using two efficiency indicators: the Hurst exponent and the memory parameter of a fractional Lévy-stable motion. The second approach combines, in the same dynamic model, an alpha-stable distribution and a dependence structure between price returns. We provide a dynamic estimation method for the two efficiency indicators. This method introduces a free parameter, the discount factor, which we select so as to get the best alpha-stable density forecasts for observed price returns. The application to stock indices during the COVID-19 crisis shows a strong loss of efficiency for US indices. By contrast, Asian and Australian indices seem less affected, and the inefficiency of these markets during the COVID-19 crisis is even questionable.

June 9, 2021

Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling

Chuizheng Meng,Sirisha Rambhatla,Yan Liu

from arxiv, To be published in the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 21)

The vast amount of data generated from networks of sensors, wearables, and Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data, given the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using a graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost and facilitate computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.

March 23, 2021

Multi-Modal Answer Validation for Knowledge-Based VQA

Jialin Wu,Jiasen Lu,Ashish Sabharwal,Roozbeh Mottaghi

The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in a variety of forms, including visual, textual, and commonsense knowledge. The use of more knowledge sources, however, also increases the chance of retrieving more irrelevant or noisy facts, making it difficult to comprehend the facts and find the answer. To address this challenge, we propose Multi-modal Answer Validation using External knowledge (MAVEx), where the idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval. This is in contrast to existing approaches that search for the answer in a vast collection of often irrelevant facts. Our approach aims to learn which knowledge source should be trusted for each answer candidate and how to validate the candidate using that source. We consider a multi-modal setting, relying on both textual and visual knowledge resources, including images searched using Google, sentences from Wikipedia articles, and concepts from ConceptNet. Our experiments with OK-VQA, a challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new state-of-the-art results.

August 7, 2018

AceKG: A Large-scale Knowledge Graph for Academic Data Mining

Ruijie Wang,Yuchen Yan,Jialu Wang,Yuting Jia,Ye Zhang,Weinan Zhang,Xinbing Wang

from arxiv, CIKM 2018

Most existing knowledge graphs (KGs) in academic domains suffer from insufficient multi-relational information, name ambiguity, and data formats unsuited to large-scale machine processing. In this paper, we present AceKG, a new large-scale KG in the academic domain. AceKG not only provides clean academic information, but also offers a large-scale benchmark dataset for researchers to conduct challenging data mining projects including link prediction, community detection, and scholar classification. Specifically, AceKG describes 3.13 billion triples of academic facts based on a consistent ontology, including necessary properties of papers, authors, fields of study, venues, and institutes, as well as the relations among them. To enrich the proposed knowledge graph, we also perform entity alignment with existing databases and rule-based inference. Based on AceKG, we conduct experiments on three typical academic data mining tasks and evaluate several state-of-the-art knowledge embedding and network representation learning approaches on the benchmark datasets built from AceKG. Finally, we discuss several promising research directions that benefit from AceKG.

August 2, 2018

Mobile big data analysis with machine learning

Jiyang Xie,Zeyu Song,Yupeng Li,Zhanyu Ma

from arxiv, Version 0.1

This paper identifies the requirements for, and the development of, machine learning-based mobile big data analysis by discussing the challenges of mobile big data (MBD). Furthermore, it reviews the state-of-the-art applications of data analysis in the area of MBD. First, we introduce the development of MBD. Second, the frequently adopted methods of data analysis are reviewed. Three typical applications of MBD analysis, namely wireless channel modeling, human online and offline behavior analysis, and speech recognition in the internet of vehicles, are then introduced. Finally, we summarize the main challenges and future development directions of mobile big data analysis.

January 15, 2016

Big Data: Understanding Big Data

Kevin Taylor-Sakyi

from arxiv, 8 pages, Big Data Analytics, Data Storage, MapReduce, Knowledge-Space, Big Data Inconsistencies

Steve Jobs, one of the greatest visionaries of our time, was quoted in 1996 saying "a lot of times, people do not know what they want until you show it to them" [38], indicating that he advocated developing products based on human intuition rather than research. With the advancement of mobile devices, social networks, and the Internet of Things, enormous amounts of complex data, both structured and unstructured, are being captured in the hope of allowing organizations to make better business decisions, as data is now vital to an organization's success. These enormous amounts of data are referred to as Big Data, which enables a competitive advantage over rivals when processed and analyzed appropriately. However, Big Data Analytics raises a few concerns, including Management of Data-lifecycle, Privacy & Security, and Data Representation. This paper reviews the fundamental concept of Big Data, the Data Storage domain, and the MapReduce programming paradigm used in processing these large datasets. It then focuses on two case studies showing the effectiveness of Big Data Analytics and presents how it could be of greater good in the future if handled appropriately.