99欧美日韩精品一区二区红桃-又大又硬又长又粗免费看

Today's international corporations such as BASF, a leading company in the crop protection industry, produce and consume more and more data that are often fragmented and accessible through Web APIs. In addition, part of the proprietary and public data of BASF's interest are stored in triple stores and accessible with the SPARQL query language. Homogenizing the data access modes and the underlying semantics of the data without modifying or replicating the original data sources become important requirements to achieve data integration and interoperability. In this work, we propose a federated data integration architecture within an industrial setup, that relies on an ontology-based data access method. Our performance evaluation in terms of query response time showed that most queries can be answered in under 1 second.

相關內容

WEB

關注 0

Automator · 流 · Stream Processing · Analysis · FM ·

2022 年 7 月 18 日

Lightweight Automated Feature Monitoring for Data Streams

Jo?o Conde,Ricardo Moreira,Jo?o Torres,Pedro Cardoso,Hugo Ferreira,Marco O. P. Sampaio,Jo?o Tiago Ascens?o,Pedro Bizarro

from arxiv, 10 pages, 5 figures. AutoML, KDD22, August 14-17, 2022, Washington, DC, US

Monitoring the behavior of automated real-time stream processing systems has become one of the most relevant problems in real world applications. Such systems have grown in complexity relying heavily on high dimensional input data, and data hungry Machine Learning (ML) algorithms. We propose a flexible system, Feature Monitoring (FM), that detects data drifts in such data sets, with a small and constant memory footprint and a small computational cost in streaming applications. The method is based on a multi-variate statistical test and is data driven by design (full reference distributions are estimated from the data). It monitors all features that are used by the system, while providing an interpretable features ranking whenever an alarm occurs (to aid in root cause analysis). The computational and memory lightness of the system results from the use of Exponential Moving Histograms. In our experimental study, we analyze the system's behavior with its parameters and, more importantly, show examples where it detects problems that are not directly related to a single feature. This illustrates how FM eliminates the need to add custom signals to detect specific types of problems and that monitoring the available space of features is often enough.

線性的 · 樣本復雜度 · 價值函數 · Learning · 學習器 ·

2022 年 7 月 18 日

A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation

Philip Amortila,Nan Jiang,Dhruv Madeka,Dean P. Foster

The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and a $\texttt{poly}(d,H,|\mathcal{A}|,1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in sample complexity with surprisingly-little expert input. Compared to prior imitation learning (IL) approaches, our required number of expert demonstrations is independent of $H$ and logarithmic in $1/\varepsilon$, whereas all prior work required at least linear factors of both in addition to the same dependence on $d$. Towards establishing the minimal amount of expert queries needed, we show that, in the same setting, any learner whose exploration budget is polynomially-bounded (in terms of $d,H,$ and $|\mathcal{A}|$) will require at least $\tilde\Omega(\sqrt{d})$ oracle calls to recover a policy competing with the expert's value function. Under the weaker assumption that the expert's policy is linear, we show that the lower bound increases to $\tilde\Omega(d)$.

Learning · MoDELS · Performer · 相互獨立的 · 可交換的 ·

2022 年 7 月 18 日

Cross-Silo Heterogeneous Model Federated Multitask Learning

Xingjian Cao,Zonghang Li,Gang Sun,Hongfang Yu,Mohsen Guizani

Federated learning (FL) is a machine learning technique that enables participants to collaboratively train high-quality models without exchanging their private data. Participants utilizing cross-silo FL (CS-FL) settings are independent organizations with different task needs, and they are concerned not only with data privacy but also with independently training their unique models due to intellectual property considerations. Most existing FL methods are incapable of satisfying the above scenarios. In this paper, we propose a FL method based on the pseudolabeling of unlabeled data via a process such as cotraining. To the best of our knowledge, this is the first FL method that is simultaneously compatible with heterogeneous tasks, heterogeneous models, and heterogeneous training algorithms. Experimental results show that the proposed method achieves better performance than competing ones. This is especially true for non-independent and identically distributed (IID) settings and heterogeneous models, where the proposed method achieves a 35% performance improvement.

Learning · MoDELS · 知識 (knowledge) · 聯邦學習 · Extensibility ·

2022 年 7 月 17 日

No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices

Ruixuan Liu,Fangzhao Wu,Chuhan Wu,Yanlin Wang,Lingjuan Lyu,Hong Chen,Xing Xie

from arxiv, Accepted by KDD22

Federated learning (FL) is an important paradigm for training global models from decentralized data in a privacy-preserving way. Existing FL methods usually assume the global model can be trained on any participating client. However, in real applications, the devices of clients are usually heterogeneous, and have different computing power. Although big models like BERT have achieved huge success in AI, it is difficult to apply them to heterogeneous FL with weak clients. The straightforward solutions like removing the weak clients or using a small model to fit all clients will lead to some problems, such as under-representation of dropped clients and inferior accuracy due to data loss or limited model representation ability. In this work, we propose InclusiveFL, a client-inclusive federated learning method to handle this problem. The core idea of InclusiveFL is to assign models of different sizes to clients with different computing capabilities, bigger models for powerful clients and smaller ones for weak clients. We also propose an effective method to share the knowledge among multiple local models with different sizes. In this way, all the clients can participate in the model learning in FL, and the final model can be big and powerful enough. Besides, we propose a momentum knowledge distillation method to better transfer knowledge in big models on powerful clients to the small models on weak clients. Extensive experiments on many real-world benchmark datasets demonstrate the effectiveness of the proposed method in learning accurate models from clients with heterogeneous devices under the FL framework.

Learning · 聯邦學習 · 未標記 · MASS · 監督 ·

2022 年 7 月 17 日

Federated Self-Supervised Learning in Heterogeneous Settings: Limits of a Baseline Approach on HAR

Sannara Ek,Romain Rombourg,Fran?ois Portet,Philippe Lalanda

from arxiv, S. Ek, R. Rombourg, F. Portet and P. Lalanda, "Federated Self-Supervised Learning in Heterogeneous Settings: Limits of a Baseline Approach on HAR," 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), 2022, pp. 557-562

Federated Learning is a new machine learning paradigm dealing with distributed model learning on independent devices. One of the many advantages of federated learning is that training data stay on devices (such as smartphones), and only learned models are shared with a centralized server. In the case of supervised learning, labeling is entrusted to the clients. However, acquiring such labels can be prohibitively expensive and error-prone for many tasks, such as human activity recognition. Hence, a wealth of data remains unlabelled and unexploited. Most existing federated learning approaches that focus mainly on supervised learning have mostly ignored this mass of unlabelled data. Furthermore, it is unclear whether standard federated Learning approaches are suited to self-supervised learning. The few studies that have dealt with the problem have limited themselves to the favorable situation of homogeneous datasets. This work lays the groundwork for a reference evaluation of federated Learning with Semi-Supervised Learning in a realistic setting. We show that standard lightweight autoencoder and standard Federated Averaging fail to learn a robust representation for Human Activity Recognition with several realistic heterogeneous datasets. These findings advocate for a more intensive research effort in Federated Self Supervised Learning to exploit the mass of heterogeneous unlabelled data present on mobile devices.

Awesome (軟件) · 優化器 · MoDELS · Better · Engineering ·

2022 年 7 月 16 日

AWESOME: Empowering Scalable Data Science on Social Media Data with an Optimized Tri-Store Data System

Xiuwen Zheng,Subhasis Dasgupta,Arun Kumar,Amarnath Gupta

Modern data science applications increasingly use heterogeneous data sources and analytics. This has led to growing interest in polystore systems, especially analytical polystores. In this work, we focus on emerging multi-data model analytics workloads over social media data that fluidly straddle relational, graph, and text analytics. Instead of a generic polystore, we build a "tri-store" system that is more aware of the underlying data models to better optimize execution to improve scalability and runtime efficiency. We name our system AWESOME (Analytics WorkbEnch for SOcial MEdia). It features a powerful domain-specific language named ADIL. ADIL builds on top of underlying query engines (e.g., SQL and Cypher) and features native data types for succinctly specifying cross-engine queries and NLP operations, as well as automatic in-memory and query optimizations. Using real-world tri-model analytical workloads and datasets, we empirically demonstrate the functionalities of AWESOME for scalable data science over social media data and evaluate its efficiency.

可辨認的 · prototype · Automator · Use Case · 控制器 ·

2022 年 7 月 15 日

Identifying and Quantifying Trade-offs in Multi-Stakeholder Risk Evaluation with Applications to the Data Protection Impact Assessment of the GDPR

Majid Mollaeefar,Silvio Ranise

Cybersecurity risk management consists of several steps including the selection of appropriate controls to minimize risks. This is a difficult task that requires to search through all possible subsets of a set of available controls and identify those that minimize the risks of all stakeholders. Since stakeholders may have different perceptions of the risks (especially when considering the impact of threats), conflicting goals may arise that require to find the best possible trade-offs among the various needs. In this work, we propose a quantitative and (semi)automated approach to solve this problem based on the well-known notion of Pareto optimality. For validation, we show how a prototype tool based on our approach can assist in the Data Protection Impact Assessment mandated by the General Data Protection Regulation on a simplified but realistic use case scenario. We also evaluate the scalability of the approach by conducting an experimental evaluation with the prototype with encouraging results.

INFORMS · 估計/估計量 · 點估計 · 推斷 · 統計量 ·

2021 年 8 月 10 日

Federated Causal Inference in Heterogeneous Observational Data

Ruoxuan Xiong,Allison Koenecke,Michael Powell,Zhu Shen,Joshua T. Vogelstein,Susan Athey

Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.

蒸餾 · MoDELS · 聯邦學習 · 學成 · 歸納偏好 ·

2021 年 6 月 9 日

Data-Free Knowledge Distillation for Heterogeneous Federated Learning

Zhuangdi Zhu,Junyuan Hong,Jiayu Zhou

Federated Learning (FL) is a decentralized machine-learning paradigm, in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, other than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation} approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcasted to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that, our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.

Performer · 聯邦學習 · Extensibility · 學成 · 分解的 ·

2021 年 2 月 21 日

Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data

Chengxu Yang,QiPeng Wang,Mengwei Xu,Zhenpeng Chen,Kaigui Bian,Yunxin Liu,Xuanzhe Liu

Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm, drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the various hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., causing a device unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that can faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol but with heterogeneity in consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to 9.2% accuracy drop, 2.32x lengthened training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two potential factors for performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers consider necessary heterogeneity during the evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate the impacts of heterogeneity.