
Many studies have demonstrated that mobile applications are a common means of collecting massive amounts of personal data. This goes unnoticed by most users, who are also unaware that many different organizations are receiving this data, even from multiple apps in parallel. This paper assesses different techniques to identify the organizations receiving personal data flows in the Android ecosystem, namely the WHOIS service, SSL certificate inspection, and privacy policy textual analysis. Based on our findings, we propose a fully automated method that combines the most successful techniques, achieving a 94.73% precision score in identifying the recipient organization. We further demonstrate our method by evaluating 1,000 Android apps and exposing the corporations that collect the users' personal data.
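As a rough illustration of the certificate-inspection technique, the sketch below connects to a network endpoint observed in an app's traffic and reads the organization (O) field from the server's X.509 certificate, using only the Python standard library. The hostname is a placeholder, not an endpoint from the paper's dataset, and domain-validated certificates may omit the field entirely.

```python
import socket
import ssl

def organization_from_cert(host: str, port: int = 443, timeout: float = 5.0):
    """Return the subject organization declared in the server's certificate."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    # getpeercert() returns the subject as a tuple of RDN tuples.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None  # domain-validated certs often carry no O field

if __name__ == "__main__":
    print(organization_from_cert("www.example.com"))  # placeholder endpoint
```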

Related Content

One of the key challenges in decentralized and federated learning is to design algorithms that efficiently deal with highly heterogeneous data distributions across agents. In this paper, we revisit the analysis of the Decentralized Stochastic Gradient Descent (D-SGD) algorithm under data heterogeneity. We exhibit the key role played by a new quantity, called \emph{neighborhood heterogeneity}, in the convergence rate of D-SGD. By coupling the communication topology and the heterogeneity, our analysis sheds light on the poorly understood interplay between these two concepts in decentralized learning. We then argue that neighborhood heterogeneity provides a natural criterion to learn data-dependent topologies that reduce (and can even eliminate) the otherwise detrimental effect of data heterogeneity on the convergence time of D-SGD. For the important case of classification with label skew, we formulate the problem of learning such a good topology as a tractable optimization problem that we solve with a Frank-Wolfe algorithm. As illustrated by a set of simulated and real-world experiments, our approach provides a principled way to design a sparse topology that balances the convergence speed and the per-iteration communication costs of D-SGD under data heterogeneity.
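For concreteness, here is a minimal numpy sketch of a generic D-SGD round, assuming a doubly stochastic mixing matrix `W` that encodes the communication topology and toy quadratic local objectives standing in for the agents' heterogeneous data; the paper's neighborhood-heterogeneity analysis and learned topologies are not reproduced here.

```python
import numpy as np

def dsgd_step(params, grads, W, lr):
    """One D-SGD round: gossip averaging with W, then a local SGD step."""
    return W @ params - lr * grads

rng = np.random.default_rng(0)
n, d = 8, 3
mu = rng.normal(size=(n, d))          # heterogeneous local data means
params = rng.normal(size=(n, d))
W = np.full((n, n), 1.0 / n)          # complete-graph mixing as a baseline
for _ in range(200):
    x = mu + rng.normal(size=(n, d))  # one local sample per agent
    grads = params - x                # gradient of 0.5 * ||params_i - x_i||^2
    params = dsgd_step(params, grads, W, lr=0.05)
# The average model approaches the global optimum (the mean of the mu's),
# up to stochastic-gradient noise of order lr.
print(np.abs(params.mean(axis=0) - mu.mean(axis=0)).max())
```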

We introduce a decentralized and distributed collaborative environment denoted by $LoR$, which stands for "The Loop of the Rings". The $LoR$ system provides a secure, user-friendly cooperative environment for users who can offer particular services to each other. The system provides reliability and security using randomized techniques in a well-structured environment. Together, these techniques provide consensus and trust for groups of collaborating parties. The platform carries a blockchain-based distributed database to store important information. The $LoR$ system deals with cooperation rather than transactions: it manages and verifies the collaboration between each group of participants who work with each other from the start to the end of the collaboration. The unique structure of the $LoR$ system makes it a secure and reliable middleware between a (distributed) database and a service-provider system. Such a service provider could be a freelancer or an IoT management system, and 5G-related services can likewise be organized and managed by the $LoR$ platform.

A canonical noise distribution (CND) is an additive mechanism designed to satisfy $f$-differential privacy ($f$-DP) without any wasted privacy budget. $f$-DP is a hypothesis-testing-based formulation of privacy phrased in terms of tradeoff functions, which capture the difficulty of a hypothesis test. In this paper, we consider the existence and construction of log-concave CNDs as well as multivariate CNDs. Log-concave distributions are important to ensure that higher outputs of the mechanism correspond to higher input values, whereas multivariate noise distributions are important to ensure that a joint release of multiple outputs has a tight privacy characterization. We show that the existence and construction of CNDs for both types of problems is related to whether the tradeoff function can be decomposed by functional composition (related to group privacy) or mechanism composition. In particular, we show that pure $\epsilon$-DP cannot be decomposed in either way and that there is neither a log-concave CND nor any multivariate CND for $\epsilon$-DP. On the other hand, we show that Gaussian-DP, $(0,\delta)$-DP, and Laplace-DP each have both log-concave and multivariate CNDs.
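As a small numerical companion (our own illustration, not the paper's construction): under $\mu$-Gaussian-DP the tradeoff function is $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, and additive $N(0, 1/\mu^2)$ noise on a sensitivity-1 statistic attains it exactly; since the Gaussian density is log-concave, this matches the claim that Gaussian-DP admits a log-concave CND.

```python
import numpy as np
from scipy.stats import norm

def gdp_tradeoff(alpha, mu):
    """Type II error of the optimal test at type I error alpha under mu-GDP."""
    return norm.cdf(norm.ppf(1 - alpha) - mu)

mu = 1.0
alpha = np.linspace(0.0, 1.0, 6)
print(gdp_tradeoff(alpha, mu))
# The likelihood-ratio test between N(0, 1) and N(mu, 1) -- i.e., the additive
# Gaussian mechanism with sigma = 1/mu applied to adjacent inputs 0 and 1 --
# realizes exactly this curve, with no wasted privacy budget.
```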

Fitting a local polynomial model to a noisy sequence of uniformly sampled observations or measurements (i.e. regressing) by minimizing the sum of weighted squared errors (i.e. residuals) may be used to design digital filters for a diverse range of signal-analysis problems, such as detection, classification, and tracking (i.e. smoothing or state estimation), in biomedical, financial, and aerospace applications, for instance. Furthermore, the recursive realization of such filters, using a network of so-called leaky integrators, yields simple digital components with a low computational complexity that are ideal in embedded online sensing systems with high data rates. Target tracking, pulse-edge detection, peak detection, and anomaly/change detection are considered in this tutorial as illustrative examples. Erlang-weighted polynomial regression provides a design framework within which the various design trade-offs of state estimators (e.g. bias errors vs. random errors) and IIR smoothers (e.g. frequency isolation vs. time localization) may be intuitively balanced. Erlang weights are configured using a smoothing parameter, which determines the decay rate of the exponential tail, and a shape parameter, which may be used to discount more recent data so that greater relative emphasis is placed on a past time interval. In Morrison's 1969 treatise on sequential smoothing and prediction, the exponential weight, and the Laguerre polynomials that are orthogonal with respect to this weight, are described in detail; however, more general Erlang weights and the resulting associated Laguerre polynomials are not considered there, nor have they been covered in detail elsewhere since. Thus, one of the purposes of this tutorial is to explain how Erlang weights may be used to shape and improve the (impulse and frequency) response of recursive regression filters.
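To make the recursive realization concrete, the sketch below (our own minimal example, not code from the tutorial) cascades `k` identical leaky integrators with pole `p`; the cascade's impulse response grows like $n^{k-1}p^n$, a discrete Erlang-like weight whose tail decay is set by `p` and whose peak is pushed into the past as the shape `k` increases, discounting the most recent samples.

```python
import numpy as np

def leaky_integrator_cascade(x, p=0.9, k=3):
    """Filter x through k cascaded sections y[n] = p*y[n-1] + (1-p)*x[n]."""
    y = np.asarray(x, dtype=float)
    for _ in range(k):
        out = np.empty_like(y)
        acc = 0.0
        for n, v in enumerate(y):
            acc = p * acc + (1.0 - p) * v
            out[n] = acc
        y = out
    return y

impulse = np.zeros(60)
impulse[0] = 1.0
h = leaky_integrator_cascade(impulse, p=0.9, k=3)
print(h.argmax())  # the weight peaks in the past, not at the newest sample
```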

In high-dimensional prediction settings, it remains challenging to reliably estimate the test performance. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, is based on fitting a smooth monotone learning curve that depicts test performance as a function of the sample size. Learn2Evaluate has several advantages compared to commonly applied performance estimation methodologies. Firstly, a learning curve offers a graphical overview of a learner. This overview assists in assessing the potential benefit of adding training samples, and it provides a more complete comparison between learners than performance estimates at a fixed subsample size. Secondly, a learning curve facilitates estimating the performance at the total sample size rather than at a subsample size. Thirdly, Learn2Evaluate allows the computation of a theoretically justified and useful lower confidence bound. Furthermore, this bound may be tightened by performing a bias correction. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
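As a schematic illustration of the learning-curve idea (not Learn2Evaluate itself), the sketch below fits a smooth monotone power law $\mathrm{acc}(n) = a - b\,n^{-c}$ to synthetic subsample performance estimates and extrapolates it to the full sample size; the functional form, data points, and initial guess are all assumptions of ours.

```python
import numpy as np
from scipy.optimize import curve_fit

def curve(n, a, b, c):
    # Monotone increasing in n for b, c > 0, saturating at accuracy a.
    return a - b * n ** (-c)

sizes = np.array([25, 50, 100, 200, 400])            # subsample sizes
accs = np.array([0.62, 0.70, 0.76, 0.79, 0.81])      # estimated performance
popt, _ = curve_fit(curve, sizes, accs, p0=(0.85, 1.0, 0.5), maxfev=10000)
print("predicted accuracy at the full sample size n=1000:", curve(1000, *popt))
```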

The success of deep neural networks (DNNs) is heavily dependent on computational resources. While DNNs are often employed on cloud servers, there is a growing need to operate DNNs on edge devices. Edge devices are typically limited in their computational resources, yet multiple edge devices are often deployed in the same environment and can reliably communicate with each other. In this work we propose to facilitate the application of DNNs on the edge by allowing multiple users to collaborate during inference to improve their accuracy. Our mechanism, coined {\em edge ensembles}, is based on having diverse predictors at each device, which form an ensemble of models during inference. To mitigate the communication overhead, the users share quantized features, and we propose a method for aggregating multiple decisions into a single inference rule. We analyze the latency induced by edge ensembles, showing that its performance improvement comes at the cost of a minor additional delay under common assumptions on the communication network. Our experiments demonstrate that collaborative inference via edge ensembles equipped with compact DNNs substantially improves the accuracy over having each user infer locally, and can outperform using a single centralized DNN larger than all the networks in the ensemble together.
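A toy sketch of the aggregation step under assumptions of our own: each device transmits a coarsely quantized probability vector, and the final decision soft-votes over the received vectors. The 4-bit uniform quantizer and the random per-device predictions are illustrative stand-ins for the paper's quantized features and compact DNNs.

```python
import numpy as np

def quantize(p, bits=4):
    """Uniformly quantize probabilities to 2**bits levels (cuts comm cost)."""
    levels = 2 ** bits - 1
    return np.round(p * levels) / levels

def ensemble_decision(local_probs, bits=4):
    shared = [quantize(p, bits) for p in local_probs]  # what gets transmitted
    avg = np.mean(shared, axis=0)                      # soft-vote aggregation
    return int(np.argmax(avg))

rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(10), size=5)  # 5 devices, 10-class predictions
print(ensemble_decision(probs))
```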

In selection processes such as hiring, promotion, and college admissions, implicit bias toward socially-salient attributes such as race, gender, or sexual orientation of candidates is known to produce persistent inequality and reduce aggregate utility for the decision maker. Interventions such as the Rooney Rule and its generalizations, which require the decision maker to select at least a specified number of individuals from each affected group, have been proposed to mitigate the adverse effects of implicit bias in selection. Recent works have established that such lower-bound constraints can be very effective in improving aggregate utility in the case when each individual belongs to at most one affected group. However, in several settings, individuals may belong to multiple affected groups and, consequently, face more extreme implicit bias due to this intersectionality. We consider independently drawn utilities and show that, in the intersectional case, the aforementioned non-intersectional constraints can only recover part of the total utility achievable in the absence of implicit bias. On the other hand, we show that if one includes appropriate lower-bound constraints on the intersections, almost all the utility achievable in the absence of implicit bias can be recovered. Thus, intersectional constraints can offer a significant advantage over a reductionist dimension-by-dimension non-intersectional approach to reducing inequality.
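The following toy simulation (our own model, heavily simplified) illustrates the setting: observed utilities are true utilities discounted multiplicatively once per affected group, so bias compounds at the intersection, and the decision maker picks $k$ candidates subject to optional lower-bound quotas on each intersectional group. The group probabilities, discount factor $\beta$, and quota values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 200, 20
true_u = rng.exponential(size=m)             # independently drawn utilities
g1 = rng.random(m) < 0.3                     # membership in affected group 1
g2 = rng.random(m) < 0.3                     # membership in affected group 2
beta = 0.5                                   # implicit-bias discount factor
# Bias compounds at the intersection: members of both groups are discounted twice.
observed = true_u * beta ** (g1.astype(int) + g2.astype(int))

def select(quotas):
    """Fill each intersectional quota by observed utility, then top up to k."""
    chosen = set()
    for mask, q in quotas:
        idx = np.argsort(-observed[mask])
        chosen.update(np.flatnonzero(mask)[idx[:q]])
    rest = [i for i in np.argsort(-observed) if i not in chosen]
    chosen.update(rest[: k - len(chosen)])
    return list(chosen)

unconstrained = select([])
intersectional = select([(g1 & g2, 3), (g1 & ~g2, 3), (~g1 & g2, 3)])
# Compare the true utility recovered with and without intersectional quotas.
print(true_u[unconstrained].sum(), true_u[intersectional].sum())
```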

We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. While there is abundant literature on this classical problem under the item-level privacy setup, where each user contributes only one data point, little is known about the user-level counterpart. We consider the heterogeneous scenario where both the quantity and the distribution of data can differ across users. We propose an algorithm based on a clipping strategy that almost achieves a two-approximation with respect to the best clipping threshold in hindsight. This result holds without any distributional assumptions on the data. We also prove that the clipping bias can be significantly reduced when the counts follow non-i.i.d. Poisson distributions, and we show empirically that our debiasing method provides improvements even without such constraints. Experiments on both real and synthetic datasets verify our theoretical findings and demonstrate the effectiveness of our algorithms.
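A minimal sketch of the clipping idea, not the paper's algorithm: each user's contribution vector is clipped to L1 norm at most a fixed threshold `tau`, bounding the user-level sensitivity, and Laplace noise calibrated to `tau` is added. The paper's method additionally selects `tau` nearly as well as the best threshold in hindsight, which this sketch does not attempt.

```python
import numpy as np

def private_histogram(user_counts, tau, eps, rng):
    """user_counts: (n_users, n_bins) raw counts from each user."""
    counts = user_counts.astype(float)
    mass = counts.sum(axis=1, keepdims=True)             # each user's L1 mass
    clipped = counts * np.minimum(1.0, tau / np.maximum(mass, 1e-12))
    total = clipped.sum(axis=0)
    # Adding or removing one user changes the histogram by at most tau in L1,
    # so Laplace noise with scale tau/eps gives eps-DP at the user level.
    return total + rng.laplace(scale=tau / eps, size=total.shape)

rng = np.random.default_rng(0)
counts = rng.poisson(lam=[2.0, 5.0, 1.0], size=(100, 3))  # heterogeneous users
print(private_histogram(counts, tau=8.0, eps=1.0, rng=rng))
```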

Generative models are now capable of producing highly realistic images that look nearly indistinguishable from the data on which they are trained. This raises the question: if we have good enough generative models, do we still need datasets? We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data. Given an off-the-shelf image generator without any access to its training data, we train representations from the samples output by this generator. We compare several representation learning methods that can be applied to this setting, using the latent space of the generator to generate multiple "views" of the same semantic content. We show that for contrastive methods, this multiview data can naturally be used to identify positive pairs (nearby in latent space) and negative pairs (far apart in latent space). We find that the resulting representations rival those learned directly from real data, but that good performance requires care in the sampling strategy applied and the training method. Generative models can be viewed as a compressed and organized copy of a dataset, and we envision a future where more and more "model zoos" proliferate while datasets become increasingly unwieldy, missing, or private. This paper suggests several techniques for dealing with visual representation learning in such a future. Code is released on our project page: //ali-design.github.io/GenRep/
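A schematic sketch of the view-generation step under our own assumptions: a positive pair is two decodings of nearby latent codes (an anchor $z$ and a small perturbation of it), while negatives come from independent latents far apart. The linear-plus-tanh `generator` is a stand-in for any black-box image generator.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 64
W = rng.standard_normal((latent_dim, 128))

def generator(z):
    # Stand-in for a fixed black-box generator mapping latents to "images".
    return np.tanh(z @ W)

def positive_pair(sigma=0.1):
    """Two views of the same semantic content: decodings of nearby latents."""
    z = rng.standard_normal(latent_dim)
    return generator(z), generator(z + sigma * rng.standard_normal(latent_dim))

def negative_sample():
    """A view of different content: the decoding of an independent latent."""
    return generator(rng.standard_normal(latent_dim))

v1, v2 = positive_pair()
# Positive views should be much closer than a view of an independent latent.
print(np.linalg.norm(v1 - v2), np.linalg.norm(v1 - negative_sample()))
```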

The prevalence of networked sensors and actuators in many real-world systems such as smart buildings, factories, power plants, and data centers generates substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labeled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called the DR-score to detect anomalies by discrimination and reconstruction. We have tested MAD-GAN on two recent datasets collected from real-world CPS: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results show that MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.
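To sketch the flavor of the DR-score (with placeholder components, since the paper's discriminator, generator, and latent inversion are learned): the anomaly score mixes a discrimination term, how strongly the discriminator flags a sample as fake, with a reconstruction term, how poorly the generator can reproduce the sample from latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x):
    # Placeholder discriminator: a score in (0, 1) playing the role of
    # "probability that x is normal"; a trained network in the real method.
    return 1.0 / (1.0 + np.exp(np.linalg.norm(x) - 3.0))

def G(z):
    # Placeholder generator from latent z; a trained network in the real method.
    return np.tanh(z)

def dr_score(x, lam=0.5, n_restarts=64):
    # Crude latent inversion: sample candidates, keep the best reconstruction.
    zs = rng.standard_normal((n_restarts, x.shape[0]))
    recon_err = min(np.linalg.norm(x - G(z)) for z in zs)
    disc_term = 1.0 - D(x)  # high when the discriminator flags x as fake
    return lam * disc_term + (1.0 - lam) * recon_err

normal_x = np.tanh(rng.standard_normal(8))  # lies in the generator's range
anomalous_x = normal_x + 5.0                # shifted far out of range
print(dr_score(normal_x), dr_score(anomalous_x))  # anomaly scores much higher
```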
