欧美精品日韩精品国内精品,亚洲AV永久少妇精品一区在线,天堂在线最新版WWW,男人J桶进女人P无遮挡快射视频,清纯校花自慰呻吟流白浆

This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under local differential privacy (LDP). Contrary to frequency estimation of a single attribute (the majority of the works), the multidimensional aspect imposes to pay particular attention to the privacy budget. Besides, when collecting user statistics longitudinally, privacy progressively degrades. Indeed, both "multiple" settings combined (i.e., many attributes and several collections throughout time) imposes several challenges, in which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response -- GRR, Optimized Unary Encoding -- OUE, and Symmetric Unary Encoding -- SUE) for both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Lastly, we also propose a new solution named \underline{A}daptive \underline{L}DP for \underline{LO}ngitudinal and \underline{M}ultidimensional \underline{FRE}quency \underline{E}stimates (ALLOMFREE), which randomly samples a single attribute to send with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimations.

相關內容

估計/估計量

關注 3

估計/估計量 · 流 · Continuity · CASE · BASIC ·

2022 年 1 月 10 日

Bounded Space Differentially Private Quantiles

Daniel Alabi,Omri Ben-Eliezer,Anamay Chaturvedi

Estimating the quantiles of a large dataset is a fundamental problem in both the streaming algorithms literature and the differential privacy literature. However, all existing private mechanisms for distribution-independent quantile computation require space at least linear in the input size $n$. In this work, we devise a differentially private algorithm for the quantile estimation problem, with strongly sublinear space complexity, in the one-shot and continual observation settings. Our basic mechanism estimates any $\alpha$-approximate quantile of a length-$n$ stream over a data universe $\mathcal{X}$ with probability $1-\beta$ using $O\left( \frac{\log (|\mathcal{X}|/\beta) \log (\alpha \epsilon n)}{\alpha \epsilon} \right)$ space while satisfying $\epsilon$-differential privacy at a single time point. Our approach builds upon deterministic streaming algorithms for non-private quantile estimation instantiating the exponential mechanism using a utility function defined on sketch items, while (privately) sampling from intervals defined by the sketch. We also present another algorithm based on histograms that is especially suited to the multiple quantiles case. We implement our algorithms and experimentally evaluate them on synthetic and real-world datasets.

估計/估計量 · 優化器 · 數據獲取 · 可交換的 · 情景 ·

2022 年 1 月 10 日

Optimal and Differentially Private Data Acquisition: Central and Local Mechanisms

Alireza Fallah,Ali Makhdoumi,Azarakhsh Malekian,Asuman Ozdaglar

We consider a platform's problem of collecting data from privacy sensitive users to estimate an underlying parameter of interest. We formulate this question as a Bayesian-optimal mechanism design problem, in which an individual can share her (verifiable) data in exchange for a monetary reward or services, but at the same time has a (private) heterogeneous privacy cost which we quantify using differential privacy. We consider two popular differential privacy settings for providing privacy guarantees for the users: central and local. In both settings, we establish minimax lower bounds for the estimation error and derive (near) optimal estimators for given heterogeneous privacy loss levels for users. Building on this characterization, we pose the mechanism design problem as the optimal selection of an estimator and payments that will elicit truthful reporting of users' privacy sensitivities. Under a regularity condition on the distribution of privacy sensitivities we develop efficient algorithmic mechanisms to solve this problem in both privacy settings. Our mechanism in the central setting can be implemented in time $\mathcal{O}(n \log n)$ where $n$ is the number of users and our mechanism in the local setting admits a Polynomial Time Approximation Scheme (PTAS).

MINE · Processing（編程語言） · CASES · state-of-the-art · 閾值 ·

2022 年 1 月 9 日

Differentially Private Release of Event Logs for Process Mining

Gamal Elkoumy,Alisa Pankova,Marlon Dumas

from arxiv, arXiv admin note: text overlap with arXiv:2103.11739

The applicability of process mining techniques hinges on the availability of event logs capturing the execution of a business process. In some use cases, particularly those involving customer-facing processes, these event logs may contain private information. Data protection regulations restrict the use of such event logs for analysis purposes. One way of circumventing these restrictions is to anonymize the event log to the extent that no individual can be singled out using the anonymized log. This article addresses the problem of anonymizing an event log in order to guarantee that, upon release of the anonymized log, the probability that an attacker may single out any individual represented in the original log does not increase by more than a threshold. The article proposes a differentially private release mechanism, which samples the cases in the log and adds noise to the timestamps to the extent required to achieve the above privacy guarantee. The article reports on an empirical comparison of the proposed approach against the state-of-the-art approaches using 14 real-life event logs in terms of data utility loss and computational efficiency.

條件獨立的 · MoDELS · 相互獨立的 · Extensibility · Processing（編程語言） ·

2022 年 1 月 6 日

A Gaussian copula joint model for longitudinal and time-to-event data with random effects

Zili Zhang,Christiana Charalambous,Peter Foster

from arxiv, 31 pages, 4 figures, do a fast presentation on 2021 Joint Statistical Meeting

Longitudinal and survival sub-models are two building blocks for joint modelling of longitudinal and time to event data. Extensive research indicates separate analysis of these two processes could result in biased outputs due to their associations. Conditional independence between measurements of biomarkers and event time process given latent classes or random effects is a common approach for characterising the association between the two sub-models while taking the heterogeneity among the population into account. However, this assumption is tricky to validate because of the unobservable latent variables. Thus a Gaussian copula joint model with random effects is proposed to accommodate the scenarios where the conditional independence assumption is questionable. In our proposed model, the conventional joint model assuming conditional independence is a special case when the association parameter in the Gaussian copula shrinks to zero. Simulation studies and real data application are carried out to evaluate the performance of our proposed model. In addition, personalised dynamic predictions of survival probabilities are obtained based on the proposed model and comparisons are made to the predictions obtained under the conventional joint model.

離散化 · 估計/估計量 · 有限差分 · 近似 · AIM ·

2022 年 1 月 5 日

Identification of potential in diffusion equations from terminal observation: analysis and discrete approximation

Zhengqi Zhang,Zhidong Zhang,Zhi Zhou

from arxiv, 23 pages, 21 Figures

The aim of this paper is to study the recovery of a spatially dependent potential in a (sub)diffusion equation from overposed final time data. We construct a monotone operator one of whose fixed points is the unknown potential. The uniqueness of the identification is theoretically verified by using the monotonicity of the operator and a fixed point argument. Moreover, we show a conditional stability in Hilbert spaces under some suitable conditions on the problem data. Next, a completely discrete scheme is developed, by using Galerkin finite element method in space and finite difference method in time, and then a fixed point iteration is applied to reconstruct the potential. We prove the linear convergence of the iterative algorithm by the contraction mapping theorem, and present a thorough error analysis for the reconstructed potential. Our derived \textsl{a priori} error estimate provides a guideline to choose discretization parameters according to the noise level. The analysis relies heavily on some suitable nonstandard error estimates for the direct problem as well as the aforementioned conditional stability. Numerical experiments are provided to illustrate and complement our theoretical analysis.

去噪 · 損失函數（機器學習） · 極小點 · state-of-the-art · GANs ·

2022 年 1 月 5 日

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Chenlin Meng,Yutong He,Yang Song,Jiaming Song,Jiajun Wu,Jun-Yan Zhu,Stefano Ermon

from arxiv, //sde-image-editing.github.io/

Guided image synthesis enables everyday users to create and edit photo-realistic images with minimum effort. The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image. Existing GAN-based methods attempt to achieve such balance using either conditional GANs or GAN inversions, which are challenging and often require additional training data or loss functions for individual applications. To address these issues, we introduce a new image synthesis and editing method, Stochastic Differential Editing (SDEdit), based on a diffusion model generative prior, which synthesizes realistic images by iteratively denoising through a stochastic differential equation (SDE). Given an input image with user guide of any type, SDEdit first adds noise to the input, then subsequently denoises the resulting image through the SDE prior to increase its realism. SDEdit does not require task-specific training or inversions and can naturally achieve the balance between realism and faithfulness. SDEdit significantly outperforms state-of-the-art GAN-based methods by up to 98.09% on realism and 91.72% on overall satisfaction scores, according to a human perception study, on multiple tasks, including stroke-based image synthesis and editing as well as image compositing.

估計/估計量 · 可辨認的 · GROUP · binary · 正則化項 ·

2022 年 1 月 4 日

Estimating Heterogeneous Causal Effects of High-Dimensional Treatments: Application to Conjoint Analysis

Max Goplerud,Kosuke Imai,Nicole E. Pashley

from arxiv, 31 pages (main text); 20 pages (supplementary information)

Estimation of heterogeneous treatment effects is an active area of research in causal inference. Most of the existing methods, however, focus on estimating the conditional average treatment effects of a single, binary treatment given a set of pre-treatment covariates. In this paper, we propose a method to estimate the heterogeneous causal effects of high-dimensional treatments, which poses unique challenges in terms of estimation and interpretation. The proposed approach is based on a Bayesian mixture of regularized regressions to identify groups of units who exhibit similar patterns of treatment effects. By directly modeling cluster membership with covariates, the proposed methodology allows one to explore the unit characteristics that are associated with different patterns of treatment effects. Our motivating application is conjoint analysis, which is a popular survey experiment in social science and marketing research and is based on a high-dimensional factorial design. We apply the proposed methodology to the conjoint data, where survey respondents are asked to select one of two immigrant profiles with randomly selected attributes. We find that a group of respondents with a relatively high degree of prejudice appears to discriminate against immigrants from non-European countries like Iraq. An open-source software package is available for implementing the proposed methodology.

情景 · 服務器 · binary · 學成 · TOOLS ·

2022 年 1 月 4 日

Lightweight Techniques for Private Heavy Hitters

Dan Boneh,Elette Boyle,Henry Corrigan-Gibbs,Niv Gilboa,Yuval Ishai

from arxiv, Appeared at IEEE Security & Privacy 2021

This paper presents Poplar, a new system for solving the private heavy-hitters problem. In this problem, there are many clients and a small set of data-collection servers. Each client holds a private bitstring. The servers want to recover the set of all popular strings, without learning anything else about any client's string. A web-browser vendor, for instance, can use Poplar to figure out which homepages are popular, without learning any user's homepage. We also consider the simpler private subset-histogram problem, in which the servers want to count how many clients hold strings in a particular set without revealing this set to the clients. Poplar uses two data-collection servers and, in a protocol run, each client send sends only a single message to the servers. Poplar protects client privacy against arbitrary misbehavior by one of the servers and our approach requires no public-key cryptography (except for secure channels), nor general-purpose multiparty computation. Instead, we rely on incremental distributed point functions, a new cryptographic tool that allows a client to succinctly secret-share the labels on the nodes of an exponentially large binary tree, provided that the tree has a single non-zero path. Along the way, we develop new general tools for providing malicious security in applications of distributed point functions.

INFORMS · entity · Processing（編程語言） · 散度 · 聯邦學習 ·

2022 年 1 月 4 日

Semantics-Preserved Distortion for Personal Privacy Protection

Letian Peng,Zuchao Li,Hai Zhao

Privacy protection is an important and concerning topic in Federated Learning, especially for Natural Language Processing. In client devices, a large number of texts containing personal information are produced by users every day. As the direct application of information from users is likely to invade personal privacy, many methods have been proposed in Federated Learning to block the center model from the raw information in client devices. In this paper, we try to do this more linguistically via distorting the text while preserving the semantics. In practice, we leverage a recently proposed metric, Neighboring Distribution Divergence, to evaluate the semantic preservation during the distortion. Based on the metric, we propose two frameworks for semantics-preserved distortion, a generative one and a substitutive one. Due to the lack of privacy-related tasks in the current Natural Language Processing field, we conduct experiments on named entity recognition and constituency parsing. Results from our experiments show the plausibility and efficiency of our distortion as a method for personal privacy protection.

流形 · 近似 · 數據點 · 線性的 · 維數災難 ·

2019 年 3 月 7 日

Manifold Approximation by Moving Least-Squares Projection (MMLS)

Barak Sober,David Levin

In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there was a vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data is lying on a lower dimensional manifold, thus the high dimensionality problem can be overcome by learning the lower dimensionality behavior. However, in real life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$ a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$) based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower dimensional manifold and suggest a non-linear moving least-squares projection on an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether.