
Differential privacy (DP) quantifies privacy loss by analyzing noise injected into output statistics. For non-trivial statistics, this noise is necessary to ensure finite privacy loss. However, data curators frequently release collections of statistics in which some are protected with DP mechanisms while others are released as-is, i.e., without additional randomized noise. Consequently, DP alone cannot characterize the privacy loss attributable to the entire collection of releases. In this paper, we present a privacy formalism, $(\epsilon, \{ \Theta_z\}_{z \in \mathcal{Z}})$-Pufferfish ($\epsilon$-TP for short when $\{ \Theta_z\}_{z \in \mathcal{Z}}$ is implied), a collection of Pufferfish mechanisms indexed by realizations of a random variable $Z$ representing public information not protected with DP noise. First, we prove that this definition has properties similar to those of DP. Next, we introduce mechanisms for releasing partially private data (PPD) satisfying $\epsilon$-TP and prove their desirable properties. We provide algorithms for sampling from the posterior of a parameter given PPD. We then compare this inference approach to the alternative in which noisy statistics are deterministically combined with $Z$. We derive mild conditions under which our algorithms offer both theoretical and computational improvements over this more common approach. Finally, we demonstrate all of the above effects in a case study on COVID-19 data.
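
As a minimal sketch of the partially private setting described above (not the paper's $\epsilon$-TP mechanisms), the snippet below releases one DP-noised statistic via the Laplace mechanism alongside an exact, as-is statistic that plays the role of $Z$; the data, $\epsilon$, and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(data, epsilon):
    """epsilon-DP count via the Laplace mechanism (a counting query has sensitivity 1)."""
    true_count = int(np.sum(data))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical record-level data: 1 = positive test, 0 = negative.
records = rng.binomial(1, 0.1, size=1000)

noisy_positives = laplace_count(records, epsilon=0.5)  # protected with DP noise
public_total = len(records)                            # released as-is (plays the role of Z)

print(f"DP-noised positive count: {noisy_positives:.1f}")
print(f"Exact public total tests: {public_total}")
```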

Related content

To prevent implicit privacy disclosure when sharing gradients among data owners (DOs) under federated learning (FL), differential privacy (DP) and its variants have become a common practice for offering formal privacy guarantees with low overhead. However, individual DOs generally tend to inject larger DP noise for stronger privacy provisions (which entails severe degradation of model utility), while the curator (i.e., the aggregation server) aims to minimize the overall effect of the added random noise for satisfactory model performance. To reconcile these conflicting goals, we propose a novel dynamic privacy pricing (DyPP) game that allows DOs to sell individual privacy (by lowering the scale of locally added DP noise) for differentiated economic compensation (offered by the curator), thereby enhancing FL model utility. Considering multi-dimensional information asymmetry among players (e.g., a DO's data distribution and privacy preference, and the curator's maximum affordable payment), as well as their varying private information across distinct FL tasks, it is hard to directly attain the Nash equilibrium of the mixed-strategy DyPP game. Instead, we devise a fast two-layer reinforcement learning algorithm to quickly learn the optimal mixed noise-saving strategy of DOs and the optimal mixed pricing strategy of the curator without prior knowledge of players' private information. Experiments on real datasets validate the feasibility and effectiveness of the proposed scheme in terms of faster convergence and enhanced FL model utility with lower payment costs.
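
For readers unfamiliar with the "sell privacy by lowering the noise scale" idea, here is a minimal sketch (assuming the Laplace mechanism, which the abstract does not specify; numbers are hypothetical) of how a data owner's noise scale maps to its per-release privacy parameter:

```python
# Hypothetical numbers, Laplace mechanism assumed: with sensitivity `sensitivity`,
# a data owner's chosen noise scale b yields a per-release epsilon = sensitivity / b,
# so accepting payment to lower b trades privacy (larger epsilon) for utility.
def laplace_epsilon(sensitivity, noise_scale):
    return sensitivity / noise_scale

for b in (4.0, 2.0, 1.0, 0.5):  # smaller noise scale = more privacy "sold"
    print(f"noise scale {b:>3}: epsilon = {laplace_epsilon(1.0, b):.2f}")
```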

We investigate the contraction properties of locally differentially private mechanisms. More specifically, we derive tight upper bounds on the divergence between $P\mathsf{K}$ and $Q\mathsf{K}$, the output distributions of an $\varepsilon$-LDP mechanism $\mathsf{K}$, in terms of a divergence between the corresponding input distributions $P$ and $Q$. Our first main technical result presents a sharp upper bound on the $\chi^2$-divergence $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ in terms of $\chi^2(P\|Q)$ and $\varepsilon$. We also show that the same result holds for a large family of divergences, including KL-divergence and squared Hellinger distance. The second main technical result gives an upper bound on $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ in terms of the total variation distance $\mathsf{TV}(P, Q)$ and $\varepsilon$. We then utilize these bounds to establish locally private versions of the van Trees inequality, Le Cam's and Assouad's methods, and the mutual information method, which are powerful tools for bounding minimax estimation risks. These results are shown to lead to better privacy analyses than the state of the art in several statistical problems, such as entropy and discrete distribution estimation, non-parametric density estimation, and hypothesis testing.
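
The paper's tight bounds are not reproduced here; the sketch below only illustrates the contraction phenomenon numerically, comparing $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ with $\chi^2(P\|Q)$ for binary randomized response, a standard $\varepsilon$-LDP mechanism (the choice of mechanism and distributions is ours).

```python
import numpy as np

def chi2(p, q):
    """Chi-squared divergence chi^2(p || q) for discrete distributions."""
    return np.sum((p - q) ** 2 / q)

eps = 1.0
# Binary randomized response: an eps-LDP channel (rows = input, columns = output).
keep = np.exp(eps) / (1.0 + np.exp(eps))
K = np.array([[keep, 1 - keep],
              [1 - keep, keep]])

P = np.array([0.9, 0.1])
Q = np.array([0.5, 0.5])

print("input  chi^2(P||Q)  :", chi2(P, Q))
print("output chi^2(PK||QK):", chi2(P @ K, Q @ K))  # strictly smaller: contraction
```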

This work studies the estimation of many statistical quantiles under differential privacy. More precisely, given a distribution and access to i.i.d. samples from it, we study the estimation of the inverse of its cumulative distribution function (the quantile function) at specific points. This task is of key importance, for instance, in private data generation. We present two different approaches. The first consists of privately estimating the empirical quantiles of the samples and using the result as an estimator of the quantiles of the distribution. In particular, we study the statistical properties of the recently published algorithm of Kaplan et al. (2022), which privately estimates the quantiles recursively. The second approach is to use density estimation techniques in order to uniformly estimate the quantile function on an interval. In particular, we show that there is a tradeoff between the two methods: when many quantiles are needed, it is better to estimate the density than to estimate the quantile function at specific points.
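
As a hedged illustration of the first approach (the recursive algorithm of Kaplan et al. is not reproduced), the sketch below privately estimates a single empirical quantile with the standard exponential mechanism over inter-sample intervals; the data range, $\epsilon$, and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_quantile(samples, q, epsilon, lower, upper):
    """Single eps-DP empirical quantile via the exponential mechanism
    (utility = -|rank error|, sensitivity 1); a standard baseline, not the
    recursive algorithm of Kaplan et al."""
    x = np.sort(np.clip(samples, lower, upper))
    n = len(x)
    edges = np.concatenate(([lower], x, [upper]))
    widths = np.diff(edges)                      # lengths of the n+1 intervals
    ranks = np.arange(n + 1)                     # number of samples below each interval
    utility = -np.abs(ranks - q * n)
    logw = (epsilon / 2.0) * utility + np.log(np.maximum(widths, 1e-12))
    logw -= logw.max()                           # numerical stability
    probs = np.exp(logw) / np.exp(logw).sum()
    i = rng.choice(n + 1, p=probs)
    return rng.uniform(edges[i], edges[i + 1])   # uniform draw within chosen interval

data = rng.normal(0.0, 1.0, size=5000)
print("DP estimate of the median:", dp_quantile(data, 0.5, epsilon=1.0, lower=-10, upper=10))
```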

Federated Learning (FL) is a distributed machine learning paradigm where clients collaboratively train a model using their local (human-generated) datasets. While existing studies focus on FL algorithm development to tackle data heterogeneity across clients, the important issue of data quality (e.g., label noise) in FL is overlooked. This paper aims to fill this gap by providing a quantitative study on the impact of label noise on FL. We derive an upper bound for the generalization error that is linear in the clients' label noise level. Then we conduct experiments on MNIST and CIFAR-10 datasets using various FL algorithms. Our empirical results show that the global model accuracy linearly decreases as the noise level increases, which is consistent with our theoretical analysis. We further find that label noise slows down the convergence of FL training, and the global model tends to overfit when the noise level is high.
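
As an illustration of the experimental setup such a study might use (the paper's exact noise model is not specified in the abstract; symmetric label flipping is an assumption), here is a minimal sketch of injecting label noise at a given level:

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_labels(labels, noise_level, num_classes=10):
    """Symmetric label noise: each label is replaced by a uniformly random
    *different* class with probability `noise_level`."""
    labels = labels.copy()
    flip = rng.random(len(labels)) < noise_level
    shift = rng.integers(1, num_classes, size=flip.sum())
    labels[flip] = (labels[flip] + shift) % num_classes
    return labels

y = rng.integers(0, 10, size=100)
y_noisy = flip_labels(y, noise_level=0.3)
print("fraction of corrupted labels:", np.mean(y != y_noisy))
```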

We study the space complexity of the two related fields of differential privacy and adaptive data analysis. Specifically, (1) Under standard cryptographic assumptions, we show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy, compared to the space needed without privacy. To the best of our knowledge, this is the first separation between the space complexity of private and non-private algorithms. (2) The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries. We revisit previous lower bounds at a foundational level, and show that they are a consequence of a space bottleneck rather than a sampling bottleneck. To obtain our results, we define and construct an encryption scheme with multiple keys that is built to withstand a limited amount of key leakage in a very particular way.

Making sure that users understand the privacy policies that affect them is a key challenge for real GDPR deployment. Research studies are mostly carried out in English, but in Europe and elsewhere, users speak languages other than English. Replicating studies in different languages requires comparable cross-language corpora of privacy policies. This work provides a methodology for building such comparable cross-language corpora in a national language and a reference study language. As an application example of our methodology, we compare English and Italian, extending the corpus of one of the first studies on users' understanding of technical terms in privacy policies. We also investigate other open issues that can make replication harder.

The proliferation of smartphone devices has led to the emergence of powerful user services, from enabling interactions with friends and business associates to mapping, finding nearby businesses, and alerting users in real time. However, users often do not realize that continuously sharing their trajectory data with online systems may end up revealing a great amount of information about their behavior, mobility patterns, and social relationships. Addressing these privacy risks is therefore a fundamental challenge. In this work, we present $TP^3$, a Privacy Protection system for Trajectory analytics. Our contributions are the following: (1) we model a new type of attack, namely the 'social link exploitation attack'; (2) we utilize coreset theory, a fast and accurate technique that approximates the original data well with a small summary set, so that running queries on the coreset produces results similar to those on the original data; and (3) we employ the serverless computing paradigm to accommodate a set of privacy operations, achieving high system performance with minimized provisioning costs while preserving users' privacy. We have developed these techniques in our $TP^3$ system, which works with state-of-the-art trajectory analytics apps and applies different types of privacy operations. Our detailed experimental evaluation illustrates that our approach is both efficient and practical.
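
The abstract does not detail the coreset construction used in $TP^3$; as a simplistic stand-in illustrating the idea of answering queries on a small weighted summary instead of the full trajectory data, here is a uniform-sampling sketch for region-count queries (the data layout and numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trajectory points: (user_id, latitude, longitude), all in [0, 1).
points = rng.random((100_000, 3))

def weighted_sample(data, m):
    """Uniform sample of m rows, each weighted n/m, so that weighted counts are
    unbiased estimates of counts on the full data (a simplistic stand-in for a
    proper coreset construction)."""
    idx = rng.choice(len(data), size=m, replace=False)
    return data[idx], len(data) / m

def region_count(data, weight, lat_range, lon_range):
    lat, lon = data[:, 1], data[:, 2]
    mask = (lat_range[0] <= lat) & (lat < lat_range[1]) & \
           (lon_range[0] <= lon) & (lon < lon_range[1])
    return weight * mask.sum()

sample, w = weighted_sample(points, m=2_000)
print("exact count :", region_count(points, 1.0, (0.2, 0.4), (0.5, 0.9)))
print("approx count:", region_count(sample, w, (0.2, 0.4), (0.5, 0.9)))
```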

We investigate the contraction coefficients derived from strong data processing inequalities for the $E_\gamma$-divergence. By generalizing the celebrated Dobrushin's coefficient from total variation distance to $E_\gamma$-divergence, we derive a closed-form expression for the contraction of $E_\gamma$-divergence. This result has fundamental consequences in two privacy settings. First, it implies that local differential privacy can be equivalently expressed in terms of the contraction of $E_\gamma$-divergence. This equivalent formulation can be used to precisely quantify the impact of local privacy on (Bayesian and minimax) estimation and hypothesis testing problems in terms of the reduction of effective sample size. Second, it leads to a new information-theoretic technique for analyzing privacy guarantees of online algorithms. In this technique, we view such algorithms as a composition of amplitude-constrained Gaussian channels and then relate their contraction coefficients under $E_\gamma$-divergence to the overall differential privacy guarantees. As an example, we apply our technique to derive the differential privacy parameters of gradient descent. We also show that this framework can be tailored to batch learning algorithms implemented with one pass over the training dataset.
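
The closed-form contraction coefficient is not reproduced here; the sketch below only computes the $E_\gamma$-divergence $E_\gamma(P\|Q) = \sum_x \max(P(x) - \gamma Q(x), 0)$ before and after a randomized-response channel to show the contraction numerically (the mechanism, $\gamma$, and distributions are our choices).

```python
import numpy as np

def e_gamma(p, q, gamma):
    """Hockey-stick (E_gamma) divergence between discrete distributions."""
    return np.sum(np.maximum(p - gamma * q, 0.0))

gamma = np.exp(0.5)  # gamma = e^eps values arise in (eps, delta)-style guarantees
eps = 1.0
keep = np.exp(eps) / (1.0 + np.exp(eps))
K = np.array([[keep, 1 - keep],
              [1 - keep, keep]])  # eps-LDP randomized-response channel

P = np.array([0.95, 0.05])
Q = np.array([0.2, 0.8])

num = e_gamma(P @ K, Q @ K, gamma)
den = e_gamma(P, Q, gamma)
print("E_gamma(P||Q)   :", den)
print("E_gamma(PK||QK) :", num)
print("empirical contraction ratio:", num / den)
```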

In statistical inference, uncertainty is unknown and all models are wrong. That is to say, a person who constructs a statistical model and a prior distribution is simultaneously aware that both are fictional candidates. To study such cases, statistical measures such as cross validation, information criteria, and marginal likelihood have been constructed; however, their mathematical properties have not yet been completely clarified when statistical models are under- and over-parametrized. We introduce a mathematical theory of Bayesian statistics for unknown uncertainty, which clarifies general properties of cross validation, information criteria, and marginal likelihood, even if the unknown data-generating process is unrealizable by the model or the posterior distribution cannot be approximated by any normal distribution. It therefore offers a helpful standpoint for a person who cannot believe in any specific model and prior. This paper consists of three parts. The first presents a new result, whereas the second and third revisit well-known previous results with new experiments. We show that there exists a more precise estimator of the generalization loss than leave-one-out cross validation, that there exists a more accurate approximation of the marginal likelihood than BIC, and that the optimal hyperparameters for generalization loss and for marginal likelihood differ.
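
The abstract does not name the estimators in question; as background only, the sketch below computes two standard quantities from this literature, the importance-sampling leave-one-out estimate and a WAIC-style estimate of the generalization loss, from posterior draws of a toy conjugate normal model (the model, prior, and data are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model (hypothetical): y_i ~ N(mu, 1), prior mu ~ N(0, 10^2).
y = rng.normal(0.5, 1.0, size=50)
n = len(y)
post_var = 1.0 / (n + 1.0 / 100.0)
post_mean = post_var * y.sum()
mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=4000)

# Pointwise log-likelihoods, shape (posterior draws, data points).
logp = -0.5 * (y[None, :] - mu_draws[:, None]) ** 2 - 0.5 * np.log(2 * np.pi)

# Importance-sampling leave-one-out estimate of the generalization loss:
# -(1/n) sum_i log p(y_i | y_{-i}), using 1/p(y_i | y_{-i}) = E_post[1/p(y_i | mu)].
loo = np.mean(np.log(np.mean(np.exp(-logp), axis=0)))

# WAIC-style estimate: -(1/n) sum_i log E_post[p(y_i|mu)] + (1/n) sum_i Var_post[log p(y_i|mu)].
lppd = np.mean(np.log(np.mean(np.exp(logp), axis=0)))
p_waic = np.mean(np.var(logp, axis=0, ddof=1))
waic = -lppd + p_waic

print("IS-LOO generalization-loss estimate:", loo)
print("WAIC generalization-loss estimate  :", waic)
```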

Composition theorems are general and powerful tools that facilitate privacy accounting across multiple data accesses from per-access privacy bounds. However, they often result in weaker bounds compared with end-to-end analyses. Two popular tools that mitigate this are the exponential mechanism (or report noisy max) and the sparse vector technique. They were generalized in a couple of recent private selection/test frameworks, including the work of Liu and Talwar (STOC 2019) and of Papernot and Steinke (ICLR 2022). In this work, we first present an alternative framework for private selection and testing with a simpler privacy proof and an equally good utility guarantee. Second, we observe that the private selection framework (both the previous ones and ours) can be applied to improve the accuracy/confidence trade-off for many fundamental privacy-preserving data-analysis tasks, including query releasing, top-$k$ selection, and stable selection. Finally, for online settings, we apply private testing to design a mechanism for adaptive query releasing, which improves the sample complexity's dependence on the confidence parameter for the celebrated private multiplicative weights algorithm of Hardt and Rothblum (FOCS 2010).
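
As a reminder of the report-noisy-max primitive mentioned above (this is the textbook mechanism, not the paper's new framework), here is a minimal sketch with hypothetical scores:

```python
import numpy as np

rng = np.random.default_rng(0)

def report_noisy_max(scores, epsilon, sensitivity=1.0):
    """Return the index of the (approximately) highest-scoring candidate under eps-DP:
    add independent Laplace noise to each score and report the argmax.
    Assumes each score changes by at most `sensitivity` between neighboring datasets;
    the 2*sensitivity/epsilon noise scale is a conservative choice."""
    noise = rng.laplace(scale=2.0 * sensitivity / epsilon, size=len(scores))
    return int(np.argmax(np.asarray(scores, dtype=float) + noise))

candidate_scores = [120, 118, 95, 40]  # e.g. counts supporting each candidate answer
print("privately selected index:", report_noisy_max(candidate_scores, epsilon=1.0))
```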
