
Although robust learning and local differential privacy are both widely studied fields of research, combining the two settings has only begun to be explored. We consider the problem of estimating a discrete distribution in total variation distance from $n$ contaminated data batches under a local differential privacy constraint. A fraction $1-\epsilon$ of the batches contain $k$ i.i.d. samples drawn from a discrete distribution $p$ over $d$ elements. To protect the users' privacy, each of the samples is privatized using an $\alpha$-locally differentially private mechanism. The remaining $\epsilon n$ batches are adversarial contamination. The minimax rate of estimation under contamination alone, with no privacy, is known to be $\epsilon/\sqrt{k}+\sqrt{d/(kn)}$, up to a $\sqrt{\log(1/\epsilon)}$ factor. Under the privacy constraint alone, the minimax rate of estimation is $\sqrt{d^2/(\alpha^2 kn)}$. We show that combining the two constraints leads to a minimax estimation rate of $\epsilon\sqrt{d/(\alpha^2 k)}+\sqrt{d^2/(\alpha^2 kn)}$, up to a $\sqrt{\log(1/\epsilon)}$ factor, which is larger than the sum of the two separate rates. We provide a polynomial-time algorithm achieving this bound, as well as a matching information-theoretic lower bound.
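The abstract does not say which $\alpha$-LDP mechanism privatizes each sample. As a minimal sketch, assuming standard k-ary (generalized) randomized response as a stand-in, the code below privatizes samples over $d$ elements and recovers an unbiased frequency estimate. This plain estimator is neither robust to the $\epsilon n$ adversarial batches nor necessarily rate-optimal, and the function names are hypothetical.

```python
import numpy as np


def krr_privatize(x, d, alpha, rng):
    """k-ary randomized response (alpha-LDP): keep each symbol with probability
    e^alpha / (e^alpha + d - 1), otherwise report one of the other d - 1 symbols
    uniformly at random."""
    x = np.asarray(x)
    keep_prob = np.exp(alpha) / (np.exp(alpha) + d - 1)
    keep = rng.random(x.shape) < keep_prob
    # An offset in {1, ..., d-1} guarantees the replacement differs from x.
    offset = rng.integers(1, d, size=x.shape)
    return np.where(keep, x, (x + offset) % d)


def krr_estimate(z, d, alpha):
    """Unbiased estimate of p from privatized reports z (entries may be negative)."""
    e = np.exp(alpha)
    p_keep = e / (e + d - 1)  # P(report = j | true symbol = j)
    q = 1.0 / (e + d - 1)     # P(report = j | true symbol = i), for i != j
    freq = np.bincount(z, minlength=d) / len(z)
    # E[freq_j] = q + p_j * (p_keep - q); invert this affine map.
    return (freq - q) / (p_keep - q)


# Usage on clean (uncontaminated) samples only.
rng = np.random.default_rng(0)
d, alpha, n = 4, 1.0, 200_000
p = np.array([0.4, 0.3, 0.2, 0.1])
x = rng.choice(d, size=n, p=p)
z = krr_privatize(x, d, alpha, rng)
p_hat = krr_estimate(z, d, alpha)
tv_error = 0.5 * np.abs(p_hat - p).sum()
```

Because individual entries of `p_hat` can be negative, one would typically project it onto the probability simplex afterwards.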

Related content

Estimation/Estimator


Confidence · Reducible · Estimation/Estimator · Bayesian estimation · False negative

June 10, 2022

Bayesian Estimation of Differential Privacy

Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf

from arXiv, 17 pages, 8 figures. Joint main authors: Santiago Zanella-Béguelin, Lukas Wutschitz, and Shruti Tople

Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice. An emerging strand of work empirically estimates the protection afforded by differentially private training as a confidence interval for the privacy budget $\varepsilon$ spent on training a model. Existing approaches derive confidence intervals for $\varepsilon$ from confidence intervals for the false positive and false negative rates of membership inference attacks. Unfortunately, obtaining narrow high-confidence intervals for $\varepsilon$ using this method requires an impractically large sample size and training as many models as samples. We propose a novel Bayesian method that greatly reduces the required sample size, and we adapt and validate a heuristic to draw more than one sample per trained model. Our Bayesian method exploits the hypothesis testing interpretation of differential privacy to obtain a posterior for $\varepsilon$ (not just a confidence interval) from the joint posterior of the false positive and false negative rates of membership inference attacks. For the same sample size and confidence, we derive confidence intervals for $\varepsilon$ that are around 40% narrower than those of prior work. The heuristic, which we adapt from label-only DP, can further reduce the number of trained models needed to get enough samples by up to two orders of magnitude.
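The hypothesis testing interpretation says that under $(\varepsilon,\delta)$-DP every membership inference attack satisfies $\mathrm{FPR} + e^{\varepsilon}\,\mathrm{FNR} \ge 1-\delta$ and $\mathrm{FNR} + e^{\varepsilon}\,\mathrm{FPR} \ge 1-\delta$, so observed error rates imply a lower bound on $\varepsilon$. The sketch below turns attack counts into posterior samples of that bound. It assumes independent Beta(1, 1) priors on FPR and FNR, a simplification of the joint posterior the authors describe; the function names and counts are hypothetical.

```python
import numpy as np


def eps_from_rates(fpr, fnr, delta=0.0):
    """Smallest epsilon consistent with an attack achieving (fpr, fnr) under
    (epsilon, delta)-DP, from FPR + e^eps FNR >= 1 - delta and its mirror image."""
    fpr = np.asarray(fpr, dtype=float)
    fnr = np.asarray(fnr, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        eps1 = np.log((1 - delta - fpr) / fnr)
        eps2 = np.log((1 - delta - fnr) / fpr)
    # fmax ignores NaNs (logs of non-positive ratios); epsilon is never below 0.
    return np.fmax(np.fmax(eps1, eps2), 0.0)


def eps_posterior_samples(fp, tn, fn, tp, delta=0.0, n_samples=10_000, seed=0):
    """Monte Carlo samples of the epsilon lower bound, with FPR ~ Beta(1 + fp, 1 + tn)
    and FNR ~ Beta(1 + fn, 1 + tp) drawn independently (an assumption)."""
    rng = np.random.default_rng(seed)
    fpr = rng.beta(1 + fp, 1 + tn, size=n_samples)
    fnr = rng.beta(1 + fn, 1 + tp, size=n_samples)
    return eps_from_rates(fpr, fnr, delta)


# Usage with made-up attack counts: 500 non-members and 500 members.
point = eps_from_rates(0.1, 0.2)  # max(log(0.9/0.2), log(0.8/0.1)) = log 8
samples = eps_posterior_samples(fp=50, tn=450, fn=100, tp=400)
lo, hi = np.quantile(samples, [0.025, 0.975])  # 95% credible interval
```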

Minibatch · Optimizer · Extensibility · Minibatch stochastic · Invariant

June 10, 2022

On Convergence of FedProx: Local Dissimilarity Invariant Bounds, Non-smoothness and Beyond

Xiao-Tong Yuan, Ping Li

The FedProx algorithm is a simple yet powerful distributed proximal point optimization method widely used for federated learning (FL) over heterogeneous data. Despite its popularity and remarkable success in practice, the theoretical understanding of FedProx remains largely underexplored: its appealing convergence behavior has so far been characterized only under certain non-standard and unrealistic dissimilarity assumptions on the local functions, and the results are limited to smooth optimization problems. To remedy these deficiencies, we develop a novel local-dissimilarity-invariant convergence theory for FedProx and its minibatch stochastic extension through the lens of algorithmic stability. As a result, we derive several new and deeper insights into FedProx for non-convex federated optimization, including: 1) convergence guarantees independent of local dissimilarity-type conditions; 2) convergence guarantees for non-smooth FL problems; and 3) linear speedup with respect to the minibatch size and the number of sampled devices. Our theory reveals, for the first time, that local dissimilarity and smoothness are not necessary for FedProx to achieve favorable complexity bounds. Preliminary experimental results on a series of benchmark FL datasets demonstrate the benefit of minibatching for improving the sample efficiency of FedProx.
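For reference, FedProx has each device (approximately) minimize its local loss plus a proximal term $\frac{\mu}{2}\|w - w^t\|^2$ around the current global model $w^t$, after which the server averages the local solutions. A minimal sketch, assuming least-squares local losses so that the proximal step has a closed form, full device participation and equal weights; the minibatch stochastic variant analyzed in the paper is not shown, and all names are hypothetical.

```python
import numpy as np


def local_loss(w, A, b):
    """Least-squares client loss f_i(w) = ||A w - b||^2 / (2 m)."""
    r = A @ w - b
    return r @ r / (2 * A.shape[0])


def fedprox_least_squares(As, bs, mu, rounds):
    """FedProx with exact local proximal steps on least-squares clients."""
    dim = As[0].shape[1]
    w = np.zeros(dim)
    history = [w.copy()]
    for _ in range(rounds):
        local_models = []
        for A, b in zip(As, bs):
            m = A.shape[0]
            # argmin_v f_i(v) + (mu/2)||v - w||^2 via its normal equations:
            # (A^T A / m + mu I) v = A^T b / m + mu w
            H = A.T @ A / m + mu * np.eye(dim)
            g = A.T @ b / m + mu * w
            local_models.append(np.linalg.solve(H, g))
        w = np.mean(local_models, axis=0)  # server aggregation
        history.append(w.copy())
    return w, history


# Usage: 4 heterogeneous clients whose true parameters differ around a shared vector.
rng = np.random.default_rng(0)
dim = 5
w_shared = 3 * rng.standard_normal(dim)
As, bs = [], []
for _ in range(4):
    A = rng.standard_normal((50, dim))
    w_client = w_shared + 0.5 * rng.standard_normal(dim)
    As.append(A)
    bs.append(A @ w_client + 0.1 * rng.standard_normal(50))

w_final, history = fedprox_least_squares(As, bs, mu=1.0, rounds=100)


def global_loss(w):
    return np.mean([local_loss(w, A, b) for A, b in zip(As, bs)])
```

With a fixed $\mu > 0$ and heterogeneous clients, the fixed point of this iteration is generally close to, but not exactly, the minimizer of the average loss.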

Learning · MCMC · Monte Carlo · Frequentism · Analysis

June 9, 2022

Wireless Federated Langevin Monte Carlo: Repurposing Channel Noise for Bayesian Sampling and Privacy

Dongzhu Liu, Osvaldo Simeone

from arXiv, submitted

Most works on federated learning (FL) focus on the common frequentist formulation of learning, whereby the goal is to minimize the global empirical loss. Frequentist learning, however, is known to be problematic in the regime of limited data, as it fails to quantify epistemic uncertainty in prediction. Bayesian learning provides a principled solution to this problem by shifting the optimization domain to the space of distributions over the model parameters. This paper proposes a novel mechanism for the efficient implementation of Bayesian learning in wireless systems. Specifically, we focus on a standard gradient-based Markov Chain Monte Carlo (MCMC) method, namely Langevin Monte Carlo (LMC), and we introduce a novel protocol, termed Wireless Federated LMC (WFLMC), that repurposes channel noise for the double role of seed randomness for MCMC sampling and privacy preservation. To this end, based on an analysis of the Wasserstein distance between the sample distribution and the global posterior distribution under privacy and power constraints, we introduce a power allocation strategy as the solution of a convex program. The analysis identifies distinct operating regimes in which the performance of the system is power-limited, privacy-limited, or limited by the requirements of MCMC sampling. Both analytical and simulation results demonstrate that, if the channel noise is properly accounted for under suitable conditions, it can be fully repurposed for both MCMC sampling and privacy preservation, obtaining the same performance as in an ideal communication setting that is not subject to privacy constraints.
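Langevin Monte Carlo iterates $\theta \leftarrow \theta - \eta \nabla U(\theta) + \sqrt{2\eta}\,\xi$ with $\xi \sim \mathcal{N}(0, I)$, where $U$ is the negative log-posterior. The toy sketch below illustrates the repurposing idea under strong simplifying assumptions (scalar parameter, Gaussian local potentials $U_k(\theta) = \tfrac{a_k}{2}(\theta - c_k)^2$, ideal over-the-air summation, no fading, no power allocation or privacy accounting). The channel noise on the aggregated gradient supplies part of the Langevin noise, and the server only tops up the remainder. Names and parameters are hypothetical; this is not the WFLMC protocol itself.

```python
import numpy as np


def wireless_lmc(a, c, eta, sigma_ch, n_steps, n_chains, seed=0):
    """Run n_chains parallel LMC chains on U(theta) = sum_k a_k (theta - c_k)^2 / 2,
    treating additive channel noise on the summed gradients as Langevin noise."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    # -eta * channel_noise already contributes variance eta^2 * sigma_ch^2;
    # only the remainder up to the required 2 * eta is added artificially.
    extra_var = 2 * eta - (eta * sigma_ch) ** 2
    if extra_var < 0:
        raise ValueError("channel noise exceeds the Langevin noise budget 2*eta")
    theta = np.zeros(n_chains)
    for _ in range(n_steps):
        # Each device k transmits its local gradient; the channel sums them.
        grad_sum = (a[:, None] * (theta[None, :] - c[:, None])).sum(axis=0)
        received = grad_sum + sigma_ch * rng.standard_normal(n_chains)
        theta = theta - eta * received + np.sqrt(extra_var) * rng.standard_normal(n_chains)
    return theta


# Usage: the exact posterior is Gaussian with precision A = sum(a), mean sum(a*c)/A.
a, c = [1.0, 2.0, 3.0], [0.0, 1.0, 2.0]
eta = 0.01
samples = wireless_lmc(a, c, eta=eta, sigma_ch=10.0, n_steps=1000, n_chains=20_000)
A = sum(a)
post_mean = sum(ak * ck for ak, ck in zip(a, c)) / A
# Stationary variance of the discretized chain: 1 / (A (1 - eta A / 2)).
lmc_var = 1.0 / (A * (1 - eta * A / 2))
```

With these settings the channel provides exactly half of the required noise variance $2\eta$.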

Processing (programming language) · Approximation · Performer · Partially observable Markov decision process · Analysis

June 9, 2022

On Low-Complexity Quickest Intervention of Mutated Diffusion Processes Through Local Approximation

Qining Zhang, Honghao Wei, Weina Wang, Lei Ying

We consider the problem of controlling a mutated diffusion process with an unknown mutation time. The problem is formulated as a quickest intervention problem with the mutation modeled by a change-point, which generalizes quickest change-point detection (QCD). Our goal is to intervene in the mutated process as soon as possible while keeping the intervention cost low through optimally chosen intervention actions. This model and the proposed algorithms can be applied to pandemic prevention (such as Covid-19) or misinformation containment. We formulate the problem as a partially observed Markov decision process (POMDP) and convert it to an MDP through the belief state of the change-point. We first propose a grid approximation approach to calculate the optimal intervention policy, whose computational complexity can be very high when the number of grid points is large. To reduce the computational complexity, we further propose a low-complexity threshold-based policy derived from a first-order approximation of the value functions in the "local intervention" regime. Simulation results show that the low-complexity algorithm performs similarly to the grid approximation and that both perform much better than QCD-based algorithms.
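Converting the POMDP into an MDP relies on tracking the belief, i.e. the posterior probability that the mutation (change-point) has already occurred. A minimal sketch, assuming a geometric change-point prior with per-step hazard $\rho$ and i.i.d. unit-variance Gaussian observations whose mean shifts from $\mu_0$ to $\mu_1$ at the change. The simple "intervene once the belief crosses a threshold" rule and its threshold value are hypothetical and are not the value-function-based policy derived in the paper.

```python
import math


def belief_update(pi, x, rho, mu0=0.0, mu1=1.0):
    """Posterior probability that the change has occurred, given the previous
    belief pi and a new observation x."""
    prior = pi + (1 - pi) * rho  # the change may occur just before this step
    # Likelihood ratio f1(x)/f0(x) for N(mu1, 1) versus N(mu0, 1).
    lr = math.exp((mu1 - mu0) * (x - (mu0 + mu1) / 2))
    return prior * lr / (prior * lr + (1 - prior))


def first_crossing(xs, rho, threshold):
    """Index of the first observation after which the belief reaches threshold,
    or None if it never does."""
    pi = 0.0
    for t, x in enumerate(xs):
        pi = belief_update(pi, x, rho)
        if pi >= threshold:
            return t
    return None


# Usage: five uninformative observations (likelihood ratio 1), then post-change data.
observations = [0.5] * 5 + [3.0] * 20
stop = first_crossing(observations, rho=0.1, threshold=0.9)
```

During the uninformative prefix the belief only grows through the prior, reaching $1-(1-\rho)^5 \approx 0.41$; the first strongly post-change observation then pushes it past 0.9, so `stop` is 5.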

Noise distribution · Canonical · Noise · Functional · Decomposition

June 9, 2022

Log-Concave and Multivariate Canonical Noise Distributions for Differential Privacy

Jordan Awan, Jinshuo Dong

from arXiv, 9 pages before references, 1 figure

A canonical noise distribution (CND) is an additive mechanism designed to satisfy $f$-differential privacy ($f$-DP) without any wasted privacy budget. $f$-DP is a hypothesis-testing-based formulation of privacy phrased in terms of tradeoff functions, which capture the difficulty of a hypothesis test. In this paper, we consider the existence and construction of log-concave CNDs as well as multivariate CNDs. Log-concave distributions are important to ensure that higher outputs of the mechanism correspond to higher input values, whereas multivariate noise distributions are important to ensure that a joint release of multiple outputs has a tight privacy characterization. We show that the existence and construction of CNDs for both types of problems are related to whether the tradeoff function can be decomposed by functional composition (related to group privacy) or by mechanism composition. In particular, we show that pure $\epsilon$-DP cannot be decomposed in either way and that there is neither a log-concave CND nor any multivariate CND for $\epsilon$-DP. On the other hand, we show that Gaussian-DP, $(0,\delta)$-DP, and Laplace-DP each have both log-concave and multivariate CNDs.
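As a concrete instance, the $\mu$-Gaussian-DP tradeoff function is $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, and adding $\mathcal{N}(0, 1/\mu^2)$ noise to a sensitivity-1 statistic attains it exactly, with no wasted budget. The sketch below checks this numerically by computing the type II error of the Neyman-Pearson (threshold) test between the noise distribution and its unit shift. It uses only Python's standard library, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gdp_tradeoff(alpha, mu):
    """Closed-form mu-GDP tradeoff function G_mu(alpha)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) - mu)


def gaussian_shift_tradeoff(alpha, sigma, shift=1.0):
    """Type II error of the most powerful level-alpha test of N(0, sigma^2)
    versus N(shift, sigma^2). By Neyman-Pearson the test rejects when x > t,
    with t chosen so that P(X > t) = alpha under the null."""
    t = NormalDist(0.0, sigma).inv_cdf(1 - alpha)
    return NormalDist(shift, sigma).cdf(t)


# Usage: N(0, 1/mu^2) noise with mu = 2 (sigma = 0.5) should trace out G_2.
alphas = [0.01, 0.1, 0.3, 0.5, 0.9]
gaps = [abs(gaussian_shift_tradeoff(a, sigma=0.5) - gdp_tradeoff(a, mu=2.0)) for a in alphas]
```

$G_\mu$ is also symmetric, $G_\mu(G_\mu(\alpha)) = \alpha$, which is one of the properties a tradeoff function must have to admit a CND.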

Learning · SimPLe · Log-likelihood · Compositionality · Computational cost

June 9, 2022

Analytical Composition of Differential Privacy via the Edgeworth Accountant

Hua Wang, Sheng Gao, Huanyu Zhang, Milan Shen, Weijie J. Su

Many modern machine learning algorithms are composed of simple private algorithms; thus, an increasingly important problem is to efficiently compute the overall privacy loss under composition. In this study, we introduce the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees of private algorithms. The Edgeworth Accountant starts by losslessly tracking the privacy loss under composition using the $f$-differential privacy framework, which allows us to express the privacy guarantees using privacy-loss log-likelihood ratios (PLLRs). As the name suggests, this accountant next uses the Edgeworth expansion to upper- and lower-bound the probability distribution of the sum of the PLLRs. Moreover, by relying on a technique for approximating complex distributions using simple ones, we demonstrate that the Edgeworth Accountant can be applied to the composition of any noise-addition mechanism. Owing to certain appealing features of the Edgeworth expansion, the $(\epsilon, \delta)$-differential privacy bounds offered by this accountant are non-asymptotic and come with essentially no extra computational cost, as opposed to prior approaches whose running times increase with the number of compositions. Finally, we demonstrate that our upper and lower $(\epsilon, \delta)$-differential privacy bounds are tight in federated analytics and in certain regimes of training private deep learning models.
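The key step is approximating the distribution of a sum of many independent PLLRs with an Edgeworth expansion rather than a plain normal (CLT) approximation. The sketch below shows only that generic step: a first-order Edgeworth correction for a sum of $n$ i.i.d. terms given their mean, variance and third cumulant, checked on a toy case (a sum of Exp(1) variables, whose exact CDF is known) rather than on real PLLRs. It does not reproduce the paper's bounds or its conversion to $(\epsilon,\delta)$, and the names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def edgeworth_cdf(x, n, mean, var, kappa3):
    """First-order Edgeworth approximation of P(S_n <= x), where S_n is a sum of
    n i.i.d. terms with the given per-term mean, variance and third cumulant."""
    sd = math.sqrt(n * var)
    z = (x - n * mean) / sd
    skew = n * kappa3 / sd**3  # skewness of S_n, shrinks like 1/sqrt(n)
    return STD_NORMAL.cdf(z) - STD_NORMAL.pdf(z) * skew / 6 * (z**2 - 1)


def erlang_cdf(x, n):
    """Exact P(S_n <= x) for a sum of n i.i.d. Exp(1) variables (Gamma(n, 1))."""
    term, tail = math.exp(-x), 0.0
    for k in range(n):
        tail += term  # adds e^{-x} x^k / k!
        term *= x / (k + 1)
    return 1.0 - tail


# Usage: Exp(1) terms have mean 1, variance 1 and third cumulant 2.
n, x = 20, 20.0
exact = erlang_cdf(x, n)
normal = STD_NORMAL.cdf((x - n) / math.sqrt(n))
edgeworth = edgeworth_cdf(x, n, mean=1.0, var=1.0, kappa3=2.0)
```

At the mean, the normal approximation returns 0.5 and misses the skew of the sum, which the Edgeworth term corrects.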

Noise distribution · Canonical · Noise · Analysis · Performer

June 8, 2022

Canonical Noise Distributions and Private Hypothesis Tests

Jordan Awan, Salil Vadhan

from arXiv, 25 pages plus references and appendix, 4 figures

$f$-DP has recently been proposed as a generalization of differential privacy allowing a lossless analysis of composition, post-processing, and privacy amplification via subsampling. In the setting of $f$-DP, we propose the concept of a canonical noise distribution (CND), the first mechanism designed for an arbitrary $f$-DP guarantee. The notion of CND captures whether an additive privacy mechanism perfectly matches the privacy guarantee of a given $f$. We prove that a CND exists for any $f$-DP guarantee, and give a construction that produces a CND for any $f$. We show that private hypothesis tests are intimately related to CNDs, allowing for the release of private $p$-values at no additional privacy cost as well as the construction of uniformly most powerful (UMP) tests for binary data, within the general $f$-DP framework. We apply our techniques to the problem of difference of proportions testing, and construct a UMP unbiased (UMPU) "semi-private" test which upper-bounds the performance of any $f$-DP test. Using this as a benchmark, we propose a private test, based on the inversion of characteristic functions, which allows for optimal inference for the two population parameters and is nearly as powerful as the semi-private UMPU. When specialized to the case of $(\epsilon,0)$-DP, we show empirically that our proposed test is more powerful than any $(\epsilon/\sqrt 2)$-DP test and has more accurate type I errors than the classic normal approximation test.
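The "private $p$-values at no additional privacy cost" point follows from post-processing: once a noisy statistic is released, any function of it, including a $p$-value, is equally private. A minimal sketch, assuming a Binomial count released with additive $\mathcal{N}(0, 1/\mu^2)$ noise (a CND for $\mu$-GDP with a sensitivity-1 count) rather than the paper's constructions for general $f$; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def private_binomial_pvalue(z, n, theta0, mu):
    """One-sided p-value P(X + N >= z) for H0: theta = theta0, where the released
    value is z = X + N with X ~ Binomial(n, theta) and N ~ N(0, 1/mu^2).
    It is computed from z alone, so it is post-processing of the private release."""
    noise = NormalDist(0.0, 1.0 / mu)
    return sum(
        math.comb(n, t) * theta0**t * (1 - theta0) ** (n - t) * (1 - noise.cdf(z - t))
        for t in range(n + 1)
    )


# Usage: p-values for several released values under H0: theta = 0.5 with n = 20.
pvals = [private_binomial_pvalue(z, n=20, theta0=0.5, mu=1.0) for z in (5.0, 10.0, 15.0)]
```

Under this null, $X + N$ is symmetric about $n\theta_0 = 10$, so the $p$-value at $z = 10$ is exactly $0.5$.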

Estimation/Estimator · Estimation error · INFORMS · Performer · Threshold

June 8, 2022

Real-time Sampling and Estimation on Random Access Channels: Age of Information and Beyond

Xingran Chen, Xinyu Liao, Shirin Saeedi Bidokhti

Efficient sampling and remote estimation are critical for many wireless-enabled applications in the Internet of Things and cyber-physical systems. Motivated by such applications, this work proposes decentralized policies for the real-time monitoring and estimation of autoregressive processes over random access channels. Two classes of policies are investigated: (i) oblivious schemes, in which sampling and transmission policies are independent of the processes being monitored, and (ii) non-oblivious schemes, in which transmitters causally observe their corresponding processes for decision making. In the class of oblivious policies, we show that minimizing the expected time-average estimation error is equivalent to minimizing the expected age of information. Consequently, we prove lower and upper bounds on the minimum achievable estimation error in this class. Next, we consider non-oblivious policies and design a threshold policy, called error-based thinning, in which each source node becomes active if its instantaneous error has crossed a fixed threshold (which we optimize). Active nodes then transmit stochastically following a slotted ALOHA policy. A closed-form, approximately optimal solution is found for the threshold as well as for the resulting estimation error. It is shown that non-oblivious policies offer a multiplicative gain close to $3$ compared to oblivious policies. Moreover, it is shown that oblivious policies that use the age of information for decision making improve on the state of the art by a multiplicative factor of at least $2$. The performance of all discussed policies is compared using simulations. The numerical comparison shows that the performance of the proposed decentralized policy is very close to that of centralized greedy scheduling.
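Error-based thinning can be simulated in a few lines. A minimal sketch, assuming AR(1) sources $x_{t+1} = \lambda x_t + w_t$ with standard Gaussian noise, a monitor that predicts $\hat{x}_{t+1} = \lambda \hat{x}_t$ between deliveries, and a collision channel where a slot succeeds only if exactly one node transmits. The threshold and transmission probability below are illustrative, not the optimized closed-form values from the paper, and the function name is hypothetical. Setting the threshold to 0 makes every node always active, which recovers oblivious slotted ALOHA.

```python
import numpy as np


def average_error(n_nodes, n_slots, tau, threshold, lam=1.0, seed=0):
    """Time-average squared estimation error over all nodes.

    A node is active when |x - x_hat| >= threshold; each active node transmits
    with probability tau; the slot succeeds only if exactly one node transmits.
    threshold = 0 gives oblivious slotted ALOHA with access probability tau.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_nodes)      # source states
    x_hat = np.zeros(n_nodes)  # monitor's estimates
    total = 0.0
    for _ in range(n_slots):
        x = lam * x + rng.standard_normal(n_nodes)
        x_hat = lam * x_hat  # predict forward between deliveries
        active = np.abs(x - x_hat) >= threshold
        transmit = active & (rng.random(n_nodes) < tau)
        if transmit.sum() == 1:  # collision-free slot: the monitor learns x exactly
            x_hat[transmit] = x[transmit]
        total += np.mean((x - x_hat) ** 2)
    return total / n_slots


# Usage: both policy classes on the same seed (untuned, illustrative parameters).
oblivious = average_error(n_nodes=10, n_slots=20_000, tau=0.1, threshold=0.0)
thinning = average_error(n_nodes=10, n_slots=20_000, tau=0.5, threshold=4.0)
```

Sweeping `threshold` and `tau` is a quick way to explore the tradeoff the paper optimizes in closed form.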

Filter methods · Model evaluation · Marginalization · Collaborative filtering · DATE

June 8, 2022

One-Bit Matrix Completion with Differential Privacy

Zhengpin Li, Zheng Wei, Zengfeng Huang, Xiaojun Mao, Jian Wang

from arXiv. In this version, we have fixed some typos and updated references.

As a prevailing collaborative filtering method for recommendation systems, one-bit matrix completion requires data collected from users to provide personalized service. Due to insidious attacks and unexpected inference, the release of users' data often raises serious privacy concerns. To address this issue, differential privacy (DP) has been widely used in standard matrix completion models. To date, however, little is known about how to apply DP to achieve privacy protection in one-bit matrix completion. In this paper, we propose a unified framework for ensuring a strong privacy guarantee for one-bit matrix completion with DP. In our framework, we develop four different private perturbation mechanisms corresponding to different stages of one-bit matrix completion. For each mechanism, we design a privacy-preserving algorithm and provide a theoretical recovery error bound under proper conditions. Numerical experiments on synthetic and real-world datasets demonstrate the effectiveness of our proposal. Compared with one-bit matrix completion without privacy protection, our proposed mechanisms maintain a high level of privacy protection with only a marginal loss of completion accuracy.
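The abstract does not spell out the four perturbation mechanisms. As an illustration of one plausible input-perturbation stage (an assumption, not necessarily one of the paper's mechanisms), each observed $\pm 1$ entry can be flipped with probability $q = 1/(1+e^{\epsilon})$. This is binary randomized response and satisfies $\epsilon$-DP per entry; since $\mathbb{E}[Y'] = (1-2q)Y$, dividing by $1-2q$ gives an unbiased estimate of the original sign. Names are hypothetical.

```python
import numpy as np


def privatize_signs(Y, mask, eps, rng):
    """Binary randomized response on observed +/-1 entries: flip each with
    probability 1 / (1 + e^eps). Unobserved entries (mask False) are set to 0."""
    flip_prob = 1.0 / (1.0 + np.exp(eps))
    flips = rng.random(Y.shape) < flip_prob
    return np.where(mask, np.where(flips, -Y, Y), 0)


def debias(Y_priv, eps):
    """Unbiased estimate of the original signs, since E[Y_priv] = (1 - 2q) Y."""
    q = 1.0 / (1.0 + np.exp(eps))
    return Y_priv / (1.0 - 2.0 * q)


# Usage: signs of a rank-2 matrix, with about half of the entries observed.
rng = np.random.default_rng(0)
U, V = rng.standard_normal((500, 2)), rng.standard_normal((500, 2))
Y = np.where(U @ V.T >= 0, 1, -1)
mask = rng.random(Y.shape) < 0.5
Y_priv = privatize_signs(Y, mask, eps=1.0, rng=rng)
flip_rate = np.mean(Y_priv[mask] != Y[mask])
```

The debiased matrix would then be passed to a one-bit completion solver, which is not shown here.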

Estimation/Estimator · Reducible · Data point · Biased · Threshold

June 7, 2022

Histogram Estimation under User-level Privacy with Heterogeneous Data

Yuhan Liu, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, Marco Gruteser

from arXiv, 21 pages, 4 figures

We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. While there is abundant literature on this classical problem under the item-level privacy setup, where each user contributes only one data point, little is known about the user-level counterpart. We consider the heterogeneous scenario where both the quantity and the distribution of data can differ across users. We propose an algorithm based on a clipping strategy that almost achieves a two-approximation with respect to the best clipping threshold in hindsight. This result holds without any distributional assumptions on the data. We also prove that the clipping bias can be significantly reduced when the counts come from non-i.i.d. Poisson distributions, and we show empirically that our debiasing method provides improvements even without such assumptions. Experiments on both real and synthetic datasets verify our theoretical findings and demonstrate the effectiveness of our algorithms.
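The clipping idea can be sketched as follows, assuming a fixed clipping threshold $\tau$ (the paper's algorithm instead targets the best threshold in hindsight) and the Laplace mechanism. Each user's count vector is scaled down so that its $\ell_1$ norm is at most $\tau$, so adding or removing one user changes the summed histogram by at most $\tau$ in $\ell_1$, and Laplace noise of scale $\tau/\epsilon$ per bin then gives user-level $\epsilon$-DP. The function name is hypothetical.

```python
import numpy as np


def clipped_private_histogram(user_counts, tau, eps, rng):
    """user_counts: (n_users, d) array of nonnegative per-user counts.

    Scales each user's row to L1 norm at most tau, sums over users and adds
    Laplace(tau / eps) noise to every bin (user-level eps-DP).
    """
    counts = np.asarray(user_counts, dtype=float)
    l1 = counts.sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, tau / np.maximum(l1, 1e-12))  # guard users with no data
    clipped = (counts * scale).sum(axis=0)
    return clipped + rng.laplace(loc=0.0, scale=tau / eps, size=counts.shape[1])


# Usage: three users with totals 4, 4 and 10; the unclipped histogram is [8, 7, 3].
users = np.array([[3, 0, 1], [0, 2, 2], [5, 5, 0]])
rng = np.random.default_rng(0)
# eps = 1e12 makes the noise negligible, purely to expose the clipping effect.
loose = clipped_private_histogram(users, tau=10, eps=1e12, rng=rng)
tight = clipped_private_histogram(users, tau=4, eps=1e12, rng=rng)  # third user scaled by 0.4
```

A smaller $\tau$ reduces the noise scale but biases heavy contributors downward, as the third user shows; this bias-variance tradeoff is what the choice of threshold has to balance.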