亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Quantiles are often used for summarizing and understanding data. If that data is sensitive, it may be necessary to compute quantiles in a way that is differentially private, providing theoretical guarantees that the result does not reveal private information. However, when multiple quantiles are needed, existing differentially private algorithms fare poorly: they either compute quantiles individually, splitting the privacy budget, or summarize the entire distribution, wasting effort. In either case the result is reduced accuracy. In this work we propose an instance of the exponential mechanism that simultaneously estimates exactly $m$ quantiles from $n$ data points while guaranteeing differential privacy. The utility function is carefully structured to allow for an efficient implementation that returns estimates of all $m$ quantiles in time $O(mn\log(n) + m^2n)$. Experiments show that our method significantly outperforms the current state of the art on both real and synthetic data while remaining efficient enough to be practical.

相關內容

We study differentially private stochastic optimization in convex and non-convex settings. For the convex case, we focus on the family of non-smooth generalized linear losses (GLLs). Our algorithm for the $\ell_2$ setting achieves optimal excess population risk in near-linear time, while the best known differentially private algorithms for general convex losses run in super-linear time. Our algorithm for the $\ell_1$ setting has nearly-optimal excess population risk $\tilde{O}\big(\sqrt{\frac{\log{d}}{n\varepsilon}}\big)$, and circumvents the dimension dependent lower bound of \cite{Asi:2021} for general non-smooth convex losses. In the differentially private non-convex setting, we provide several new algorithms for approximating stationary points of the population risk. For the $\ell_1$-case with smooth losses and polyhedral constraint, we provide the first nearly dimension independent rate, $\tilde O\big(\frac{\log^{2/3}{d}}{{(n\varepsilon)^{1/3}}}\big)$ in linear time. For the constrained $\ell_2$-case with smooth losses, we obtain a linear-time algorithm with rate $\tilde O\big(\frac{1}{n^{1/3}}+\frac{d^{1/5}}{(n\varepsilon)^{2/5}}\big)$. Finally, for the $\ell_2$-case we provide the first method for {\em non-smooth weakly convex} stochastic optimization with rate $\tilde O\big(\frac{1}{n^{1/4}}+\frac{d^{1/6}}{(n\varepsilon)^{1/3}}\big)$ which matches the best existing non-private algorithm when $d= O(\sqrt{n})$. We also extend all our results above for the non-convex $\ell_2$ setting to the $\ell_p$ setting, where $1 < p \leq 2$, with only polylogarithmic (in the dimension) overhead in the rates.

Differential privacy is becoming one gold standard for protecting the privacy of publicly shared data. It has been widely used in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusion of original data analysis, as privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we break this trade-off by developing a distribution-invariant privatization (DIP) method to reconcile both high statistical accuracy and strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if one used the original data. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in two simulations and on three real-world benchmarks.

This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under local differential privacy (LDP). Contrary to frequency estimation of a single attribute (the majority of the works), the multidimensional aspect imposes to pay particular attention to the privacy budget. Besides, when collecting user statistics longitudinally, privacy progressively degrades. Indeed, both "multiple" settings combined (i.e., many attributes and several collections throughout time) imposes several challenges, in which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response -- GRR, Optimized Unary Encoding -- OUE, and Symmetric Unary Encoding -- SUE) for both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Lastly, we also propose a new solution named \underline{A}daptive \underline{L}DP for \underline{LO}ngitudinal and \underline{M}ultidimensional \underline{FRE}quency \underline{E}stimates (ALLOMFREE), which randomly samples a single attribute to send with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimations.

We present an asymptotically optimal $(\epsilon,\delta)$ differentially private mechanism for answering multiple, adaptively asked, $\Delta$-sensitive queries, settling the conjecture of Steinke and Ullman [2020]. Our algorithm has a significant advantage that it adds independent bounded noise to each query, thus providing an absolute error bound. Additionally, we apply our algorithm in adaptive data analysis, obtaining an improved guarantee for answering multiple queries regarding some underlying distribution using a finite sample. Numerical computations show that the bounded-noise mechanism outperforms the Gaussian mechanism in many standard settings.

We present practical aspects of implementing a pseudo posterior synthesizer for microdata dissemination under a new re-weighting strategy for utility maximization of released synthetic data. Our re-weighting strategy applies to any vector-weighting approach under which a vector of observation-indexed weight are used to downweight likelihood contributions for high disclosure risk records. We demonstrate our method on two different vector-weighted schemes that target high-risk records by exponentiating each of their likelihood contributions with a record-indexed weight, $\alpha_i \in [0,1]$ for record $i \in (1,\ldots,n)$. We compute the overall Lipschitz bound, $\Delta_{\boldsymbol{\alpha},\mathbf{x}}$, for the database $\mathbf{x}$, under each vector-weighted scheme where a local $\epsilon_{x} = 2\Delta_{\boldsymbol{\alpha},\mathbf{x}}$. Our new method for constructing record-indexed downeighting maximizes the data utility under any privacy budget for the vector-weighted synthesizers by adjusting the by-record weights, $(\alpha_{i})_{i = 1}^{n}$, such that their individual Lipschitz bounds, $\Delta_{\boldsymbol{\alpha},x_{i}}$, approach the bound for the entire database, $\Delta_{\boldsymbol{\alpha},\mathbf{x}}$. Our method asymptotically (as sample size grows) achieves an $(\epsilon = 2 \Delta_{\boldsymbol{\alpha}})-$ differential privacy (DP) guarantee, globally, over the space of databases, $\mathbf{x}\in\mathcal{X}$. We illustrate our methods using simulated count data with and without over-dispersion-induced skewness and compare the results to a scalar-weighted synthesizer under the Exponential Mechanism. We demonstrate our asymptotic DP result in a simulation study. We apply our methods to a sample of the Survey of Doctorate Recipients.

In this note, we consider the problem of differentially privately (DP) computing an anonymized histogram, which is defined as the multiset of counts of the input dataset (without bucket labels). In the low-privacy regime $\epsilon \geq 1$, we give an $\epsilon$-DP algorithm with an expected $\ell_1$-error bound of $O(\sqrt{n} / e^\epsilon)$. In the high-privacy regime $\epsilon < 1$, we give an $\Omega(\sqrt{n \log(1/\epsilon) / \epsilon})$ lower bound on the expected $\ell_1$ error. In both cases, our bounds asymptotically match the previously known lower/upper bounds due to [Suresh, NeurIPS 2019].

Large organizations that collect data about populations (like the US Census Bureau) release summary statistics that are used by multiple stakeholders for resource allocation and policy making problems. These organizations are also legally required to protect the privacy of individuals from whom they collect data. Differential Privacy (DP) provides a solution to release useful summary data while preserving privacy. Most DP mechanisms are designed to answer a single set of queries. In reality, there are often multiple stakeholders that use a given data release and have overlapping but not-identical queries. This introduces a novel joint optimization problem in DP where the privacy budget must be shared among different analysts. We initiate study into the problem of DP query answering across multiple analysts. To capture the competing goals and priorities of multiple analysts, we formulate three desiderata that any mechanism should satisfy in this setting -- The Sharing Incentive, Non-Interference, and Adaptivity -- while still optimizing for overall error. We demonstrate how existing DP query answering mechanisms in the multi-analyst settings fail to satisfy at least one of the desiderata. We present novel DP algorithms that provably satisfy all our desiderata and empirically show that they incur low error on realistic tasks.

Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by *clipping* it to some constant value. However there is no good *a priori* setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly various other parameters. We propose a method wherein instead of a fixed clipping norm, one clips to a value at a specified quantile of the update norm distribution, where the value at the quantile is itself estimated online, with differential privacy. The method tracks the quantile closely, uses a negligible amount of privacy budget, is compatible with other federated learning technologies such as compression and secure aggregation, and has a straightforward joint DP analysis with DP-FedAvg. Experiments demonstrate that adaptive clipping to the median update norm works well across a range of realistic federated learning tasks, sometimes outperforming even the best fixed clip chosen in hindsight, and without the need to tune any clipping hyperparameter.

Knowledge graph embedding plays an important role in knowledge representation, reasoning, and data mining applications. However, for multiple cross-domain knowledge graphs, state-of-the-art embedding models cannot make full use of the data from different knowledge domains while preserving the privacy of exchanged data. In addition, the centralized embedding model may not scale to the extensive real-world knowledge graphs. Therefore, we propose a novel decentralized scalable learning framework, \emph{Federated Knowledge Graphs Embedding} (FKGE), where embeddings from different knowledge graphs can be learnt in an asynchronous and peer-to-peer manner while being privacy-preserving. FKGE exploits adversarial generation between pairs of knowledge graphs to translate identical entities and relations of different domains into near embedding spaces. In order to protect the privacy of the training data, FKGE further implements a privacy-preserving neural network structure to guarantee no raw data leakage. We conduct extensive experiments to evaluate FKGE on 11 knowledge graphs, demonstrating a significant and consistent improvement in model quality with at most 17.85\% and 7.90\% increases in performance on triple classification and link prediction tasks.

Alternating Direction Method of Multipliers (ADMM) is a widely used tool for machine learning in distributed settings, where a machine learning model is trained over distributed data sources through an interactive process of local computation and message passing. Such an iterative process could cause privacy concerns of data owners. The goal of this paper is to provide differential privacy for ADMM-based distributed machine learning. Prior approaches on differentially private ADMM exhibit low utility under high privacy guarantee and often assume the objective functions of the learning problems to be smooth and strongly convex. To address these concerns, we propose a novel differentially private ADMM-based distributed learning algorithm called DP-ADMM, which combines an approximate augmented Lagrangian function with time-varying Gaussian noise addition in the iterative process to achieve higher utility for general objective functions under the same differential privacy guarantee. We also apply the moments accountant method to bound the end-to-end privacy loss. The theoretical analysis shows that DP-ADMM can be applied to a wider class of distributed learning problems, is provably convergent, and offers an explicit utility-privacy tradeoff. To our knowledge, this is the first paper to provide explicit convergence and utility properties for differentially private ADMM-based distributed learning algorithms. The evaluation results demonstrate that our approach can achieve good convergence and model accuracy under high end-to-end differential privacy guarantee.

北京阿比特科技有限公司