一区二区三区四区五区无码,18禁不卡无毒免费网站入口,日韩乱中文字幕精品乱码,国产精品JIZZ视频国产

Ciphertexts of an order-preserving encryption (OPE) scheme preserve the order of their corresponding plaintexts. However, OPEs are vulnerable to inference attacks that exploit this preserved order. At another end, differential privacy has become the de-facto standard for achieving data privacy. One of the most attractive properties of DP is that any post-processing (inferential) computation performed on the noisy output of a DP algorithm does not degrade its privacy guarantee. In this paper, we propose a novel differentially private order preserving encryption scheme, OP$\epsilon$. Under OP$\epsilon$, the leakage of order from the ciphertexts is differentially private. As a result, in the least, OP$\epsilon$ ensures a formal guarantee (specifically, a relaxed DP guarantee) even in the face of inference attacks. To the best of our knowledge, this is the first work to combine DP with a property-preserving encryption scheme. We demonstrate OP$\epsilon$'s practical utility in answering range queries via extensive empirical evaluation on four real-world datasets. For instance, OP$\epsilon$ misses only around $4$ in every $10K$ correct records on average for a dataset of size $\sim732K$ with an attribute of domain size $\sim18K$ and $\epsilon= 1$.

相關內容

Extensibility

關注 5

iOS 8 提供的應用間和應用跟系統的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source:

情景 · 相互獨立的 · 層 · 講稿 ·

2022 年 10 月 26 日

Ballot stuffing and participation privacy in pollsite voting

Prashant Agrawal,Abhinav Nakarmi,Mahabir Prasad Jhanwar,Subodh Sharma,Subhashis Banerjee

We study the problem of simultaneously addressing both ballot stuffing and participation privacy for pollsite voting systems. Ballot stuffing is the attack where fake ballots (not cast by any eligible voter) are inserted into the system. Participation privacy is about hiding which eligible voters have actually cast their vote. So far, the combination of ballot stuffing and participation privacy has been mostly studied for internet voting, where voters are assumed to own trusted computing devices. Such approaches are inapplicable to pollsite voting where voters typically vote bare handed. We present an eligibility audit protocol to detect ballot stuffing in pollsite voting protocols. This is done while protecting participation privacy from a remote observer - one who does not physically observe voters during voting. Our protocol can be instantiated as an additional layer on top of most existing pollsite E2E-V voting protocols. To achieve our guarantees, we develop an efficient zero-knowledge proof (ZKP), that, given a value $v$ and a set $\Phi$ of commitments, proves $v$ is committed by some commitment in $\Phi$, without revealing which one. We call this a ZKP of reverse set membership because of its relationship to the popular ZKPs of set membership. This ZKP may be of independent interest.

SGD · 重要性采樣 · Learning · 優化器 · 樣本 ·

2022 年 10 月 26 日

DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling

Jianxin Wei,Ergute Bao,Xiaokui Xiao,Yin Yang

from arxiv, A short version of this paper will appear in CCS 2022

Nowadays, differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning. The combination of these two techniques, i.e., deep learning with differential privacy, promises the privacy-preserving release of high-utility models trained with sensitive data such as medical records. A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the stochastic gradient descent (SGD) optimizer commonly used for DNN training. Subsequent approaches have improved various aspects of the model training process, including noise decay schedule, model architecture, feature engineering, and hyperparameter tuning. However, the core mechanism for enforcing DP in the SGD optimizer remains unchanged ever since the original DP-SGD algorithm, which has increasingly become a fundamental barrier limiting the performance of DP-compliant machine learning solutions. Motivated by this, we propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core optimizer of DP-SGD, with consistent and significant accuracy gains over the latter. The main idea is to employ importance sampling (IS) in each SGD iteration for mini-batch selection, which reduces both sampling variance and the amount of random noise injected to the gradients that is required to satisfy DP. Integrating IS into the complex mathematical machinery of DP-SGD is highly non-trivial. DPIS addresses the challenge through novel mechanism designs, fine-grained privacy analysis, efficiency enhancements, and an adaptive gradient clipping optimization. Extensive experiments on four benchmark datasets, namely MNIST, FMNIST, CIFAR-10 and IMDb, demonstrate the superior effectiveness of DPIS over existing solutions for deep learning with differential privacy.

Learning · MoDELS · Performer · 可辨認的 · Subspace ·

2022 年 10 月 26 日

When Does Differentially Private Learning Not Suffer in High Dimensions?

Xuechen Li,Daogao Liu,Tatsunori Hashimoto,Huseyin A. Inan,Janardhan Kulkarni,Yin Tat Lee,Abhradeep Guha Thakurta

from arxiv, 26 pages; v3 includes additional experiments and clarification

Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following research question: When does the performance of differentially private learning not degrade with increasing model size? We identify that the magnitudes of gradients projected onto subspaces is a key factor that determines performance. To precisely characterize this for private convex learning, we introduce a condition on the objective that we term \emph{restricted Lipschitz continuity} and derive improved bounds for the excess empirical and population risks that are dimension-independent under additional conditions. We empirically show that in private fine-tuning of large language models, gradients obtained during fine-tuning are mostly controlled by a few principal components. This behavior is similar to conditions under which we obtain dimension-independent bounds in convex settings. Our theoretical and empirical results together provide a possible explanation for recent successes in large-scale private fine-tuning. Code to reproduce our results can be found at \url{//github.com/lxuechen/private-transformers/tree/main/examples/classification/spectral_analysis}.

SimPLe · Extensibility · 語言模型化 · Attention · Processing（編程語言） ·

2022 年 10 月 25 日

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Xiang Yue,Huseyin A. Inan,Xuechen Li,Girish Kumar,Julia McAnallen,Huan Sun,David Levitan,Robert Sim

Privacy concerns have attracted increasing attention in data-driven products and services. Existing legislation forbids arbitrary processing of personal data collected from individuals. Generating synthetic versions of such data with a formal privacy guarantee such as differential privacy (DP) is considered to be a solution to address privacy concerns. In this direction, we show a simple, practical, and effective recipe in the text domain: simply fine-tuning a generative language model with DP allows us to generate useful synthetic text while mitigating privacy concerns. Through extensive empirical analyses, we demonstrate that our method produces synthetic data that is competitive in terms of utility with its non-private counterpart and meanwhile provides strong protection against potential privacy leakages.

語言模型化 · MoDELS · INFORMS · Analysis · 樣本 ·

2022 年 10 月 25 日

Differentially Private Language Models for Secure Data Sharing

Justus Mattern,Zhijing Jin,Benjamin Weggenmann,Bernhard Schoelkopf,Mrinmaya Sachan

from arxiv, Accepted at EMNLP 2022

To protect the privacy of individuals whose data is being shared, it is of high importance to develop methods allowing researchers and companies to release textual data while providing formal privacy guarantees to its originators. In the field of NLP, substantial efforts have been directed at building mechanisms following the framework of local differential privacy, thereby anonymizing individual text samples before releasing them. In practice, these approaches are often dissatisfying in terms of the quality of their output language due to the strong noise required for local differential privacy. In this paper, we approach the problem at hand using global differential privacy, particularly by training a generative language model in a differentially private manner and consequently sampling data from it. Using natural language prompts and a new prompt-mismatch loss, we are able to create highly accurate and fluent textual datasets taking on specific desired attributes such as sentiment or topic and resembling statistical properties of the training data. We perform thorough experiments indicating that our synthetic datasets do not leak information from our original data and are of high language quality and highly suitable for training models for further analysis on real-world data. Notably, we also demonstrate that training classifiers on private synthetic data outperforms directly training classifiers on real data with DP-SGD.

Better · MoDELS · Learning · 噪聲 · 最大平均偏差 ·

2022 年 10 月 24 日

Differentially Private Data Generation Needs Better Features

Fredrik Harder,Milad Jalali Asadabadi,Danica J. Sutherland,Mijung Park

Training even moderately-sized generative models with differentially-private stochastic gradient descent (DP-SGD) is difficult: the required level of noise for reasonable levels of privacy is simply too large. We advocate instead building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation. In particular, we minimize the maximum mean discrepancy (MMD) between private target data and a generator's distribution, using a kernel based on perceptual features learned from a public dataset. With the MMD, we can simply privatize the data-dependent term once and for all, rather than introducing noise at each step of optimization as in DP-SGD. Our algorithm allows us to generate CIFAR10-level images with $\epsilon \approx 2$ which capture distinctive features in the distribution, far surpassing the current state of the art, which mostly focuses on datasets such as MNIST and FashionMNIST at a large $\epsilon \approx 10$. Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.

MoDELS · 語言模型化 · 泛函 · Extensibility · 詞元分析器 ·

2022 年 10 月 22 日

Just Fine-tune Twice: Selective Differential Privacy for Large Language Models

Weiyan Shi,Ryan Shea,Si Chen,Chiyuan Zhang,Ruoxi Jia,Zhou Yu

from arxiv, EMNLP 2022

Protecting large language models from privacy leakage is becoming increasingly crucial with their wide adoption in real-world products. Yet applying differential privacy (DP), a canonical notion with provable privacy guarantees for machine learning models, to those models remains challenging due to the trade-off between model utility and privacy loss. Utilizing the fact that sensitive information in language data tends to be sparse, Shi et al. (2021) formalized a DP notion extension called Selective Differential Privacy (SDP) to protect only the sensitive tokens defined by a policy function. However, their algorithm only works for RNN-based models. In this paper, we develop a novel framework, Just Fine-tune Twice (JFT), that achieves SDP for state-of-the-art large transformer-based models. Our method is easy to implement: it first fine-tunes the model with redacted in-domain data, and then fine-tunes it again with the original in-domain data using a private training mechanism. Furthermore, we study the scenario of imperfect implementation of policy functions that misses sensitive tokens and develop systematic methods to handle it. Experiments show that our method achieves strong utility compared to previous baselines. We also analyze the SDP privacy guarantee empirically with the canary insertion attack.

查準率/準確率 · MoDELS · Learning · Performer · 模型評估 ·

2022 年 10 月 22 日

Mixed Precision Quantization to Tackle Gradient Leakage Attacks in Federated Learning

Pretom Roy Ovi,Emon Dey,Nirmalya Roy,Aryya Gangopadhyay

Federated Learning (FL) enables collaborative model building among a large number of participants without the need for explicit data sharing. But this approach shows vulnerabilities when privacy inference attacks are applied to it. In particular, in the event of a gradient leakage attack, which has a higher success rate in retrieving sensitive data from the model gradients, FL models are at higher risk due to the presence of communication in their inherent architecture. The most alarming thing about this gradient leakage attack is that it can be performed in such a covert way that it does not hamper the training performance while the attackers backtrack from the gradients to get information about the raw data. Two of the most common approaches proposed as solutions to this issue are homomorphic encryption and adding noise with differential privacy parameters. These two approaches suffer from two major drawbacks. They are: the key generation process becomes tedious with the increasing number of clients, and noise-based differential privacy suffers from a significant drop in global model accuracy. As a countermeasure, we propose a mixed-precision quantized FL scheme, and we empirically show that both of the issues addressed above can be resolved. In addition, our approach can ensure more robustness as different layers of the deep model are quantized with different precision and quantization modes. We empirically proved the validity of our method with three benchmark datasets and found a minimal accuracy drop in the global model after applying quantization.

TAP · 值域 · 可辨認的 · 相互獨立的 · state-of-the-art ·

2022 年 10 月 21 日

TAP: Transparent and Privacy-Preserving Data Services

Daniel Reijsbergen,Aung Maw,Zheng Yang,Tien Tuan Anh Dinh,Jianying Zhou

from arxiv, Accepted for USENIX Security 2023

Users today expect more security from services that handle their data. In addition to traditional data privacy and integrity requirements, they expect transparency, i.e., that the service's processing of the data is verifiable by users and trusted auditors. Our goal is to build a multi-user system that provides data privacy, integrity, and transparency for a large number of operations, while achieving practical performance. To this end, we first identify the limitations of existing approaches that use authenticated data structures. We find that they fall into two categories: 1) those that hide each user's data from other users, but have a limited range of verifiable operations (e.g., CONIKS, Merkle2, and Proofs of Liabilities), and 2) those that support a wide range of verifiable operations, but make all data publicly visible (e.g., IntegriDB and FalconDB). We then present TAP to address the above limitations. The key component of TAP is a novel tree data structure that supports efficient result verification, and relies on independent audits that use zero-knowledge range proofs to show that the tree is constructed correctly without revealing user data. TAP supports a broad range of verifiable operations, including quantiles and sample standard deviations. We conduct a comprehensive evaluation of TAP, and compare it against two state-of-the-art baselines, namely IntegriDB and Merkle2, showing that the system is practical at scale.

估計/估計量 · 估計誤差 · MoDELS · 學成 · 無偏 ·

2020 年 12 月 17 日

The Causal Learning of Retail Delinquency

Yiyan Huang,Cheuk Hang Leung,Xing Yan,Qi Wu,Nanbo Peng,Dongdong Wang,Zhixiang Huang

from arxiv, This paper was accepted and will be published in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.