人人操人人莫人人草-国产精品美乳免费看

Large language models (LLMs) have had a profound impact on numerous aspects of daily life including natural language processing, content generation, research methodologies and so on. However, one crucial issue concerning the inference results of large language models is security and privacy. In many scenarios, the results generated by LLMs could possibly leak many confidential or copyright information. A recent beautiful and breakthrough work [Vyas, Kakade and Barak 2023] focus on such privacy issue of the LLMs from theoretical perspective. It is well-known that computing the attention matrix is one of the major task during the LLMs computation. Thus, how to give a provable privately guarantees of computing the attention matrix is an important research direction. Previous work [Alman and Song 2023, Brand, Song and Zhou 2023] have proposed provable tight result for fast computation of attention without considering privacy concerns. One natural mathematical formulation to quantity the privacy in theoretical computer science graduate school textbook is differential privacy. Inspired by [Vyas, Kakade and Barak 2023], in this work, we provide a provable result for showing how to differentially private approximate the attention matrix. From technique perspective, our result replies on a pioneering work in the area of differential privacy by [Alabi, Kothari, Tankala, Venkat and Zhang 2022].

相關內容

Attention

關注 1

區塊鏈 · MoDELS · Performer · 統計量 · Integration ·

2023 年 6 月 21 日

Differentially Private Enhanced Permissioned Blockchain for Private Data Sharing in Industrial IoT

Muhammad Islam,Mubashir Husain Rehmani,Jinjun Chen

The integration of permissioned blockchain such as Hyperledger fabric (HF) and Industrial internet of Things (IIoT) has opened new opportunities for interdependent supply chain partners to improve their performance through data sharing and coordination. The multichannel mechanism, private data collection and querying mechanism of HF enable private data sharing, transparency, traceability, and verification across the supply chain. However, the existing querying mechanism of HF needs further improvement for statistical data sharing because the query is evaluated on the original data recorded on the ledger. As a result, it gives rise to privacy issues such as leak of business secrets, tracking of resources and assets, and disclose of personal information. Therefore, we solve this problem by proposing a differentially private enhanced permissioned blockchain for private data sharing in the context of supply chain in IIoT which is known as (EDH-IIoT). We propose algorithms to efficiently utilize the $\epsilon$ through the reuse of the privacy budget for the repeated queries. Furthermore, the reuse and tracking of $\epsilon$ enable the data owner to get ensure that $\epsilon$ does not exceed the threshold which is the maximum privacy budget ($\epsilon_{t}$). Finally, we model two privacy attacks namely linking attack and composition attack to evaluate and compare privacy preservation, and the efficiency of reuse of {\epsilon} with the default chaincode of HF and traditional differential privacy model, respectively. The results confirm that EDH-IIoT obtains an accuracy of 97% in the shared data for $\epsilon_{t}$ = 1, and a reduction of 35.96% in spending of $\epsilon$.

相互獨立的 · 離散化 · FAST · 縮放 · 講稿 ·

2023 年 6 月 20 日

Fast quantum algorithm for differential equations

Mohsen Bagherimehrab,Kouhei Nakaji,Nathan Wiebe,Alán Aspuru-Guzik

from arxiv, 5 pages main text, 14 pages in total including appendix, 2 figures

Partial differential equations (PDEs) are ubiquitous in science and engineering. Prior quantum algorithms for solving the system of linear algebraic equations obtained from discretizing a PDE have a computational complexity that scales at least linearly with the condition number $\kappa$ of the matrices involved in the computation. For many practical applications, $\kappa$ scales polynomially with the size $N$ of the matrices, rendering a polynomial-in-$N$ complexity for these algorithms. Here we present a quantum algorithm with a complexity that is polylogarithmic in $N$ but is independent of $\kappa$ for a large class of PDEs. Our algorithm generates a quantum state that enables extracting features of the solution. Central to our methodology is using a wavelet basis as an auxiliary system of coordinates in which the condition number of associated matrices is independent of $N$ by a simple diagonal preconditioner. We present numerical simulations showing the effect of the wavelet preconditioner for several differential equations. Our work could provide a practical way to boost the performance of quantum-simulation algorithms where standard methods are used for discretization.

Medical Image Analysis · Analysis · 等變 · Learning · 模型評估 ·

2023 年 6 月 20 日

Bridging the Gap: Differentially Private Equivariant Deep Learning for Medical Image Analysis

Florian A. H?lzl,Daniel Rueckert,Georgios Kaissis

from arxiv, Accepted as extended abstract at GeoMedIA Workshop 2022 (//openreview.net/forum?id=rGYfMrMxI17)

Machine learning with formal privacy-preserving techniques like Differential Privacy (DP) allows one to derive valuable insights from sensitive medical imaging data while promising to protect patient privacy, but it usually comes at a sharp privacy-utility trade-off. In this work, we propose to use steerable equivariant convolutional networks for medical image analysis with DP. Their improved feature quality and parameter efficiency yield remarkable accuracy gains, narrowing the privacy-utility gap.

特征空間 · 向量化 · Unstructured · INFORMS · 特征向量 ·

2023 年 6 月 20 日

DP-Image: Differential Privacy for Image Data in Feature Space

Hanyu Xue,Bo Liu,Ming Ding,Tianqing Zhu,Dayong Ye,Li Song,Wanlei Zhou

The excessive use of images in social networks, government databases, and industrial applications has posed great privacy risks and raised serious concerns from the public. Even though differential privacy (DP) is a widely accepted criterion that can provide a provable privacy guarantee, the application of DP on unstructured data such as images is not trivial due to the lack of a clear qualification on the meaningful difference between any two images. In this paper, for the first time, we introduce a novel notion of image-aware differential privacy, referred to as DP-image, that can protect user's personal information in images, from both human and AI adversaries. The DP-Image definition is formulated as an extended version of traditional differential privacy, considering the distance measurements between feature space vectors of images. Then we propose a mechanism to achieve DP-Image by adding noise to an image feature vector. Finally, we conduct experiments with a case study on face image privacy. Our results show that the proposed DP-Image method provides excellent DP protection on images, with a controllable distortion to faces.

有偏 · 得分 · 數據集 · 簇 · 樣本 ·

2023 年 6 月 19 日

DiPPS: Differentially Private Propensity Scores for Bias Correction

Liangwei Chen,Valentin Hartmann,Robert West

from arxiv, 11 pages, 2 figures. Current version: conference version

In surveys, it is typically up to the individuals to decide if they want to participate or not, which leads to participation bias: the individuals willing to share their data might not be representative of the entire population. Similarly, there are cases where one does not have direct access to any data of the target population and has to resort to publicly available proxy data sampled from a different distribution. In this paper, we present Differentially Private Propensity Scores for Bias Correction (DiPPS), a method for approximating the true data distribution of interest in both of the above settings. We assume that the data analyst has access to a dataset $\tilde{D}$ that was sampled from the distribution of interest in a biased way. As individuals may be more willing to share their data when given a privacy guarantee, we further assume that the analyst is allowed locally differentially private access to a set of samples $D$ from the true, unbiased distribution. Each data point from the private, unbiased dataset $D$ is mapped to a probability distribution over clusters (learned from the biased dataset $\tilde{D}$), from which a single cluster is sampled via the exponential mechanism and shared with the data analyst. This way, the analyst gathers a distribution over clusters, which they use to compute propensity scores for the points in the biased $\tilde{D}$, which are in turn used to reweight the points in $\tilde{D}$ to approximate the true data distribution. It is now possible to compute any function on the resulting reweighted dataset without further access to the private $D$. In experiments on datasets from various domains, we show that DiPPS successfully brings the distribution of the available dataset closer to the distribution of interest in terms of Wasserstein distance. We further show that this results in improved estimates for different statistics.

MIMO · MoDELS · Learning · 服務器 · 噪聲 ·

2023 年 6 月 19 日

Differentially Private Over-the-Air Federated Learning Over MIMO Fading Channels

Hang Liu,Jia Yan,Ying-Jun Angela Zhang

from arxiv, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Federated learning (FL) enables edge devices to collaboratively train machine learning models, with model communication replacing direct data uploading. While over-the-air model aggregation improves communication efficiency, uploading models to an edge server over wireless networks can pose privacy risks. Differential privacy (DP) is a widely used quantitative technique to measure statistical data privacy in FL. Previous research has focused on over-the-air FL with a single-antenna server, leveraging communication noise to enhance user-level DP. This approach achieves the so-called "free DP" by controlling transmit power rather than introducing additional DP-preserving mechanisms at devices, such as adding artificial noise. In this paper, we study differentially private over-the-air FL over a multiple-input multiple-output (MIMO) fading channel. We show that FL model communication with a multiple-antenna server amplifies privacy leakage as the multiple-antenna server employs separate receive combining for model aggregation and information inference. Consequently, relying solely on communication noise, as done in the multiple-input single-output system, cannot meet high privacy requirements, and a device-side privacy-preserving mechanism is necessary for optimal DP design. We analyze the learning convergence and privacy loss of the studied FL system and propose a transceiver design algorithm based on alternating optimization. Numerical results demonstrate that the proposed method achieves a better privacy-learning trade-off compared to prior work.

圖片分類 · 模型評估 · Networking · 可約的 · Neural Networks ·

2023 年 6 月 19 日

Pre-Pruning and Gradient-Dropping Improve Differentially Private Image Classification

Kamil Adamczewski,Yingchen He,Mijung Park

from arxiv, arXiv admin note: text overlap with arXiv:2303.04612

Scalability is a significant challenge when it comes to applying differential privacy to training deep neural networks. The commonly used DP-SGD algorithm struggles to maintain a high level of privacy protection while achieving high accuracy on even moderately sized models. To tackle this challenge, we take advantage of the fact that neural networks are overparameterized, which allows us to improve neural network training with differential privacy. Specifically, we introduce a new training paradigm that uses \textit{pre-pruning} and \textit{gradient-dropping} to reduce the parameter space and improve scalability. The process starts with pre-pruning the parameters of the original network to obtain a smaller model that is then trained with DP-SGD. During training, less important gradients are dropped, and only selected gradients are updated. Our training paradigm introduces a tension between the rates of pre-pruning and gradient-dropping, privacy loss, and classification accuracy. Too much pre-pruning and gradient-dropping reduces the model's capacity and worsens accuracy, while training a smaller model requires less privacy budget for achieving good accuracy. We evaluate the interplay between these factors and demonstrate the effectiveness of our training paradigm for both training from scratch and fine-tuning pre-trained networks on several benchmark image classification datasets. The tools can also be readily incorporated into existing training paradigms.

前向 · Extensibility · 可約的 · Continuity · VR ·

2023 年 6 月 18 日

Metaverse: Security and Privacy Concerns

Ruoyu Zhao,Yushu Zhang,Youwen Zhu,Rushi Lan,Zhongyun Hua

from arxiv, This work has been accepted by Journal of Metaverse

The term "metaverse", a three-dimensional virtual universe similar to the real realm, has always been full of imagination since it was put forward in the 1990s. Recently, it is possible to realize the metaverse with the continuous emergence and progress of various technologies, and thus it has attracted extensive attention again. It may bring a lot of benefits to human society such as reducing discrimination, eliminating individual differences, and socializing. However, everything has security and privacy concerns, which is no exception for the metaverse. In this article, we firstly analyze the concept of the metaverse and propose that it is a super virtual-reality (VR) ecosystem compared with other VR technologies. Then, we carefully analyze and elaborate on possible security and privacy concerns from four perspectives: user information, communication, scenario, and goods, and immediately, the potential solutions are correspondingly put forward. Meanwhile, we propose the need to take advantage of the new buckets effect to comprehensively address security and privacy concerns from a philosophical perspective, which hopefully will bring some progress to the metaverse community.

binary · Continuity · 噪聲 · 可約的 · 噪聲分布 ·

2023 年 6 月 16 日

A Smooth Binary Mechanism for Efficient Private Continual Observation

Joel Daniel Andersson,Rasmus Pagh

In privacy under continual observation we study how to release differentially private estimates based on a dataset that evolves over time. The problem of releasing private prefix sums of $x_1,x_2,x_3,\dots \in\{0,1\}$ (where the value of each $x_i$ is to be private) is particularly well-studied, and a generalized form is used in state-of-the-art methods for private stochastic gradient descent (SGD). The seminal binary mechanism privately releases the first $t$ prefix sums with noise of variance polylogarithmic in $t$. Recently, Henzinger et al. and Denisov et al. showed that it is possible to improve on the binary mechanism in two ways: The variance of the noise can be reduced by a (large) constant factor, and also made more even across time steps. However, their algorithms for generating the noise distribution are not as efficient as one would like in terms of computation time and (in particular) space. We address the efficiency problem by presenting a simple alternative to the binary mechanism in which 1) generating the noise takes constant average time per value, 2) the variance is reduced by a factor about 4 compared to the binary mechanism, and 3) the noise distribution at each step is identical. Empirically, a simple Python implementation of our approach outperforms the running time of the approach of Henzinger et al., as well as an attempt to improve their algorithm using high-performance algorithms for multiplication with Toeplitz matrices.

估計/估計量 · 情景 · 示例 · 次最優 · Extensibility ·

2023 年 6 月 15 日

Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

Jingfeng Wu,Wennan Zhu,Peter Kairouz,Vladimir Braverman

In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketching is nearly information-theoretically optimal for achieving the fundamental accuracy-communication trade-offs [Chen et al., 2022]. However, we show that under the more practical multi-round FEE setting, simple adaptations of count sketching are strictly sub-optimal, and we propose a novel hybrid sketching algorithm that is provably more accurate. We also address the following fundamental question: how should a practitioner set the sketch size in a way that adapts to the hardness of the underlying problem? We propose a two-phase approach that allows for the use of a smaller sketch size for simpler problems (e.g. near-sparse or light-tailed distributions). We conclude our work by showing how differential privacy can be added to our algorithm and verifying its superior performance through extensive experiments conducted on large-scale datasets.