The era of Big Data has brought with it a richer understanding of user behavior through massive data sets, which can help organizations optimize the quality of their services. In the context of transportation research, mobility data can provide Municipal Authorities (MA) with insights on how to operate, regulate, or improve the transportation network. Mobility data, however, may contain sensitive information about end users and trade secrets of Mobility Providers (MP). Due to this data privacy concern, MPs may be reluctant to contribute their datasets to the MA. Using ideas from cryptography, we propose an interactive protocol between an MA and an MP in which the MA obtains insights from mobility data without the MP having to reveal its trade secrets or sensitive data of its users. This is accomplished in two steps: a commitment step and a computation step. In the first step, Merkle commitments and aggregated traffic measurements are used to generate a cryptographic commitment. In the second step, the MP extracts insights from the data and sends them to the MA. Using the commitment and zero-knowledge proofs, the MA can certify that the information received from the MP is accurate, without needing to directly inspect the mobility data. We also present a differentially private version of the protocol that is suitable for the large-query regime. The protocol is verifiable for both the MA and the MP in the sense that dishonesty from one party can be detected by the other. The protocol can be readily extended to the more general setting with multiple MPs via secure multi-party computation.
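For concreteness, the following is a minimal sketch of the commitment step, assuming SHA-256 hashing and a toy record format for the aggregated traffic measurements; the exact tree layout and record encoding used by the protocol are not specified in the abstract.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256, used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()

def merkle_root(records: list[bytes]) -> bytes:
    """Compute the Merkle root over a list of records."""
    level = [h(r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical aggregated traffic measurements (zone, hour, trip count).
records = [b"zone=12|hour=08|trips=431", b"zone=12|hour=09|trips=512",
           b"zone=07|hour=08|trips=118", b"zone=07|hour=09|trips=164"]

commitment = merkle_root(records)        # the MP sends only this digest to the MA
print(commitment.hex())
```

The MP later opens individual leaves, together with their authentication paths, only inside the zero-knowledge proofs, so the MA can check that the reported insights are consistent with the committed data without ever seeing the raw records.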
The availability of genomic data is essential to progress in biomedical research, personalized medicine, etc. However, its extreme sensitivity makes it problematic, if not outright impossible, to publish or share it. As a result, several initiatives have been launched to experiment with synthetic genomic data, e.g., using generative models to learn the underlying distribution of the real data and generate artificial datasets that preserve its salient characteristics without exposing it. This paper provides the first evaluation of both utility and privacy protection of six state-of-the-art models for generating synthetic genomic data. We assess the performance of the synthetic data on several common tasks, such as allele population statistics and linkage disequilibrium. We then measure privacy through the lens of membership inference attacks, i.e., inferring whether a record was part of the training data. Our experiments show that no single approach to generate synthetic genomic data yields both high utility and strong privacy across the board. Also, the size and nature of the training dataset matter. Moreover, while some combinations of datasets and models produce synthetic data with distributions close to the real data, there often are target data points that are vulnerable to membership inference. Looking forward, our techniques can be used by practitioners to assess the risks of deploying synthetic genomic data in the wild and serve as a benchmark for future work.
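To illustrate the kind of membership inference considered here, the sketch below scores a target record by its distance to the nearest synthetic record and applies a threshold; the distance metric, the toy SNP encoding, and the threshold are assumptions for illustration only and are not the attacks evaluated in the paper.

```python
import numpy as np

def mia_score(target: np.ndarray, synthetic: np.ndarray) -> float:
    """Distance from the target to its nearest synthetic record
    (smaller distance suggests the target was in the training data)."""
    return float(np.min(np.linalg.norm(synthetic - target, axis=1)))

rng = np.random.default_rng(0)
# Toy genotype matrix: 1000 synthetic records, 200 SNPs coded as allele counts 0/1/2.
synthetic = rng.integers(0, 3, size=(1000, 200)).astype(float)

target_in = synthetic[0].copy()                            # record mirrored in the synthetic output
target_out = rng.integers(0, 3, size=200).astype(float)    # unrelated record

threshold = 5.0                                            # attacker-chosen threshold (assumption)
for name, t in [("in", target_in), ("out", target_out)]:
    print(name, "-> predicted member:", mia_score(t, synthetic) < threshold)
```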
With advances in 5G and IoT devices, industries are rapidly adopting artificial intelligence (AI) techniques to improve classification and prediction-based services. However, the use of AI also raises privacy and security concerns, since the underlying data can be misused or leaked. Private AI was recently coined to address the data security issue by combining AI with encryption techniques, but existing studies have shown that model inversion attacks can be used to reverse engineer images from model parameters. In this regard, we propose a Federated Learning and Encryption-based Private (FLEP) AI framework that provides two-tier security for data and model parameters in an IIoT environment. We propose a three-layer encryption method for data security and provide a hypothetical method to secure the model parameters. Experimental results show that the proposed method achieves better encryption quality at the expense of slightly increased execution time. We also highlight several open issues and challenges regarding the realization of the FLEP AI framework.
This paper studies decentralized federated learning algorithms in wireless IoT networks. The traditional parameter-server architecture for federated learning suffers from problems such as low fault tolerance, large communication overhead, and inaccessibility of private data. To solve these problems, we propose a Decentralized-Wireless-Federated-Learning algorithm called DWFL. The algorithm works in a system where the workers are organized in a peer-to-peer, server-less manner and exchange privacy-preserving data in parallel over wireless channels using an analog transmission scheme. With rigorous analysis, we show that DWFL satisfies $(\epsilon,\delta)$-differential privacy and that the privacy budget per worker scales as $\mathcal{O}(\frac{1}{\sqrt{N}})$, in contrast to the constant budget of the orthogonal transmission approach. Furthermore, DWFL converges at the same rate of $\mathcal{O}(\sqrt{\frac{1}{TN}})$ as the best-known centralized algorithm with a central parameter server. Extensive experiments demonstrate that DWFL also performs well in realistic settings.
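A minimal sketch of one DWFL-style round is given below, assuming the Gaussian mechanism for local privacy and an idealized, noiseless analog superposition channel; the clipping bound, noise multiplier, and dimensions are illustrative and not taken from the paper.

```python
import numpy as np

def privatize(grad: np.ndarray, clip: float, sigma: float, rng) -> np.ndarray:
    """Clip the local gradient and add Gaussian noise before transmission."""
    grad = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    return grad + rng.normal(0.0, sigma * clip, size=grad.shape)

rng = np.random.default_rng(42)
N, d = 8, 10                                   # number of workers, model dimension
local_grads = [rng.normal(size=d) for _ in range(N)]

# Each worker perturbs its own gradient; the analog channel adds the signals
# "over the air", so after averaging the per-worker noise contribution shrinks with N.
signals = [privatize(g, clip=1.0, sigma=1.0, rng=rng) for g in local_grads]
aggregate = np.sum(signals, axis=0) / N        # idealized superposition followed by averaging
print(aggregate)
```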
Probabilistic counters are well-known tools often used for space-efficient set cardinality estimation. In this paper we investigate probabilistic counters from the perspective of preserving privacy, using the standard, rigorous notion of differential privacy. The intuition is that probabilistic counters do not reveal much information about individuals, but provide only general information about the population; thus they can be used safely without violating the privacy of individuals. It turns out, however, that providing a precise, formal analysis of the privacy parameters of probabilistic counters is surprisingly difficult and requires advanced techniques and a very careful approach. We also demonstrate that probabilistic counters can be used as a privacy protection mechanism without any extra randomization; that is, the randomization inherent in the protocol is sufficient for protecting privacy, even if the probabilistic counter is used many times. In particular, we present a specific privacy-preserving data aggregation protocol based on a probabilistic counter. Our results can be used, for example, for performing distributed surveys.
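As a concrete example, the sketch below implements one classic probabilistic counter, a Morris-style approximate counter; whether this particular variant matches the counter analyzed in the paper is an assumption made only for illustration.

```python
import random

class MorrisCounter:
    """Approximate counter that stores only c and estimates the count as 2^c - 1."""
    def __init__(self):
        self.c = 0

    def increment(self):
        # Increase c with probability 2^(-c), so c grows roughly like log2(n).
        if random.random() < 2.0 ** (-self.c):
            self.c += 1

    def estimate(self) -> float:
        return 2.0 ** self.c - 1.0

counter = MorrisCounter()
for _ in range(10_000):
    counter.increment()
print(counter.estimate())   # randomized estimate of 10,000
```

The privacy argument sketched in the abstract rests on exactly this built-in randomness: the stored state c reflects the rough scale of the count rather than the participation of any single individual.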
Augmented reality technology is one of the leading technologies in the context of Industry 4.0. The promising potential applications of augmented reality in industrial production systems have received much attention, which led to the concept of industrial augmented reality. On the one hand, this technology provides a suitable platform that facilitates the registration of and access to information to support decision making, and it allows concurrent training of the user while executing production processes. This increases the speed and accuracy of the user as a process operator and consequently offers economic benefits to companies. Moreover, recent advances in the internet of things, smart sensors, and advanced algorithms have increased the possibility of widespread and more effective use of augmented reality. Currently, much research is being conducted to expand the applications of augmented reality and increase their effectiveness in industrial production processes. This research demonstrates the influence of augmented reality in Industry 4.0 while critically reviewing the history of industrial augmented reality. Afterward, the paper discusses the critical role of industrial augmented reality by analyzing several use cases and their prospects. Through a systematic analysis, the paper then identifies the main future directions for industrial augmented reality applications in Industry 4.0. The article investigates various areas of application for this technology and its impact on improving production conditions. Finally, the challenges that this technology faces and its research opportunities are discussed.
As data are increasingly stored in different silos and societies become more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models faces efficiency and privacy challenges. Recently, federated learning (FL) has emerged as an alternative solution and continues to thrive in this new reality. Existing FL protocol designs have been shown to be vulnerable to adversaries within or outside of the system, compromising data privacy and system robustness. Besides training powerful global models, it is of paramount importance to design FL systems that have privacy guarantees and are resistant to different types of adversaries. In this paper, we conduct the first comprehensive survey on this topic. Through a concise introduction to the concept of FL and a unique taxonomy covering: 1) threat models; 2) poisoning attacks on robustness and the corresponding defenses; 3) inference attacks on privacy and the corresponding defenses, we provide an accessible review of this important topic. We highlight the intuitions, key techniques, and fundamental assumptions adopted by various attacks and defenses. Finally, we discuss promising future research directions towards robust and privacy-preserving federated learning.
Federated learning is a distributed machine learning method that aims to preserve the privacy of sample features and labels. In a federated learning system, ID-based sample alignment approaches are usually applied, with little effort made to protect ID privacy. In real-life applications, however, the confidentiality of sample IDs, which are the strongest row identifiers, is also drawing much attention from many participants. To relax these concerns about ID privacy, this paper formally proposes the notion of asymmetrical vertical federated learning and illustrates how sample IDs can be protected. The standard private set intersection protocol is adapted to achieve the asymmetrical ID alignment phase in an asymmetrical vertical federated learning system, and a Pohlig-Hellman realization of the adapted protocol is provided. This paper also presents a genuine-with-dummy approach to achieving asymmetrical federated model training; to illustrate its application, a federated logistic regression algorithm is given as an example. Experiments are also conducted to validate the feasibility of this approach.
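A minimal sketch of the Pohlig-Hellman-style commutative encryption that makes blinded ID matching possible is shown below; the toy prime, the hash-to-group step, and the symmetric exchange are illustrative assumptions (a real deployment would use a suitably large group, and the asymmetrical protocol additionally controls which party learns the intersection).

```python
import hashlib
import math
import random

P = 2**127 - 1                        # toy Mersenne prime modulus (illustrative, not production-grade)

def hash_to_group(item: str) -> int:
    """Map an ID string into the multiplicative group mod P."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P or 1

def keygen() -> int:
    """Pick an exponent coprime to P-1 so it is invertible (Pohlig-Hellman cipher)."""
    while True:
        k = random.randrange(2, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k

def enc(x: int, k: int) -> int:
    return pow(x, k, P)               # commutative: enc(enc(x, a), b) == enc(enc(x, b), a)

# Hypothetical sample IDs held by the two parties.
ids_a, ids_b = {"u1", "u2", "u3"}, {"u2", "u3", "u4"}
a, b = keygen(), keygen()

# Each party encrypts its hashed IDs, exchanges the ciphertexts, and re-encrypts the other's.
double_a = {enc(enc(hash_to_group(x), a), b) for x in ids_a}
double_b = {enc(enc(hash_to_group(x), b), a) for x in ids_b}
print(len(double_a & double_b))       # matching IDs are found without revealing them in the clear
```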
Federated learning has emerged as a promising approach for paving the last mile of artificial intelligence, owing to its great potential for solving the data isolation problem in large-scale machine learning. In particular, considering the heterogeneity of practical edge computing systems, asynchronous edge-cloud collaborative federated learning can further improve learning efficiency by significantly reducing the straggler effect. Despite involving no raw data sharing, the open architecture and extensive collaborations of asynchronous federated learning (AFL) still give malicious participants ample opportunity to infer other parties' training data, leading to serious privacy concerns. To achieve a rigorous privacy guarantee with high utility, we investigate how to secure asynchronous edge-cloud collaborative federated learning with differential privacy (DP), focusing on the impact of DP on the model convergence of AFL. Formally, we give the first analysis of the model convergence of AFL under DP and propose a multi-stage adjustable private algorithm (MAPA) that improves the trade-off between model utility and privacy by dynamically adjusting both the noise scale and the learning rate. Through extensive simulations and real-world experiments on an edge-cloud testbed, we demonstrate that MAPA significantly improves both model accuracy and convergence speed while providing a sufficient privacy guarantee.
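A hedged sketch of the multi-stage idea follows, assuming DP-SGD-style updates with a Gaussian noise scale that shrinks and a learning rate that decays at stage boundaries; the schedules, constants, and toy objective are illustrative and not MAPA's actual rules.

```python
import numpy as np

def private_update(model, grad, lr, sigma, clip, rng):
    """One DP-SGD-style step: clip the gradient, add Gaussian noise, then descend."""
    grad = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    noisy = grad + rng.normal(0.0, sigma * clip, size=grad.shape)
    return model - lr * noisy

rng = np.random.default_rng(0)
model = np.zeros(10)
stages = [            # (rounds, noise multiplier, learning rate) per stage -- assumed schedule
    (100, 2.0, 0.10),
    (100, 1.0, 0.05),
    (100, 0.5, 0.01),
]
for rounds, sigma, lr in stages:
    for _ in range(rounds):
        grad = model - 1.0   # gradient of the toy objective 0.5 * ||model - 1||^2
        model = private_update(model, grad, lr, sigma, clip=1.0, rng=rng)
print(model)                 # approaches the all-ones optimum despite the injected noise
```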
Machine learning is a widely used method for generating predictions, and these predictions are more accurate when the model is trained on a larger dataset. On the other hand, the data is usually divided amongst different entities. For privacy reasons, the training can be done locally and the resulting models can then be securely aggregated amongst the participants. However, if there are only two participants in \textit{Collaborative Learning}, secure aggregation loses its power, since the output of the training already contains much information about the participants. To resolve this issue, they must employ privacy-preserving mechanisms, which inevitably affect the accuracy of the model. In this paper, we model the training process as a two-player game in which each player aims to achieve higher accuracy while preserving its privacy. We introduce the notion of the \textit{Price of Privacy}, a novel measure of the effect of privacy protection on the accuracy of the model. We develop a theoretical model for different player types, and we either find or prove the existence of a Nash Equilibrium under certain assumptions. Moreover, we confirm these assumptions via a recommendation systems use case: for a specific learning algorithm, we apply three privacy-preserving mechanisms to two real-world datasets. Finally, as complementary work to the designed game, we interpolate the relationship between privacy and accuracy for this use case and present three other methods to approximate it in a real-world scenario.
Recommender systems rely on large datasets of historical data and entail serious privacy risks. A server offering recommendations as a service to a client might leak more information than necessary about its recommendation model and training dataset. At the same time, the disclosure of the client's preferences to the server is also a matter of concern. Providing recommendations while preserving privacy in both senses is a difficult task, which often conflicts with the utility of the system in terms of recommendation accuracy and efficiency. General-purpose cryptographic primitives such as secure multi-party computation and homomorphic encryption offer strong security guarantees, but in conjunction with state-of-the-art recommender systems they yield far-from-practical solutions. We precisely define the above notion of security and propose CryptoRec, a novel recommendations-as-a-service protocol, which encompasses a crypto-friendly recommender system. This model possesses two interesting properties: (1) it models user-item interactions in a user-free latent feature space, in which it captures personalized user features by an aggregation of item features. This means that a server with a pre-trained model can provide recommendations for a client without having to re-train the model with the client's preferences; nevertheless, re-training the model still improves accuracy. (2) It uses only addition and multiplication operations, making the model straightforwardly compatible with homomorphic encryption schemes.
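A hedged sketch of the "user-free" idea follows: the client's profile is an aggregation of pre-trained item feature vectors weighted by the client's ratings, and every prediction uses only additions and multiplications, so the same arithmetic could in principle be evaluated under a homomorphic encryption scheme. The aggregation rule, dimensions, and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, d = 50, 8
Q = rng.normal(size=(n_items, d))   # pre-trained "aggregation" item features (server side)
P = rng.normal(size=(n_items, d))   # pre-trained "prediction" item features (server side)

# Client-side preferences: a sparse vector of known ratings (hypothetical).
ratings = np.zeros(n_items)
ratings[[3, 17, 42]] = [5.0, 3.0, 4.0]

# User profile = rating-weighted sum of item features: additions and multiplications only.
user_profile = ratings @ Q          # shape (d,)

# Predicted score for every item is a plain dot product, again only + and *.
scores = P @ user_profile
print(np.argsort(scores)[::-1][:5]) # top-5 item indices, ranked client-side after decryption
```

Because no comparison, division, or nonlinearity appears in the inference path, the client could submit encrypted ratings and receive encrypted scores, deferring the ranking step until after decryption.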