The rapid expansion of Internet of Things (IoT) devices in smart homes has significantly improved the quality of life, offering enhanced convenience, automation, and energy efficiency. However, this proliferation of connected devices raises critical concerns regarding security and privacy of the user data. In this paper, we propose a differential privacy-based system to ensure comprehensive security for data generated by smart homes. We employ the randomized response technique for the data and utilize Local Differential Privacy (LDP) to achieve data privacy. The data is then transmitted to an aggregator, where an obfuscation method is applied to ensure individual anonymity. Furthermore, we implement the Hidden Markov Model (HMM) technique at the aggregator level and apply differential privacy to the private data received from smart homes. Consequently, our approach achieves a dual layer of privacy protection, addressing the security concerns associated with IoT devices in smart cities.
Virtual and augmented reality are currently enjoying a great deal of attention from the research community and the industry towards their adoption within industrial spaces and processes. However, the current design and implementation landscape is still very fluid, while the community as a whole has not yet consolidated into concrete design directions, other than basic patterns. Other open issues include the choice over a cloud or edge-based architecture when designing such systems. Within this work, we present our approach for a monitoring intervention inside a factory space utilizing both VR and AR, based primarily on edge computing, while also utilizing the cloud. We discuss its main design directions, as well as a basic ontology to aid in simple description of factory assets. In order to highlight the design aspects of our approach, we present a prototype implementation, based on a use case scenario in a factory site, within the context of the ENERMAN H2020 project.
This paper considers subject level privacy in the FL setting, where a subject is an individual whose private information is embodied by several data items either confined within a single federation user or distributed across multiple federation users. We propose two new algorithms that enforce subject level DP at each federation user locally. Our first algorithm, called LocalGroupDP, is a straightforward application of group differential privacy in the popular DP-SGD algorithm. Our second algorithm is based on a novel idea of hierarchical gradient averaging (HiGradAvgDP) for subjects participating in a training mini-batch. We also show that user level Local Differential Privacy (LDP) naturally guarantees subject level DP. We observe the problem of horizontal composition of subject level privacy loss in FL - subject level privacy loss incurred at individual users composes across the federation. We formally prove the subject level DP guarantee for our algorithms, and also show their effect on model utility loss. Our empirical evaluation on FEMNIST and Shakespeare datasets shows that LocalGroupDP delivers the best performance among our algorithms. However, its model utility lags behind that of models trained using a DP-SGD based algorithm that provides a weaker item level privacy guarantee. Privacy loss amplification due to subject sampling fractions and horizontal composition remain key challenges for model utility.
With the emergence of privacy leaks in federated learning, secure aggregation protocols that mainly adopt either homomorphic encryption or threshold secret sharing have been widely developed for federated learning to protect the privacy of the local training data of each client. However, these existing protocols suffer from many shortcomings, such as the dependence on a trusted third party, the vulnerability to clients being corrupted, low efficiency, the trade-off between security and fault tolerance, etc. To solve these disadvantages, we propose an efficient and multi-private key secure aggregation scheme for federated learning. Specifically, we skillfully modify the variant ElGamal encryption technique to achieve homomorphic addition operation, which has two important advantages: 1) The server and each client can freely select public and private keys without introducing a trust third party and 2) Compared to the variant ElGamal encryption, the plaintext space is relatively large, which is more suitable for the deep model. Besides, for the high dimensional deep model parameter, we introduce a super-increasing sequence to compress multi-dimensional data into 1-D, which can greatly reduce encryption and decryption times as well as communication for ciphertext transmission. Detailed security analyses show that our proposed scheme achieves the semantic security of both individual local gradients and the aggregated result while achieving optimal robustness in tolerating both client collusion and dropped clients. Extensive simulations demonstrate that the accuracy of our scheme is almost the same as the non-private approach, while the efficiency of our scheme is much better than the state-of-the-art homomorphic encryption-based secure aggregation schemes. More importantly, the efficiency advantages of our scheme will become increasingly prominent as the number of model parameters increases.
This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph $p$-Laplacian have been a backbone of non-euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the $p$-resistance induced by the $p$-Laplacian for clustering. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity while large $p$ biases towards clusters of small ``extent,'' that is a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian based methods.
Federated learning (FL) as distributed machine learning has gained popularity as privacy-aware Machine Learning (ML) systems have emerged as a technique that prevents privacy leakage by building a global model and by conducting individualized training of decentralized edge clients on their own private data. The existing works, however, employ privacy mechanisms such as Secure Multiparty Computing (SMC), Differential Privacy (DP), etc. Which are immensely susceptible to interference, massive computational overhead, low accuracy, etc. With the increasingly broad deployment of FL systems, it is challenging to ensure fairness and maintain active client participation in FL systems. Very few works ensure reasonably satisfactory performances for the numerous diverse clients and fail to prevent potential bias against particular demographics in FL systems. The current efforts fail to strike a compromise between privacy, fairness, and model performance in FL systems and are vulnerable to a number of additional problems. In this paper, we provide a comprehensive survey stating the basic concepts of FL, the existing privacy challenges, techniques, and relevant works concerning privacy in FL. We also provide an extensive overview of the increasing fairness challenges, existing fairness notions, and the limited works that attempt both privacy and fairness in FL. By comprehensively describing the existing FL systems, we present the potential future directions pertaining to the challenges of privacy-preserving and fairness-aware FL systems.
A large amount of data and applications are migrated by researchers, stakeholders, academia, and business organizations to the cloud environment due to its large variety of services, which involve the least maintenance cost, maximum flexibility, and on-demand service for storage, computation, and data distribution intentions. Despite the various characteristics the cloud environment supports, it also faces many challenges. However, data users may not completely trust a cloud environment that is engaged by a third party. Every cloud user always has a prime concern, i.e., security. Numerous methods have been designed to solve the issue of data security during data storage, calculation, and sharing across stakeholders and users. Nevertheless, there is a lack of existing methods that tackle the issue of the security of data when it is stored in a cloud environment. This article presents a precise security method that has handled the security of data while it is being shared and stored in the cloud. These methods have been utilized to lessen security assaults and prevent unauthorized parties from accessing the actual data. The article is concluded with some limitations and recommendations for the future in terms of secure data retention and distribution.
Large Language Models (LLMs) are gaining increasing attention due to their exceptional performance across numerous tasks. As a result, the general public utilize them as an influential tool for boosting their productivity while natural language processing researchers endeavor to employ them in solving existing or new research problems. Unfortunately, individuals can only access such powerful AIs through APIs, which ultimately leads to the transmission of raw data to the models' providers and increases the possibility of privacy data leakage. Current privacy-preserving methods for cloud-deployed language models aim to protect privacy information in the pre-training dataset or during the model training phase. However, they do not meet the specific challenges presented by the remote access approach of new large-scale language models. This paper introduces a novel task, "User Privacy Protection for Dialogue Models," which aims to safeguard sensitive user information from any possible disclosure while conversing with chatbots. We also present an evaluation scheme for this task, which covers evaluation metrics for privacy protection, data availability, and resistance to simulation attacks. Moreover, we propose the first framework for this task, namely privacy protection through text sanitization. Before sending the input to remote large models, it filters out the sensitive information, using several rounds of text sanitization based on privacy types that users define. Upon receiving responses from the larger model, our framework automatically restores privacy to ensure that the conversation goes smoothly, without intervention from the privacy filter. Experiments based on real-world datasets demonstrate the efficacy of our privacy-preserving approach against eavesdropping from potential attackers.
With the recent hype around the Metaverse and NFTs, Web3 is getting more and more popular. The goal of Web3 is to decentralize the web via decentralized applications. Wallets play a crucial role as they act as an interface between these applications and the user. Wallets such as MetaMask are being used by millions of users nowadays. Unfortunately, Web3 is often advertised as more secure and private. However, decentralized applications as well as wallets are based on traditional technologies, which are not designed with privacy of users in mind. In this paper, we analyze the privacy implications that Web3 technologies such as decentralized applications and wallets have on users. To this end, we build a framework that measures exposure of wallet information. First, we study whether information about installed wallets is being used to track users online. We analyze the top 100K websites and find evidence of 1,325 websites running scripts that probe whether users have wallets installed in their browser. Second, we measure whether decentralized applications and wallets leak the user's unique wallet address to third-parties. We intercept the traffic of 616 decentralized applications and 100 wallets and find over 2000 leaks across 211 applications and more than 300 leaks across 13 wallets. Our study shows that Web3 poses a threat to users' privacy and requires new designs towards more privacy-aware wallet architectures.
Federated learning (FL) naturally faces the problem of data heterogeneity in real-world scenarios, but this is often overlooked by studies on FL security and privacy. On the one hand, the effectiveness of backdoor attacks on FL may drop significantly under non-IID scenarios. On the other hand, malicious clients may steal private data through privacy inference attacks. Therefore, it is necessary to have a comprehensive perspective of data heterogeneity, backdoor, and privacy inference. In this paper, we propose a novel privacy inference-empowered stealthy backdoor attack (PI-SBA) scheme for FL under non-IID scenarios. Firstly, a diverse data reconstruction mechanism based on generative adversarial networks (GANs) is proposed to produce a supplementary dataset, which can improve the attacker's local data distribution and support more sophisticated strategies for backdoor attacks. Based on this, we design a source-specified backdoor learning (SSBL) strategy as a demonstration, allowing the adversary to arbitrarily specify which classes are susceptible to the backdoor trigger. Since the PI-SBA has an independent poisoned data synthesis process, it can be integrated into existing backdoor attacks to improve their effectiveness and stealthiness in non-IID scenarios. Extensive experiments based on MNIST, CIFAR10 and Youtube Aligned Face datasets demonstrate that the proposed PI-SBA scheme is effective in non-IID FL and stealthy against state-of-the-art defense methods.
Fast developing artificial intelligence (AI) technology has enabled various applied systems deployed in the real world, impacting people's everyday lives. However, many current AI systems were found vulnerable to imperceptible attacks, biased against underrepresented groups, lacking in user privacy protection, etc., which not only degrades user experience but erodes the society's trust in all AI systems. In this review, we strive to provide AI practitioners a comprehensive guide towards building trustworthy AI systems. We first introduce the theoretical framework of important aspects of AI trustworthiness, including robustness, generalization, explainability, transparency, reproducibility, fairness, privacy preservation, alignment with human values, and accountability. We then survey leading approaches in these aspects in the industry. To unify the current fragmented approaches towards trustworthy AI, we propose a systematic approach that considers the entire lifecycle of AI systems, ranging from data acquisition to model development, to development and deployment, finally to continuous monitoring and governance. In this framework, we offer concrete action items to practitioners and societal stakeholders (e.g., researchers and regulators) to improve AI trustworthiness. Finally, we identify key opportunities and challenges in the future development of trustworthy AI systems, where we identify the need for paradigm shift towards comprehensive trustworthy AI systems.