This paper deals with the evergreen and still hot topic of the relationship between privacy and technology. We give extensive motivation for why the privacy debate is still alive for private citizens and institutions, and we investigate the privacy concept itself. The paper proposes a novel vision of the privacy ecosystem, introducing privacy dimensions, the related user expectations, privacy violations, and the changing factors. We provide a critical assessment of the Privacy by Design paradigm and its strategies, tactics, patterns, and Privacy-Enhancing Technologies, highlighting the current open issues. We believe that promising approaches for tackling the privacy challenges move in two directions: (i) the identification of effective privacy metrics; and (ii) the adoption of formal tools to design privacy-compliant applications.
In recent years, the increasing deployment of face recognition technology in security-critical settings, such as border control or law enforcement, has led to considerable interest in the vulnerability of face recognition systems to attacks that exploit legitimate documents issued on the basis of digitally manipulated face images. As automated manipulation and attack detection remains a challenging task, conventional processes in which human inspectors perform identity verification remain indispensable. These circumstances merit a closer investigation of human capabilities in detecting manipulated face images, as previous work in this field is sparse and often concentrates only on specific scenarios and biometric characteristics. This work introduces a web-based, remote visual discrimination experiment based on principles adopted from the field of psychophysics, and subsequently discusses interdisciplinary opportunities, with the aim of examining human proficiency in detecting different types of digitally manipulated face images, specifically face swapping, morphing, and retouching. In addition to analysing appropriate performance measures, a possible metric of detectability is explored. Experimental data from 306 participants indicate that detection performance is widely distributed across the population and that certain types of face image manipulations are much harder to detect than others.
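The abstract leaves the detectability metric open; purely as an illustration of what a psychophysics-grounded measure could look like, the sketch below computes the classical signal-detection sensitivity index d' from one participant's responses. The function, the 2x2 outcome counts, and the log-linear correction are assumptions for illustration, not the study's actual protocol.

```python
from statistics import NormalDist

def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Signal-detection sensitivity index d' from a 2x2 outcome table.

    A log-linear correction (add 0.5 to each cell) keeps perfect
    rates of 0 or 1 from producing infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # probit transform
    return z(hit_rate) - z(fa_rate)

# Example: a participant flags 42 of 50 manipulated images and
# wrongly flags 8 of 50 genuine ones.
print(d_prime(hits=42, misses=8, false_alarms=8, correct_rejections=42))
```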
Controlling the spread of infectious diseases, such as the ongoing SARS-CoV-2 pandemic, is one of the most challenging problems facing human civilization. The world is more populous and connected than ever before, so such diseases can spread at an extremely high rate. The development and distribution of testing kits cannot keep up with demand, making it impossible to test everyone. The next best option is to identify and isolate the people who come in close contact with an infected person. However, this apparently simple process, commonly known as contact tracing, suffers from two major pitfalls: it requires a large amount of manpower when infected individuals are tracked manually, and it risks breaching privacy and security when the process is automated. Here, we propose Bluetooth-based contact tracing hardware with anonymous IDs that addresses both drawbacks of the existing approaches. The hardware will be a wearable device that every user can carry conveniently. This device will measure the distance between two users and exchange IDs anonymously in the case of a close encounter. The anonymous IDs stored in the device of any newly infected individual will be used to trace at-risk contacts, and the status of those IDs will then be updated by authorized personnel. To demonstrate the concept, we simulate the working procedure and highlight the effectiveness of our technique in curbing the spread of any contagious disease.
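The abstract does not detail the protocol; the following minimal sketch shows, under stated assumptions, two building blocks such a wearable could combine: a time-rotated anonymous ID derived from an on-device secret, and a rough proximity estimate from Bluetooth RSSI using the log-distance path-loss model. The function names, the 15-minute rotation window, and the calibration constants are all hypothetical.

```python
import hashlib
import os
import time

def rotating_anonymous_id(secret: bytes, interval_s: int = 900) -> str:
    """Derive a short-lived anonymous ID by hashing an on-device secret
    with the current time window, so broadcast IDs cannot be linked
    to a stable identity across windows."""
    window = int(time.time()) // interval_s
    return hashlib.sha256(secret + window.to_bytes(8, "big")).hexdigest()[:16]

def estimate_distance_m(rssi_dbm: float, tx_power_dbm: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Rough distance estimate from RSSI via the log-distance
    path-loss model (very noisy in practice)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

secret = os.urandom(32)                 # never leaves the device
beacon = rotating_anonymous_id(secret)
if estimate_distance_m(rssi_dbm=-63.0) < 2.0:   # "close encounter" threshold
    print("store contact:", beacon)
```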
Machine learning on big data is receiving more and more attention in various fields. At the same time, privacy-preserving techniques are becoming more important, and even necessary, due to legal regulations such as the General Data Protection Regulation (GDPR). Moreover, data is often distributed among various parties. Especially in the medical context there are several data holders, e.g. hospitals, and we need to deal with highly sensitive values. A real-world scenario is data held in an electronic patient record, which is by now available in many countries. The medical data is encrypted, and users (e.g. physicians, hospitals) can only decrypt it after patient authorization. One of the main questions in this scenario is whether it is possible to process the data for research purposes without violating the privacy of the data owner. We evaluate which cryptographic mechanism - homomorphic encryption, multiparty computation, or trusted execution environments - can be used for this task.
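Of the three candidate mechanisms, homomorphic encryption is the easiest to illustrate compactly. The toy Paillier sketch below (tiny primes, for exposition only; real deployments need primes of 1024+ bits and a vetted library) demonstrates the property that makes it attractive here: ciphertexts can be multiplied so that the result decrypts to the sum of the plaintexts, e.g. aggregating encrypted patient values for research without decrypting any individual record.

```python
import math
import secrets

# Toy Paillier keypair (insecure toy parameters).
p, q = 1009, 1013
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1          # toy: assume gcd(r, n) == 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts,
# e.g. summing two hospitals' encrypted lab values.
c = encrypt(120) * encrypt(135) % n2
assert decrypt(c) == 255
print(decrypt(c))
```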
Reviewers in peer review are often miscalibrated: they may be strict, lenient, extreme, moderate, etc. A number of algorithms have previously been proposed to calibrate reviews. Such attempts at calibration can, however, leak sensitive information about which reviewer reviewed which paper. In this paper, we identify this problem of calibration with privacy and provide a foundational building block to address it. Specifically, we present a theoretical study of this problem under a simplified-yet-challenging model involving two reviewers, two papers, and an MAP-computing adversary. Our main results establish the Pareto frontier of the tradeoff between privacy (preventing the adversary from inferring reviewer identity) and utility (accepting better papers), and we design explicit, computationally efficient algorithms that we prove are Pareto optimal.
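The paper's Pareto-optimal algorithms are not reproduced here; the toy Monte Carlo below only illustrates the underlying privacy-utility tension in the two-reviewer, two-paper setting. A mechanism that randomly flips the accept decision with some probability drives both the adversary's inference accuracy and the chance of accepting the better paper toward chance. The Gaussian paper qualities, the ±1 calibration biases, and the adversary's guessing rule are modeling assumptions made purely for this illustration.

```python
import random

def mechanism(scores, swap_prob):
    """Accept the paper with the higher reported score, except that
    with probability `swap_prob` the other paper is accepted, which
    limits what the decision reveals about reviewer identities."""
    better = 0 if scores[0] >= scores[1] else 1
    return 1 - better if random.random() < swap_prob else better

def simulate(swap_prob, trials=50_000):
    utility = adversary_correct = 0
    for _ in range(trials):
        qualities = [random.gauss(0, 1) for _ in range(2)]
        assignment = random.randint(0, 1)   # which paper lenient reviewer A saw
        biases = (1.0, -1.0) if assignment == 0 else (-1.0, 1.0)
        scores = [q + b for q, b in zip(qualities, biases)]
        accepted = mechanism(scores, swap_prob)
        utility += qualities[accepted] >= max(qualities)
        # Adversary exploiting reviewer A's leniency: guess that the
        # accepted paper is the one reviewer A scored.
        adversary_correct += assignment == accepted
    return utility / trials, adversary_correct / trials

for p in (0.0, 0.25, 0.5):
    print(p, simulate(p))   # both utility and adversary accuracy fall with p
```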
Wireless energy sharing is a novel and convenient alternative for charging IoT devices. In this demo paper, we present a peer-to-peer wireless energy sharing platform that enables users to send and receive energy wirelessly with nearby IoT devices. The platform consists of (i) a mobile application that monitors and synchronizes the energy transfer between two IoT devices and (ii) a backend that registers energy providers and consumers and stores their energy transfer transactions. The developed framework allows the collection of a real wireless energy sharing dataset. A set of preliminary experiments has been conducted on the collected dataset to analyze and demonstrate the behavior of current wireless energy sharing technology.
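The platform's actual API is not described in the abstract; as a rough sketch of the kind of records such a backend might keep, the snippet below models device registration and an energy transfer transaction with an in-memory store. All class names, fields, and units are assumptions.

```python
from dataclasses import dataclass, field
import time
import uuid

@dataclass
class EnergyTransaction:
    """One wireless energy transfer between two registered devices."""
    provider_id: str
    consumer_id: str
    energy_mwh: float                 # transferred energy, milliwatt-hours
    started_at: float = field(default_factory=time.time)
    tx_id: str = field(default_factory=lambda: str(uuid.uuid4()))

class Backend:
    """In-memory stand-in for the registration/transaction store."""
    def __init__(self):
        self.devices, self.transactions = {}, []

    def register(self, role: str) -> str:
        device_id = str(uuid.uuid4())
        self.devices[device_id] = role        # "provider" or "consumer"
        return device_id

    def record(self, tx: EnergyTransaction) -> None:
        assert tx.provider_id in self.devices and tx.consumer_id in self.devices
        self.transactions.append(tx)

backend = Backend()
alice = backend.register("provider")
bob = backend.register("consumer")
backend.record(EnergyTransaction(alice, bob, energy_mwh=12.5))
```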
Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific discoveries in recent years. However, the textual data alone are often not enough to conduct studies: social scientists, in particular, need more variables to perform their analysis and control for various factors. How we augment this information, such as users' location, age, or tweet sentiment, has ramifications for anonymity and reproducibility, and requires dedicated effort. This paper describes Twitter-Demographer, a simple, flow-based tool to enrich Twitter data with additional information about tweets and users. Twitter-Demographer is aimed at NLP practitioners and (computational) social scientists who want to enrich their datasets with aggregated information, facilitating reproducibility and providing algorithmic privacy-by-design measures for pseudo-anonymity. We discuss our design choices, inspired by the flow-based programming paradigm, to use black-box components that can easily be chained together and extended. We also analyze the ethical issues related to the use of this tool and the built-in measures that facilitate pseudo-anonymity.
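Twitter-Demographer's real components are not reproduced here; the sketch below only illustrates the flow-based idea the abstract describes: black-box components chained over records, with a salted-hash step standing in for the pseudo-anonymization measure. The component names, the salt handling, and the toy sentiment rule are assumptions.

```python
import hashlib

SALT = b"project-specific-salt"   # assumption: a per-project secret salt

def pseudo_anonymize(record: dict) -> dict:
    """Replace the raw user handle with a salted hash (pseudo-anonymity)."""
    handle = record.pop("username").encode()
    record["user_hash"] = hashlib.sha256(SALT + handle).hexdigest()[:12]
    return record

def add_sentiment(record: dict) -> dict:
    """Stand-in enrichment step; a real component would call a model."""
    record["sentiment"] = "positive" if ":)" in record["text"] else "neutral"
    return record

def run_pipeline(records, components):
    """Flow-based chaining: each component is a black box mapping a
    record to an enriched record; components compose freely."""
    for record in records:
        for component in components:
            record = component(record)
        yield record

tweets = [{"username": "alice", "text": "loving this :)"}]
print(list(run_pipeline(tweets, [add_sentiment, pseudo_anonymize])))
```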
Bayesian models based on the Dirichlet process and other stick-breaking priors have been proposed as core ingredients for clustering, topic modeling, and other unsupervised learning tasks. However, due to the flexibility of these models, the consequences of prior choices can be opaque, making prior specification relatively difficult. At the same time, prior choice can have a substantial effect on posterior inferences. Thus, considerations of robustness need to go hand in hand with nonparametric modeling. In the current paper, we tackle this challenge by exploiting the fact that variational Bayesian methods, in addition to having computational advantages in fitting complex nonparametric models, also yield sensitivities with respect to parametric and nonparametric aspects of Bayesian models. In particular, we demonstrate how to assess the sensitivity of conclusions to the choice of concentration parameter and stick-breaking distribution for inferences under Dirichlet process mixtures and related mixture models. We provide both theoretical and empirical support for our variational approach to Bayesian sensitivity analysis.
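As a much-simplified hint of what such a sensitivity looks like, the sketch below differentiates the expected stick-breaking weights of a Dirichlet process with respect to the concentration parameter by finite differences. The paper's variational approach instead differentiates the fitted variational optimum; this toy prior-only computation does not attempt that.

```python
import numpy as np

def expected_weights(alpha: float, k_max: int = 10) -> np.ndarray:
    """Expected stick-breaking weights under a DP with concentration
    alpha: E[pi_k] = (1/(1+alpha)) * (alpha/(1+alpha))**(k-1),
    returned here for k = 1, ..., k_max."""
    k = np.arange(k_max)
    return (1.0 / (1.0 + alpha)) * (alpha / (1.0 + alpha)) ** k

def sensitivity(alpha: float, eps: float = 1e-5, k_max: int = 10) -> np.ndarray:
    """Central finite-difference d E[pi_k] / d alpha: how strongly
    each expected weight reacts to the concentration parameter."""
    return (expected_weights(alpha + eps, k_max)
            - expected_weights(alpha - eps, k_max)) / (2 * eps)

print(sensitivity(alpha=1.0))
```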
Fast-developing artificial intelligence (AI) technology has enabled various applied systems deployed in the real world, impacting people's everyday lives. However, many current AI systems have been found vulnerable to imperceptible attacks, biased against underrepresented groups, lacking in user privacy protection, and so on, which not only degrades the user experience but also erodes society's trust in all AI systems. In this review, we strive to provide AI practitioners with a comprehensive guide to building trustworthy AI systems. We first introduce the theoretical framework of important aspects of AI trustworthiness, including robustness, generalization, explainability, transparency, reproducibility, fairness, privacy preservation, alignment with human values, and accountability. We then survey leading approaches to these aspects in industry. To unify the currently fragmented approaches to trustworthy AI, we propose a systematic approach that considers the entire lifecycle of AI systems, ranging from data acquisition to model development, to system development and deployment, and finally to continuous monitoring and governance. In this framework, we offer concrete action items for practitioners and societal stakeholders (e.g., researchers and regulators) to improve AI trustworthiness. Finally, we identify key opportunities and challenges in the future development of trustworthy AI systems, where we identify the need for a paradigm shift towards comprehensively trustworthy AI systems.
We present the first real-world application of methods for improving neural machine translation (NMT) with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform. Previous work has been confined to simulation experiments, whereas in this paper we work with real logged feedback for offline bandit learning of NMT parameters. We conduct a thorough analysis of the available explicit user judgments---five-star ratings of translation quality---and show that they are not reliable enough to yield significant improvements in bandit learning. In contrast, we successfully utilize implicit task-based feedback collected in a cross-lingual search task to improve task-specific and machine translation quality metrics.
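The production system is of course not reproduced here; the sketch below shows the generic offline (batch) bandit learning recipe the abstract refers to, with a linear softmax policy standing in for the NMT model: logged rewards are reweighted by inverse propensity scoring before a policy-gradient update. The feature dimension, the uniform logging policy, and the synthetic reward are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 8, 4
theta = np.zeros((n_actions, n_features))   # linear softmax policy parameters

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ips_gradient_step(x, action, reward, propensity, lr=0.1):
    """One offline policy-gradient step on the inverse-propensity-scored
    objective E[(pi_theta(a|x) / mu(a|x)) * r]: the logged reward is
    reweighted by the current policy's probability over the logging
    policy's propensity."""
    global theta
    probs = softmax(theta @ x)
    weight = reward * probs[action] / propensity
    grad_log = -probs[:, None] * x[None, :]   # d log pi(a|x) / d theta
    grad_log[action] += x
    theta += lr * weight * grad_log

# Replay a toy log: uniform logging policy, binary task-based reward.
for _ in range(2000):
    x = rng.normal(size=n_features)
    a = int(rng.integers(n_actions))
    r = float(a == 0 and x[0] > 0)            # stand-in implicit feedback
    ips_gradient_step(x, a, r, propensity=1.0 / n_actions)
```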
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
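As a loose illustration of the additive construction described above (not the paper's exact model), the sketch below approximates gamma completely random measures with finitely many atoms, forms each group's random probability measure by normalising the sum of a shared and a group-specific CRM, and samples observations; the shared atoms can produce ties across groups.

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma_crm(total_mass: float, n_atoms: int = 500):
    """Crude finite approximation of a gamma completely random measure:
    uniform atom locations on [0, 1], Gamma(total_mass/n_atoms, 1) jump
    sizes, so the jumps sum to about total_mass."""
    return rng.uniform(0, 1, n_atoms), rng.gamma(total_mass / n_atoms, 1.0, n_atoms)

# Latent nested construction (sketch): each group's random probability
# measure normalises the sum of a shared CRM and a group-specific CRM,
# inducing dependence across groups without collapsing to full
# exchangeability.
shared_locs, shared_jumps = gamma_crm(1.0)
groups = []
for _ in range(2):
    own_locs, own_jumps = gamma_crm(1.0)
    locs = np.concatenate([shared_locs, own_locs])
    jumps = np.concatenate([shared_jumps, own_jumps])
    groups.append((locs, jumps / jumps.sum()))   # normalise to a probability

# Draw five observations per group from its normalised measure.
print([rng.choice(locs, size=5, p=w).round(3) for locs, w in groups])
```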