Google and Apple jointly introduced a digital contact tracing technology and an API called "exposure notification" to help health organizations and governments with contact tracing. The technology and its interplay with security and privacy constraints require investigation. In this study, we examine and analyze the security, privacy, and reliability of the technology under actual, typical scenarios and realistic use cases, with the expected typical adversary in mind, in the context of Virginia's COVIDWISE app. This experimental analysis validates the properties of the system under these conditions, a result that is crucial for the peace of mind of authorities adopting exposure notification technology, and that may also improve the system's transparency and overall user trust.
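For context, the exposure notification design broadcasts short-lived Bluetooth identifiers derived from a daily key, so observations cannot be linked to a person without the matching diagnosis key. The sketch below reconstructs this rolling-identifier derivation from the publicly documented GAEN scheme; the constants and key sizes are quoted from memory and should be checked against the official cryptography specification, and the `tek` and interval inputs are placeholders.

```python
# Sketch of a GAEN-style rolling proximity identifier (RPI) derivation.
# Illustrative only: constants follow the publicly documented scheme and
# should be verified against the official GAEN cryptography specification.
import os
import time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def en_interval_number(unix_ts: float) -> int:
    """10-minute interval index used to rotate identifiers."""
    return int(unix_ts // 600)

def rolling_proximity_identifier(tek: bytes, interval: int) -> bytes:
    # Derive the RPI key from the daily temporary exposure key (TEK).
    rpik = HKDF(algorithm=hashes.SHA256(), length=16, salt=None,
                info=b"EN-RPIK").derive(tek)
    # Encrypt a padded block containing the current interval number.
    padded = b"EN-RPI" + bytes(6) + interval.to_bytes(4, "little")
    enc = Cipher(algorithms.AES(rpik), modes.ECB()).encryptor()
    return enc.update(padded) + enc.finalize()

tek = os.urandom(16)  # placeholder daily key
rpi = rolling_proximity_identifier(tek, en_interval_number(time.time()))
print(rpi.hex())      # identifier broadcast over BLE, rotating every ~10 minutes
```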
Many studies in the field of education analytics have identified student grade point average (GPA) as an important indicator and predictor of students' final academic outcomes (graduation or halting). While semester-to-semester fluctuations in GPA are considered normal, significant changes in academic performance may warrant more thorough investigation, particularly with regard to final academic outcomes. Such an approach is challenging, however, because complex academic trajectories over an academic career are difficult to represent. In this study, we apply a Hidden Markov Model (HMM) to provide a standard and intuitive classification of students' academic-performance levels, which leads to a compact representation of academic-performance trajectories. We then explore the relationship between different academic-performance trajectories and their correspondence to final academic success. Based on student transcript data from the University of Central Florida, our proposed HMM is trained on sequences of students' course grades for each semester. Through the HMM, our analysis confirms the expected finding that higher academic-performance levels correlate with lower halt rates. However, we also identify many scenarios in which both improving and worsening academic-performance trajectories correlate with higher graduation rates. This counter-intuitive finding is made possible by the proposed HMM.
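To make the modelling step concrete, the sketch below decodes a hypothetical per-semester grade sequence into latent performance levels with the Viterbi algorithm; the three-level state space and all transition/emission probabilities are illustrative placeholders, not the parameters fitted to the UCF transcript data.

```python
# Minimal Viterbi decoding of semester grade bands into latent
# academic-performance levels. All parameters are illustrative; the paper
# fits its HMM to transcript data rather than using fixed values.
import numpy as np

states = ["low", "medium", "high"]                 # hidden performance levels
obs_symbols = {"D/F": 0, "C": 1, "B": 2, "A": 3}   # observed grade bands

start = np.array([0.2, 0.5, 0.3])
trans = np.array([[0.6, 0.3, 0.1],                 # P(next level | current level)
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])
emit = np.array([[0.40, 0.40, 0.15, 0.05],         # P(grade band | level)
                 [0.10, 0.30, 0.40, 0.20],
                 [0.02, 0.08, 0.40, 0.50]])

def viterbi(obs):
    """Most likely hidden level sequence for an observed grade sequence."""
    n, T = len(states), len(obs)
    logd = np.full((T, n), -np.inf)
    back = np.zeros((T, n), dtype=int)
    logd[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        scores = logd[t - 1][:, None] + np.log(trans)   # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        logd[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(logd[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]

semester_grades = ["C", "B", "B", "A", "A"]        # hypothetical transcript
print(viterbi([obs_symbols[g] for g in semester_grades]))
```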
Contact tracing systems control the spread of disease by discovering the set of people an infectious individual has come into contact with. Students are often mobile and sociable and can therefore contribute to the spread of disease. Controls on the movement of students studying in the UK were put in place during the COVID-19 pandemic, and some restrictions may be necessary over several years. App-based digital contact tracing may help ease restrictions by enabling students to make informed decisions and take precautions. However, designing for the end-user acceptability of these apps remains under-explored. This study with 22 students from UK universities (including 11 international students) uses a fictional user interface to prompt in-depth interviews on the acceptability of contact tracing tools. We explore intended uptake, usage, and compliance with contact tracing apps, finding that students are positive, although concerned about privacy, security, and the burden of participation.
Safety is a critical component of autonomous systems and remains a challenge for learning-based policies deployed in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we apply a dual-policy setup in which a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the reach-avoid Bellman equation based on Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments, including a photo-realistic one. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See //sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
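The shielding step can be summarized as a small piece of switching logic: the performance policy proposes an action, and if the learned reach-avoid safety critic predicts that it leads toward failure, the backup policy overrides it. The sketch below is a schematic of this supervisory scheme with hypothetical policy and critic callables and an assumed threshold convention, not the authors' implementation.

```python
# Schematic of a shielding (supervisory control) scheme: hypothetical
# callables stand in for trained networks; the threshold convention is an
# assumption rather than a value from the paper.
from typing import Callable
import numpy as np

def shielded_action(state: np.ndarray,
                    perf_policy: Callable[[np.ndarray], np.ndarray],
                    backup_policy: Callable[[np.ndarray], np.ndarray],
                    safety_critic: Callable[[np.ndarray, np.ndarray], float],
                    threshold: float = 0.0) -> np.ndarray:
    """Return the performance action unless the safety critic flags it.

    Convention assumed here: safety_critic(s, a) > threshold means the
    reach-avoid value predicts eventual entry into the failure set, so the
    backup (safety) policy takes over for this step.
    """
    action = perf_policy(state)
    if safety_critic(state, action) > threshold:
        action = backup_policy(state)   # shield: override the unsafe proposal
    return action
```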
Federated learning is a machine learning paradigm that has emerged as a solution to the privacy-preservation demands of artificial intelligence. Like machine learning in general, federated learning is threatened by adversarial attacks against the integrity of the learning model and the privacy of the data, a risk inherent to its distributed approach of combining local and global learning. This weak point is exacerbated by the inaccessibility of data in federated learning, which makes protection against adversarial attacks harder and underlines the need for further research on defence methods to make federated learning a real solution for safeguarding data privacy. In this paper, we present an extensive review of the threats to federated learning, as well as their corresponding countermeasures: attacks versus defences. This survey provides a taxonomy of adversarial attacks and a taxonomy of defence methods that together depict a general picture of this vulnerability of federated learning and how to overcome it. Likewise, we set out guidelines for selecting the most adequate defence method according to the category of the adversarial attack. In addition, we carry out an extensive experimental study from which we draw further conclusions about the behaviour of attacks and defences and refine those guidelines. The study concludes with lessons learned and open challenges.
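As a concrete instance of one defence category covered by the survey, the sketch below contrasts plain federated averaging with coordinate-wise median aggregation, a common robust-aggregation defence against model poisoning; the client updates are synthetic placeholders.

```python
# Plain FedAvg vs. coordinate-wise median aggregation, one common
# robust-aggregation defence against model-poisoning attacks.
# Client updates below are synthetic; one client is "poisoned".
import numpy as np

rng = np.random.default_rng(0)
honest = [rng.normal(0.0, 0.1, size=5) for _ in range(9)]
poisoned = [np.full(5, 50.0)]                 # crude model-poisoning update
updates = np.stack(honest + poisoned)

fedavg = updates.mean(axis=0)                 # badly skewed by the outlier
robust = np.median(updates, axis=0)           # coordinate-wise median defence

print("FedAvg aggregate:", np.round(fedavg, 2))
print("Median aggregate:", np.round(robust, 2))
```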
To fight against infectious diseases (e.g., SARS, COVID-19, and Ebola), government agencies, technology companies, and health institutes have launched various contact tracing approaches to identify and notify people exposed to infection sources. However, existing tracing approaches can lead to severe privacy and security concerns, preventing their secure and widespread use among communities. To tackle these problems, this paper proposes CoAvoid, a decentralized, privacy-preserving contact tracing system that features good dependability and usability. CoAvoid leverages the Google/Apple Exposure Notification (GAEN) API to achieve decent device compatibility and operating efficiency. It utilizes GPS along with Bluetooth Low Energy (BLE) to dependably verify user information. In addition, to enhance privacy protection, CoAvoid applies fuzzification and obfuscation measures to shelter sensitive data, making both servers and users agnostic to information about both low- and high-risk populations. The evaluation demonstrates the efficacy and security of CoAvoid: compared with four state-of-the-art contact tracing applications, CoAvoid reduces uploaded data by at least 90% and simultaneously resists wormhole and replay attacks in various scenarios.
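CoAvoid's exact fuzzification and obfuscation measures are not reproduced here; the snippet below only illustrates the general idea of coarsening location and time before matching, with an assumed grid size and time bucket, so that precise values never leave the device.

```python
# Illustrative location/time fuzzification: coordinates are snapped to a
# coarse grid and timestamps to fixed buckets before matching. Grid size,
# bucket length, and the hashing step are assumptions for illustration,
# not CoAvoid's actual parameters.
import hashlib

def fuzz_location(lat: float, lon: float, grid_deg: float = 0.01) -> tuple:
    """Snap coordinates to a coarse grid cell (~1 km at mid latitudes)."""
    return (round(lat / grid_deg) * grid_deg, round(lon / grid_deg) * grid_deg)

def fuzz_time(unix_ts: int, bucket_s: int = 900) -> int:
    """Round a timestamp down to a 15-minute bucket."""
    return unix_ts - (unix_ts % bucket_s)

def obfuscated_token(lat: float, lon: float, unix_ts: int) -> str:
    """Hash the fuzzed (cell, bucket) pair so raw values are never shared."""
    cell = fuzz_location(lat, lon)
    bucket = fuzz_time(unix_ts)
    return hashlib.sha256(f"{cell}|{bucket}".encode()).hexdigest()

print(obfuscated_token(38.0293, -78.4767, 1_600_000_000))
```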
The exponential growth of collected, processed, and shared microdata has given rise to concerns about individuals' privacy. As a result, laws and regulations have emerged to control what organisations do with microdata and how they protect it. Statistical Disclosure Control seeks to reduce the risk of confidential information disclosure by de-identifying the data. Such de-identification is achieved through privacy-preserving techniques. However, de-identification usually results in a loss of information, with a possible impact on data analysis precision and model predictive performance. The main goal is to protect individuals' privacy while maintaining the interpretability of the data, i.e., its usefulness. Statistical Disclosure Control is an expanding area that still needs to be explored, since no existing solution guarantees optimal privacy and utility. This survey focuses on all steps of the de-identification process. We present existing privacy-preserving techniques used in microdata de-identification, privacy measures suitable for several disclosure types, and information loss and predictive performance measures. We also discuss the main challenges raised by privacy constraints, describe the main approaches to handling these obstacles, review taxonomies of privacy-preserving techniques, provide a theoretical analysis of existing comparative studies, and raise multiple open issues.
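As a small illustration of one de-identification step discussed in the survey, the snippet below generalizes two quasi-identifiers (age banded, ZIP code truncated) and checks whether the result satisfies k-anonymity; the toy records and the choice of k are placeholders.

```python
# Toy de-identification step: generalize quasi-identifiers, then verify
# k-anonymity (every quasi-identifier combination appears >= k times).
# Records and k are illustrative placeholders.
import pandas as pd

records = pd.DataFrame({
    "age":  [23, 27, 25, 41, 44, 46],
    "zip":  ["22903", "22904", "22901", "90210", "90212", "90211"],
    "diag": ["flu", "asthma", "flu", "diabetes", "flu", "asthma"],
})

def generalize(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["age"] = (out["age"] // 10 * 10).astype(str) + "s"   # 20s, 40s, ...
    out["zip"] = out["zip"].str[:3] + "**"                   # truncate ZIP code
    return out

def is_k_anonymous(df: pd.DataFrame, quasi_ids, k: int) -> bool:
    return df.groupby(list(quasi_ids)).size().min() >= k

anon = generalize(records)
print(anon)
print("3-anonymous:", is_k_anonymous(anon, ["age", "zip"], k=3))
```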
Effective visualizations were evaluated to reveal relevant health patterns from multi-sensor, real-time wearable devices that recorded vital signs of patients admitted to hospital with COVID-19. Furthermore, specific challenges associated with wearable health data visualizations are described, such as fluctuating data quality resulting from compliance problems, the time needed to charge the device, and technical problems. As a primary use case, we examined the detection and communication of relevant health patterns visible in the vital signs acquired by the technology. Customized heat maps and bar charts were used to specifically highlight medically relevant patterns in vital signs. A survey of two medical doctors, one clinical project manager, and seven health data science researchers was conducted to evaluate the visualization methods. From a dataset of 84 hospitalized COVID-19 patients, we extracted one typical COVID-19 patient history and, based on the visualizations, showcased the health histories of two noteworthy patients. The visualizations were shown to be effective, simple, and intuitive for deducing the health status of patients. For clinical staff who are time-constrained and responsible for numerous patients, such visualization methods can be an effective tool for continuous acquisition and monitoring of patients' health statuses, even remotely.
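A minimal version of such a vital-sign heat map can be produced as sketched below; the synthetic data, the z-score encoding, and the colour map are placeholders rather than the customized clinical views evaluated in the paper.

```python
# Minimal vital-sign heat map: hours on the x-axis, vital signs on the
# y-axis, colour encoding deviation from a reference range. Data and
# thresholds are synthetic placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
hours = np.arange(48)
vitals = ["Heart rate", "Resp. rate", "SpO2", "Temperature"]
# Synthetic z-scores relative to each vital sign's normal range.
z = rng.normal(0, 1, size=(len(vitals), len(hours)))
z[2, 30:40] -= 2.5          # simulate a desaturation episode

fig, ax = plt.subplots(figsize=(10, 3))
im = ax.imshow(z, aspect="auto", cmap="RdBu_r", vmin=-3, vmax=3)
ax.set_yticks(range(len(vitals)))
ax.set_yticklabels(vitals)
ax.set_xlabel("Hours since admission")
fig.colorbar(im, ax=ax, label="Deviation from normal range (z-score)")
plt.tight_layout()
plt.show()
```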
Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm that is drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the varying hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., causing a device to become unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in the existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol while taking heterogeneity into consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to a 9.2% accuracy drop, 2.32x longer training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two potential causes of the performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers should account for heterogeneity during evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate its impacts.
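The kind of device-level heterogeneity studied can be mimicked with a toy simulation of a single FL round in which some sampled devices fail or time out before uploading; the dropout probability and update dimensions below are assumptions, not measurements from the 136k-device dataset.

```python
# Toy simulation of one heterogeneity-aware FL round: some sampled devices
# drop out (crash, timeout, low battery) before uploading, so the aggregate
# is computed from fewer updates. Probabilities and shapes are assumptions.
import numpy as np

rng = np.random.default_rng(42)

def run_round(n_sampled: int = 100, dropout_prob: float = 0.3, dim: int = 10):
    updates = []
    for _ in range(n_sampled):
        if rng.random() < dropout_prob:      # device unavailable or upload fails
            continue
        updates.append(rng.normal(0, 0.1, size=dim))
    if not updates:
        return None, 0
    return np.mean(updates, axis=0), len(updates)

agg, n_ok = run_round()
print(f"aggregated over {n_ok}/100 sampled devices")
```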
As data are increasingly stored in separate silos and societies become more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models faces efficiency and privacy challenges. Recently, federated learning (FL) has emerged as an alternative solution and continues to thrive in this new reality. However, existing FL protocol designs have been shown to be vulnerable to adversaries within or outside the system, compromising data privacy and system robustness. Besides training powerful global models, it is of paramount importance to design FL systems that have privacy guarantees and are resistant to different types of adversaries. In this paper, we conduct the first comprehensive survey on this topic. Through a concise introduction to the concept of FL and a unique taxonomy covering: 1) threat models; 2) poisoning attacks on robustness and the corresponding defenses; 3) inference attacks on privacy and the corresponding defenses, we provide an accessible review of this important topic. We highlight the intuitions, key techniques, and fundamental assumptions adopted by the various attacks and defenses. Finally, we discuss promising future research directions towards robust and privacy-preserving federated learning.
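As one concrete example from the privacy-defence side of the taxonomy, the sketch below clips a client update and adds Gaussian noise before it leaves the device, in the style of differentially private FL; the clipping norm and noise multiplier are illustrative choices.

```python
# Clip-and-noise client update in the style of differentially private FL,
# one common defence against inference attacks on shared updates.
# Clipping norm and noise multiplier are illustrative choices.
import numpy as np

rng = np.random.default_rng(7)

def privatize_update(update: np.ndarray, clip_norm: float = 1.0,
                     noise_multiplier: float = 1.1) -> np.ndarray:
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))   # bound sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

raw = rng.normal(0, 0.5, size=8)          # synthetic local model update
print(privatize_update(raw))
```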
Privacy is an important good for users of personalized services such as recommender systems. When such services are applied in the field of health informatics, users' privacy concerns may be amplified, but the potential utility of the services is also high. Despite the availability of technologies such as k-anonymity, differential privacy, privacy-aware recommendation, and personalized privacy trade-offs, little research has been conducted on users' willingness to share health data for use in such systems. In two conjoint-decision studies (sample size n=521), we investigate the importance and utility of the privacy-preserving techniques k-anonymity and differential privacy for the sharing of personal health data. Users were asked to pick a preferred sharing scenario depending on the recipient of the data, the benefit of sharing the data, the type of data, and the parameterized privacy. Users disagreed with sharing data on mental illnesses for commercial purposes and with high de-anonymization risks, but showed little concern when data were used for scientific purposes and related to physical illnesses. Suggestions for health recommender system development are derived from these findings.
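The "parameterized privacy" attribute can be made concrete with the Laplace mechanism, where the privacy budget epsilon controls how much noise is added to a shared statistic; the count query and epsilon values below are illustrative.

```python
# Laplace mechanism for a count query: smaller epsilon (stronger privacy)
# means noisier shared statistics. Query and epsilon values are illustrative.
import numpy as np

rng = np.random.default_rng(3)

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise calibrated to sensitivity/epsilon."""
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

true_count = 137          # e.g., number of users reporting a symptom
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps:>4}: released count ~ {dp_count(true_count, eps):.1f}")
```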