Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse today. The opaque nature of the algorithms these platforms use to curate content raises societal questions. Prior studies have used black-box methods to show that these algorithms can lead to biased or discriminatory outcomes. However, existing auditing methods face fundamental limitations because they operate independently of the platforms. Concerns about potential harm have prompted proposed legislation in both the U.S. and the E.U. that would mandate a new form of auditing in which vetted external researchers get privileged access to social media platforms. Unfortunately, to date there have been no concrete technical proposals for providing such auditing, because auditing at scale risks disclosure of users' private data and platforms' proprietary algorithms. We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation. Our first contribution is to enumerate the challenges existing auditing methods face in implementing these policies at scale. Second, we suggest that limited, privileged access to relevance estimators is the key to enabling generalizable platform-supported auditing by external researchers. Third, we show that platform-supported auditing need not risk user privacy or disclosure of platforms' business interests by proposing an auditing framework that protects against these risks. For a particular fairness metric, we show that ensuring privacy imposes only a small constant-factor increase (6.34x as an upper bound, and 4x for typical parameters) in the number of samples required for accurate auditing. Our technical contributions, combined with ongoing legal and policy efforts, can enable public oversight of how social media platforms affect individuals and society by moving past the privacy-vs-transparency hurdle.
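To make the constant-factor claim concrete, the sketch below is a minimal back-of-the-envelope illustration assuming a randomized-response-style privacy mechanism with flip probability p (a hypothetical choice for illustration, not necessarily the paper's construction): de-biasing a perturbed binary relevance signal inflates the estimator's variance by 1/(1-2p)^2, so the sample budget for a fixed-accuracy fairness estimate grows by the same factor.

```python
# Illustrative only: sample-size inflation for a fairness estimate under a
# hypothetical randomized-response privacy mechanism (flip probability p).
# The de-biased estimate of a binary signal has its variance scaled by
# 1 / (1 - 2p)^2, so the sample budget must grow by the same factor.

def inflation_factor(p: float) -> float:
    """Multiplicative increase in samples needed to keep the same accuracy."""
    if not 0.0 <= p < 0.5:
        raise ValueError("flip probability must lie in [0, 0.5)")
    return 1.0 / (1.0 - 2.0 * p) ** 2

if __name__ == "__main__":
    for p in (0.10, 0.25, 0.30):
        print(f"flip prob {p:.2f} -> {inflation_factor(p):.2f}x more samples")
    # e.g. p = 0.25 gives a 4x increase, comparable in spirit to the
    # "small constant factor" overhead reported in the abstract.
```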
With the accelerated adoption of end-to-end encryption, there is an opportunity to re-architect security and anti-abuse primitives in a manner that preserves new privacy expectations. In this paper, we consider two novel protocols for on-device blocklisting that allow a client to determine whether an object (e.g., URL, document, or image) is harmful based on threat information held by a so-called remote enforcer, in a way that is both privacy-preserving and trustworthy. Our protocols leverage a unique combination of private set intersection to promote privacy, cryptographic hashes to ensure resilience to false positives, cryptographic signatures to improve transparency, and Merkle inclusion proofs to ensure consistency and auditability. We benchmark our protocols, one time-efficient and the other space-efficient, to demonstrate their practical use for applications such as email, messaging, and storage. We also highlight remaining challenges, such as the privacy and censorship tensions that arise with logging or reporting. We consider our work a critical first step towards enabling complex, multi-stakeholder discussions on how best to provide on-device protections.
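To illustrate the auditability ingredient, the following is a minimal sketch of Merkle inclusion-proof generation and verification over a hashed blocklist; the helper names (build_tree, prove, verify) are hypothetical, and the actual protocols additionally combine this with private set intersection and signatures, which the sketch omits.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return all levels of a Merkle tree over already-hashed leaves."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:                      # duplicate last node on odd levels
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, index):
    """Collect sibling hashes from leaf to root for one leaf position."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return proof

def verify(root, leaf, proof):
    node = leaf
    for sibling, leaf_is_left in proof:
        node = h(node + sibling) if leaf_is_left else h(sibling + node)
    return node == root

if __name__ == "__main__":
    blocklist = [h(u.encode()) for u in ["bad.example/a", "bad.example/b", "bad.example/c"]]
    levels = build_tree(blocklist)
    root = levels[-1][0]
    proof = prove(levels, 1)
    assert verify(root, blocklist[1], proof)  # membership is provable against the signed root
```

In this spirit, a client that learns an object is blocked can check the enforcer's claim against a published root, which is the consistency and auditability property the abstract refers to.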
Mobile manipulator platforms, like the Stretch RE1 robot, make the promise of in-home robotic assistance feasible. For people with severe physical limitations, such as those with quadriplegia, the ability to tele-operate these robots means they can perform physical tasks they could not otherwise do, thereby increasing their independence. For users with physical limitations to operate these robots, the interfaces must be accessible and cater to each user's specific needs. Because physical limitations vary among users, it is difficult to design a single interface that accommodates everyone; instead, such interfaces should be customizable to each individual user. In this paper we explore the value of customizing a browser-based interface for tele-operating the Stretch RE1 robot. More specifically, we evaluate the usability and effectiveness of a customized interface in comparison to the default interface configurations from prior work. We present a user study involving participants with motor impairments (N=10) and participants without motor impairments who could serve as caregivers (N=13), who use the robot to perform mobile manipulation tasks in a real kitchen environment. Our study demonstrates that no single interface configuration satisfies all users' needs and preferences. Users perform better when using the customized interface for navigation, but not for manipulation, due to the higher complexity of learning to manipulate through the robot. All participants are able to use the robot to complete all tasks, and participants with motor impairments believe that having the robot in their home would make them more independent.
This paper introduces the Saudi Privacy Policy Dataset, a diverse compilation of Arabic privacy policies from various sectors in Saudi Arabia, annotated according to the 10 principles of the Personal Data Protection Law (PDPL); the PDPL was established to be compatible with the General Data Protection Regulation (GDPR), one of the most comprehensive data protection regulations worldwide. Data were collected from multiple sources, including the Saudi Central Bank, the Saudi Arabia National United Platform, the Council of Health Insurance, and general websites found using Google and Wikipedia. The final dataset includes 1,000 websites belonging to 7 sectors, 4,638 lines of text, 775,370 tokens, and a corpus size of 8,353 KB. The annotated dataset offers significant reuse potential for assessing privacy policy compliance, benchmarking privacy practices across industries, and developing automated tools for monitoring adherence to data protection regulations. By providing a comprehensive, annotated dataset of privacy policies, this paper aims to facilitate further research and development in privacy policy analysis, natural language processing, and machine learning applications related to privacy and data protection. It also serves as an essential resource for researchers, policymakers, and industry professionals interested in understanding and promoting compliance with privacy regulations in Saudi Arabia.
Despite their prevalence in eHealth applications for behavior change, persuasive messages tend to have small effects on behavior. The conditions or states (e.g., confidence, knowledge, motivation) and characteristics (e.g., gender, age, personality) of persuadees are two promising components for more effective algorithms for choosing persuasive messages. However, it is not yet sufficiently clear how well considering these components allows one to predict behavior after persuasive attempts, especially in the long run. Since collecting data for many algorithm components is costly and places a burden on users, a better understanding of the impact of individual components in practice is needed to make an informed decision about which components to use. We thus conducted a longitudinal study in which a virtual coach persuaded 671 daily smokers to do preparatory activities for quitting smoking and becoming more physically active, such as envisioning one's desired future self. Based on the collected data, we designed a Reinforcement Learning (RL) approach that considers current and future states to maximize the effort people spend on their activities. Using this RL approach, we found, based on leave-one-out cross-validation, that considering states helps to predict both behavior and future states. User characteristics, and especially involvement in the activities, on the other hand, only help to predict behavior when used in combination with states rather than alone. We see these results as supporting the use of states and involvement in persuasion algorithms. Our dataset is available online.
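As a rough illustration of how a state-aware persuasion policy of this kind can be computed, the sketch below runs tabular value iteration over a toy state space of motivation levels; the states, actions, transition probabilities, and reward scale are all invented placeholders rather than the study's actual model.

```python
import numpy as np

# Toy illustration: value iteration over discretized user states
# (e.g. low/medium/high motivation). All numbers are made up.
N_STATES = 3
GAMMA = 0.85  # weight on future effort, reflecting an approach that
              # "considers current and future states"

# P[a, s, s']: assumed transition probabilities; R[a, s]: assumed mean effort.
P = np.array([[[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6]],
              [[0.5, 0.4, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]])
R = np.array([[0.2, 0.5, 0.8],
              [0.3, 0.4, 0.9]])

V = np.zeros(N_STATES)
for _ in range(200):                  # value iteration until (approximate) convergence
    Q = R + GAMMA * (P @ V)           # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
    V = Q.max(axis=0)

policy = Q.argmax(axis=0)             # best persuasion type for each user state
print("state values:", np.round(V, 3), "policy:", policy)
```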
The present study investigates the role of source characteristics, the quality of evidence, and prior beliefs about the topic in adult readers' credibility evaluations of short health-related social media posts. The researchers designed content for posts concerning five health topics by manipulating the source characteristics (the source's expertise, gender, and ethnicity), the accuracy of the claims, and the quality of evidence (research evidence, testimony, consensus, and personal experience) of the posts. Accurate and inaccurate social media posts varying in the other manipulated aspects were then generated programmatically. Crowdworkers (N = 844) recruited from two platforms were asked to evaluate the credibility of up to ten social media posts, resulting in 8,380 evaluations. Before the credibility evaluations, participants' prior beliefs about the topics of the posts were assessed. The results showed that prior belief consistency and the source's expertise affected the perceived credibility of the accurate and inaccurate social media posts the most, after controlling for the topic of the post and the crowdworking platform. In contrast, the quality of evidence supporting the health claim mattered relatively little, and the source's gender and ethnicity had no effect. The results are discussed in terms of first- and second-hand evaluation strategies.
Biometric data captures distinctive human traits such as facial features or gait patterns. Because biometric data identifies individuals so precisely, it is used effectively in identification and authentication systems; for the same reason, privacy protection is indispensable. Privacy protection is commonly provided through anonymization, which obfuscates or removes sensitive personal data to achieve a high level of anonymity. However, the effectiveness of anonymization depends equally on the effectiveness of the methods used to evaluate anonymization performance. In this paper, we assess the state-of-the-art methods used to evaluate the performance of anonymization techniques for facial images and gait patterns. We demonstrate that these evaluation methods have serious and frequent shortcomings; in particular, their underlying assumptions are often unwarranted. When an evaluation method assumes a weak adversary or a weak recognition scenario, the resulting evaluation is very likely to grossly overestimate anonymization performance. We therefore propose a stronger adversary model that accounts for both the recognition scenario and the anonymization scenario, and that implements an appropriate measure of anonymization performance. We also improve the selection process for the evaluation dataset and reduce the number of identities it contains while ensuring that these identities remain easily distinguishable from one another. Our evaluation methodology surpasses the state of the art because it measures worst-case performance and thus delivers a highly reliable evaluation of biometric anonymization techniques.
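As a rough sketch of what a worst-case, anonymization-aware evaluation can look like, the toy simulation below models identities as Gaussian feature clusters, models anonymization as additive noise, and compares a naive adversary against one that enrolls anonymized samples, reporting the higher re-identification accuracy; the features, noise level, and nearest-centroid recognizer are all invented for illustration and stand in for real face or gait models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy re-identification setup: each identity is a Gaussian cluster in feature
# space; "anonymization" is modeled as additive noise. Stand-ins only.
n_ids, dim, per_id, noise = 20, 16, 10, 2.0
centroids = rng.normal(size=(n_ids, dim)) * 3.0

def sample(ids):
    return centroids[ids] + rng.normal(size=(len(ids), dim))

def anonymize(x):
    return x + rng.normal(scale=noise, size=x.shape)

probe_ids = np.repeat(np.arange(n_ids), per_id)
probes = anonymize(sample(probe_ids))              # anonymized probe samples

def reid_accuracy(gallery):
    """Nearest-centroid re-identification of anonymized probes."""
    d = ((probes[:, None, :] - gallery[None, :, :]) ** 2).sum(-1)
    return (d.argmin(1) == probe_ids).mean()

naive_gallery = centroids                          # adversary unaware of anonymization
aware_gallery = np.stack([anonymize(sample(np.full(per_id, i))).mean(0)
                          for i in range(n_ids)])  # adversary enrolls anonymized data

worst_case = max(reid_accuracy(naive_gallery), reid_accuracy(aware_gallery))
print(f"worst-case re-identification accuracy: {worst_case:.2f}")
```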
The security and privacy of refugee communities have emerged as pressing concerns in the context of increasing global migration. The Rohingya are a stateless Muslim minority group in Myanmar who were forced to flee their homes after conflict broke out; many fled to neighbouring countries and ended up in refugee camps, such as those in Bangladesh, while others migrated to Malaysia, where they live within the community as urban refugees. However, the Rohingya in Malaysia are not legally recognized and have limited, restricted access to public resources such as healthcare and education. As a result, they face security and privacy challenges different from those of other refugee groups, challenges often compounded by this lack of recognition, social isolation, and lack of access to vital resources. This paper discusses the security and privacy implications for the Rohingya refugees, focusing on available and accessible technological assistance and uncovering the heightened need for a human-centered approach to designing and implementing solutions that account for these requirements. Overall, the discussions and findings presented in this paper on the security and privacy of the Rohingya provide a valuable resource for researchers, practitioners, and policymakers in the wider HCI community.
The California Consumer Privacy Act (CCPA) provides California residents with a range of enhanced privacy protections and rights. Our research investigated the extent to which Android app developers comply with the provisions of the CCPA that require them to provide consumers with accurate privacy notices and respond to "verifiable consumer requests" (VCRs) by disclosing personal information that they have collected, used, or shared about consumers for a business or commercial purpose. We compared the actual network traffic of 109 apps that we believe must comply with the CCPA to the data the apps state they collect in their privacy policies and the data contained in responses to "right to know" requests that we submitted to the apps' developers. Of the 69 app developers who substantively replied to our requests, all but one provided specific pieces of personal data (as opposed to only categorical information). However, a significant percentage of apps collected information that was not disclosed, including identifiers (55 apps, 80%), geolocation data (21 apps, 30%), and sensory data (18 apps, 26%), among other categories. We discuss improvements to the CCPA that could help app developers comply with "right to know" requests and other related regulations.
The purpose of this paper is to describe the development of a synthetic population dataset that is open and realistic and that can be used to facilitate understanding of the cartographic process and to contextualize cartographic artifacts. We first discuss an optimization model designed to construct the synthetic population by minimizing the difference between summary statistics of the synthetic population and the statistics published in census data tables. We then illustrate how the synthetic population dataset can be used to contextualize maps made with privacy-preserving census data. Two counties in Ohio are used as case studies.
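A minimal sketch of the kind of optimization described above: choose non-negative weights for candidate synthetic persons so that the weighted summaries match published census margins, solved here with non-negative least squares; the attributes and margin values are invented for illustration, and this is not necessarily the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)

# Candidate synthetic persons with two invented categorical attributes:
# age group in {0, 1, 2} and sex in {0, 1}.
n_candidates = 500
age = rng.integers(0, 3, n_candidates)
sex = rng.integers(0, 2, n_candidates)

# Each row of A maps candidate weights to one published census margin.
A = np.vstack([(age == a).astype(float) for a in range(3)] +
              [(sex == s).astype(float) for s in range(2)])
margins = np.array([300, 450, 250, 520, 480], dtype=float)  # hypothetical census counts

# Non-negative least squares: weights whose summaries best match the margins.
weights, residual = nnls(A, margins)
fitted = A @ weights
print("target margins:", margins)
print("fitted margins:", np.round(fitted, 1), "residual:", round(residual, 3))
```

In practice the fitted weights would then be integerized (e.g., rounded or sampled) to obtain discrete synthetic individuals whose aggregate counts approximate the census tables.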
Games and simulators can be a valuable platform for executing complex multi-agent, multiplayer, imperfect-information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or to neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting the development of algorithms with complex benchmarks and the capability to rapidly iterate over new ideas. The success of AI algorithms in real-time strategy games such as StarCraft II has also attracted the attention of the military research community, which aims to explore similar techniques in military counterpart scenarios. Aiming to bridge the gap between games and military applications, this work discusses past and current efforts on how games and simulators, together with AI algorithms, have been adapted to simulate certain aspects of military missions and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities in human interfaces with gaming platforms and their military parallels.