Artificial intelligence algorithms are increasingly adopted as decisional aides by public bodies, with the promise of overcoming biases of human decision-makers. At the same time, they may introduce new biases in the human-algorithm interaction. Drawing on psychology and public administration literatures, we investigate two key biases: overreliance on algorithmic advice even in the face of "warning signals" from other sources (automation bias), and selective adoption of algorithmic advice when this corresponds to stereotypes (selective adherence). We assess these via three experimental studies conducted in the Netherlands: In study 1 (N=605), we test automation bias by exploring participants' adherence to an algorithmic prediction compared to an equivalent human-expert prediction. We do not find evidence for automation bias. In study 2 (N=904), we replicate these findings, and also test selective adherence. We find a stronger propensity for adherence when the advice is aligned with group stereotypes, with no significant differences between algorithmic and human-expert advice. In study 3 (N=1,345), we replicate our design with a sample of civil servants. This study was conducted shortly after a major scandal involving public authorities' reliance on an algorithm with discriminatory outcomes (the "childcare benefits scandal"). The scandal is itself illustrative of our theory and patterns diagnosed empirically in our experiment, yet in our study 3, while supporting our prior findings as to automation bias, we do not find patterns of selective adherence. We suggest this is driven by bureaucrats' enhanced awareness of discrimination and algorithmic biases in the aftermath of the scandal. We discuss the implications of our findings for public sector decision-making in the age of automation.
Recent developments in Artificial Intelligence (AI) have fueled the emergence of human-AI collaboration, a setting where AI is a coequal partner. Especially in clinical decision-making, it has the potential to improve treatment quality by assisting overworked medical professionals. Even though research has started to investigate the utilization of AI for clinical decision-making, its potential benefits do not imply its adoption by medical professionals. While several studies have started to analyze adoption criteria from a technical perspective, research providing a human-centered perspective with a focus on AI's potential for becoming a coequal team member in the decision-making process remains limited. Therefore, in this work, we identify factors for the adoption of human-AI collaboration by conducting a series of semi-structured interviews with experts in the healthcare domain. We identify six relevant adoption factors and highlight existing tensions between them and effective human-AI collaboration.
Artificial intelligence (AI) is gaining momentum, and its importance for the future of work in many areas, such as medicine and banking, is continuously rising. However, insights on the effective collaboration of humans and AI are still rare. Typically, AI supports humans in decision-making by addressing human limitations. However, it may also evoke human bias, especially in the form of automation bias as an over-reliance on AI advice. We aim to shed light on the potential to influence automation bias by explainable AI (XAI). In this pre-test, we derive a research model and describe our study design. Subsequentially, we conduct an online experiment with regard to hotel review classifications and discuss first results. We expect our research to contribute to the design and development of safe hybrid intelligence systems.
Ethereum Improvement Proposal (EIP) 1559 was recently implemented to transform Ethereum's transaction fee market. EIP-1559 utilizes an algorithmic update rule with a constant learning rate to estimate a base fee. The base fee reflects prevailing network conditions and hence provides a more reliable oracle for current gas prices. Using on-chain data from the period after its launch, we evaluate the impact of EIP-1559 on the user experience and market performance. Our empirical findings suggest that although EIP-1559 achieves its goals on average, short-term behavior is marked by intense, chaotic oscillations in block sizes (as predicted by our recent theoretical dynamical system analysis [1]) and slow adjustments during periods of demand bursts (e.g., NFT drops). Both phenomena lead to unwanted inter-block variability in mining rewards. To address this issue, we propose an alternative base fee adjustment rule in which the learning rate varies according to an additive increase, multiplicative decrease (AIMD) update scheme. Our simulations show that the latter robustly outperforms the EIP-1559 protocol under various demand scenarios. These results provide evidence that variable learning rate mechanisms may constitute a promising alternative to the default EIP-1559-based format and contribute to the ongoing discussion on the design of more efficient transaction fee markets.
With the rapid development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary theory underlying AR is human visual confusion, which allows users to perceive the real-world scenes and augmented contents (virtual-world scenes) simultaneously by superimposing them together. To achieve good Quality of Experience (QoE), it is important to understand the interaction between two scenarios, and harmoniously display AR contents. However, studies on how this superimposition will influence the human visual attention are lacking. Therefore, in this paper, we mainly analyze the interaction effect between background (BG) scenes and AR contents, and study the saliency prediction problem in AR. Specifically, we first construct a Saliency in AR Dataset (SARD), which contains 450 BG images, 450 AR images, as well as 1350 superimposed images generated by superimposing BG and AR images in pair with three mixing levels. A large-scale eye-tracking experiment among 60 subjects is conducted to collect eye movement data. To better predict the saliency in AR, we propose a vector quantized saliency prediction method and generalize it for AR saliency prediction. For comparison, three benchmark methods are proposed and evaluated together with our proposed method on our SARD. Experimental results demonstrate the superiority of our proposed method on both of the common saliency prediction problem and the AR saliency prediction problem over benchmark methods. Our data collection methodology, dataset, benchmark methods, and proposed saliency models will be publicly available to facilitate future research.
Attention Deficit Hyperactivity Disorder (ADHD) is a behavioral disorder that impacts an individual's education, relationships, career, and ability to acquire fair and just police interrogations. Yet, traditional methods used to diagnose ADHD in children and adults are known to have racial and gender bias. In recent years, diagnostic technology has been studied by both HCI and ML researchers. However, these studies fail to take into consideration racial and gender stereotypes that may impact the accuracy of their results. We highlight the importance of taking race and gender into consideration when creating diagnostic technology for ADHD and provide HCI researchers with suggestions for future studies.
When subjected to a sudden, unanticipated threat, human groups characteristically self-organize to identify the threat, determine potential responses, and act to reduce its impact. Central to this process is the challenge of coordinating information sharing and response activity within a disrupted environment. In this paper, we consider coordination in the context of responses to the 2001 World Trade Center disaster. Using records of communications among 17 organizational units, we examine the mechanisms driving communication dynamics, with an emphasis on the emergence of coordinating roles. We employ relational event models (REMs) to identify the mechanisms shaping communications in each unit, finding a consistent pattern of behavior across units with very different characteristics. Using a simulation-based "knock-out" study, we also probe the importance of different mechanisms for hub formation. Our results suggest that, while preferential attachment and pre-disaster role structure generally contribute to the emergence of hub structure, temporally local conversational norms play a much larger role. We discuss broader implications for the role of microdynamics in driving macroscopic outcomes, and for the emergence of coordination in other settings.
Requirements engineering (RE) activities for Machine Learning (ML) are not well-established and researched in the literature. Many issues and challenges exist when specifying, designing, and developing ML-enabled systems. Adding more focus on RE for ML can help to develop more reliable ML-enabled systems. Based on insights collected from previous work and industrial experiences, we propose a catalogue of 45 concerns to be considered when specifying ML-enabled systems, covering five different perspectives we identified as relevant for such systems: objectives, user experience, infrastructure, model, and data. Examples of such concerns include the execution engine and telemetry for the infrastructure perspective, and explainability and reproducibility for the model perspective. We conducted a focus group session with eight software professionals with experience developing ML-enabled systems to validate the importance, quality and feasibility of using our catalogue. The feedback allowed us to improve the catalogue and confirmed its practical relevance. The main research contribution of this work consists in providing a validated set of concerns grouped into perspectives that can be used by requirements engineers to support the specification of ML-enabled systems.
Recently, numerous studies have demonstrated the presence of bias in machine learning powered decision-making systems. Although most definitions of algorithmic bias have solid mathematical foundations, the corresponding bias detection techniques often lack statistical rigor, especially for non-iid data. We fill this gap in the literature by presenting a rigorous non-parametric testing procedure for bias according to Predictive Rate Parity, a commonly considered notion of algorithmic bias. We adapt traditional asymptotic results for non-parametric estimators to test for bias in the presence of dependence commonly seen in user-level data generated by technology industry applications and illustrate how these approaches can be leveraged for mitigation. We further propose modifications of this methodology to address bias measured through marginal outcome disparities in classification settings and extend notions of predictive rate parity to multi-objective models. Experimental results on real data show the efficacy of the proposed detection and mitigation methods.
Proactive dialogue system is able to lead the conversation to a goal topic and has advantaged potential in bargain, persuasion and negotiation. Current corpus-based learning manner limits its practical application in real-world scenarios. To this end, we contribute to advance the study of the proactive dialogue policy to a more natural and challenging setting, i.e., interacting dynamically with users. Further, we call attention to the non-cooperative user behavior -- the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the targets of reaching the goal topic quickly and maintaining a high user satisfaction are not always converge, because the topics close to the goal and the topics user preferred may not be the same. Towards this issue, we propose a new solution named I-Pro that can learn Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often refereed to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of hitherto attempts at handling uncertainty in general and formalizing this distinction in particular.