Is it true that if citizens understand hurricane probabilities, they will make more rational decisions for evacuation? Finding answers to such questions is not straightforward in the literature because the terms judgment and decision making are often used interchangeably. This terminology conflation leads to a lack of clarity on whether people make suboptimal decisions because of inaccurate judgments of information conveyed in visualizations or because they use alternative yet currently unknown heuristics. To decouple judgment from decision making, we review relevant concepts from the literature and present two preregistered experiments (N=601) to investigate if the task (judgment vs. decision making), the scenario (sports vs. humanitarian), and the visualization (quantile dotplots, density plots, probability bars) affect accuracy. While experiment 1 was inconclusive, we found evidence for a difference in experiment 2. Contrary to our expectations and previous research, which found decisions less accurate than their direct-equivalent judgments, our results pointed in the opposite direction. Our findings further revealed that decisions were less vulnerable to status-quo bias, suggesting decision makers may disfavor responses associated with inaction. We also found that both scenario and visualization types can influence peoples judgments and decisions. Although effect sizes are not large and results should be interpreted carefully, we conclude that judgments cannot be safely used as proxy tasks for decision making, and discuss implications for visualization research and beyond.
LGBTQ+ people have received increased attention in HCI research, paralleling a greater emphasis on social justice in recent years. However, there has not been a systematic review of how LGBTQ+ people are researched or discussed in HCI. In this work, we review all research mentioning LGBTQ+ people across the HCI venues of CHI, CSCW, DIS, and TOCHI. Since 2014, we find a linear growth in the number of papers substantially about LGBTQ+ people and an exponential increase in the number of mentions. Research about LGBTQ+ people tends to center experiences of being politicized, outside the norm, stigmatized, or highly vulnerable. LGBTQ+ people are typically mentioned as a marginalized group or an area of future research. We identify gaps and opportunities for (1) research about and (2) the discussion of LGBTQ+ in HCI and provide a dataset to facilitate future Queer HCI research.
Predictive models make mistakes. Hence, there is a need to quantify the uncertainty associated with their predictions. Conformal inference has emerged as a powerful tool to create statistically valid prediction regions around point predictions, but its naive application to regression problems yields non-adaptive regions. New conformal scores, often relying upon quantile regressors or conditional density estimators, aim to address this limitation. Although they are useful for creating prediction bands, these scores are detached from the original goal of quantifying the uncertainty around an arbitrary predictive model. This paper presents a new, model-agnostic family of methods to calibrate prediction intervals for regression problems with local coverage guarantees. Our approach is based on pursuing the coarsest partition of the feature space that approximates conditional coverage. We create this partition by training regression trees and Random Forests on conformity scores. Our proposal is versatile, as it applies to various conformity scores and prediction settings and demonstrates superior scalability and performance compared to established baselines in simulated and real-world datasets. We provide a Python package locart that implements our methods using the standard scikit-learn interface.
In legal decisions, split votes (SV) occur when judges cannot reach a unanimous decision, posing a difficulty for lawyers who must navigate diverse legal arguments and opinions. In high-stakes domains, understanding the alignment of perceived difficulty between humans and AI systems is crucial to build trust. However, existing NLP calibration methods focus on a classifier's awareness of predictive performance, measured against the human majority class, overlooking inherent human label variation (HLV). This paper explores split votes as naturally observable human disagreement and value pluralism. We collect judges' vote distributions from the European Court of Human Rights (ECHR), and present SV-ECHR, a case outcome classification (COC) dataset with SV information. We build a taxonomy of disagreement with SV-specific subcategories. We further assess the alignment of perceived difficulty between models and humans, as well as confidence- and human-calibration of COC models. We observe limited alignment with the judge vote distribution. To our knowledge, this is the first systematic exploration of calibration to human judgements in legal NLP. Our study underscores the necessity for further research on measuring and enhancing model calibration considering HLV in legal decision tasks.
Risk-limiting audits (RLAs) are the established techniques for verifying large elections. While they provide rigorous guarantees of correctness, widespread adoption has been impeded by both efficiency concerns and the fact they offer statistical, rather than absolute, conclusions. We define new families of audits that help to address these issues. Our new audits are enabled by revisiting the standard notion of a cast-vote record so that it can declare multiple possible mark interpretations rather than a single decision; this can reflect the presence of ambiguous marks, which appear regularly on hand-marked ballots. We show that this simple expedient can offer significant efficiency improvements with only minor changes to existing auditing infrastructure. We establish that these "Bayesian" comparison audits are indeed risk-limiting in the formal sense of (Fuller, Harrison, and Russell, 2022). We then define a new type of post-election audit we call a contested audit. These call for each candidate to provide a cast-vote record table advancing their own claim to victory. We prove that these audits offer remarkable sample efficiency: they guarantee negligible risk with only a constant number of ballot inspections. This is a first for an audit with provable soundness. These results are formulated in a game-based security model that specify quantitative soundness and completeness guarantees. Finally, we observe that these audits provide a direct means to handle contestation of election results affirmed by conventional RLAs.
Online discussion platforms are a vital part of the public discourse in a deliberative democracy. However, how to interpret the outcomes of the discussions on these platforms is often unclear. In this paper, we propose a novel and explainable method for selecting a set of most representative, consistent points of view by combining methods from computational social choice and abstract argumentation. Specifically, we model online discussions as abstract argumentation frameworks combined with information regarding which arguments voters approve of. Based on ideas from approval-based multiwinner voting, we introduce several voting rules for selecting a set of preferred extensions that represents voters' points of view. We compare the proposed methods across several dimensions, theoretically and in numerical simulations, and give clear suggestions on which methods to use depending on the specific situation.
We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, provided that the single-armed relaxed problem is unichain and aperiodic. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Uniform Global Attractor Property (UGAP) to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption (SA).
Driven by the rapid ascent of artificial intelligence (AI), organizations are at the epicenter of a seismic shift, facing a crucial question: How can AI be successfully integrated into existing operations? To help answer it, manage expectations and mitigate frustration, this article introduces Computational Management, a systematic approach to task automation for enhancing the ability of organizations to harness AI's potential within existing workflows. Computational Management acts as a bridge between the strategic insights of management science with the analytical rigor of computational thinking. The article offers three easy step-by-step procedures to begin the process of implementing AI within a workflow. Such procedures focus on task (re)formulation, on the assessment of the automation potential of tasks, on the completion of task specification templates for AI selection and adaptation. Included in the article there are manual and automated methods, with prompt suggestions for publicly available LLMs, to complete these three procedures. The first procedure, task (re)formulation, focuses on breaking down work activities into basic units, so they can be completed by one agent, involve a single well-defined action, and produce a distinct outcome. The second, allows the assessment of the granular task and its suitability for automation, using the Task Automation Index to rank tasks based on whether they have standardized input, well-defined rules, repetitiveness, data dependency, and objective outputs. The third, focuses on a task specification template which details information on 16 critical components of tasks, and can be used as a checklist to select or adapt the most suitable AI solution for integration into existing workflows. Computational Management provides a roadmap and a toolkit for humans and AI to thrive together, while enhancing organizational efficiency and innovation.
The advent of large language models marks a revolutionary breakthrough in artificial intelligence. With the unprecedented scale of training and model parameters, the capability of large language models has been dramatically improved, leading to human-like performances in understanding, language synthesizing, and common-sense reasoning, etc. Such a major leap-forward in general AI capacity will change the pattern of how personalization is conducted. For one thing, it will reform the way of interaction between humans and personalization systems. Instead of being a passive medium of information filtering, large language models present the foundation for active user engagement. On top of such a new foundation, user requests can be proactively explored, and user's required information can be delivered in a natural and explainable way. For another thing, it will also considerably expand the scope of personalization, making it grow from the sole function of collecting personalized information to the compound function of providing personalized services. By leveraging large language models as general-purpose interface, the personalization systems may compile user requests into plans, calls the functions of external tools to execute the plans, and integrate the tools' outputs to complete the end-to-end personalization tasks. Today, large language models are still being developed, whereas the application in personalization is largely unexplored. Therefore, we consider it to be the right time to review the challenges in personalization and the opportunities to address them with LLMs. In particular, we dedicate this perspective paper to the discussion of the following aspects: the development and challenges for the existing personalization system, the newly emerged capabilities of large language models, and the potential ways of making use of large language models for personalization.
Artificial intelligence (AI) has become a part of everyday conversation and our lives. It is considered as the new electricity that is revolutionizing the world. AI is heavily invested in both industry and academy. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results in many problems, but its limits are already visible. AI has been under research since the 1940s, and the industry has seen many ups and downs due to over-expectations and related disappointments that have followed. The purpose of this book is to give a realistic picture of AI, its history, its potential and limitations. We believe that AI is a helper, not a ruler of humans. We begin by describing what AI is and how it has evolved over the decades. After fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common representations for AI, methods, and machine learning are covered. In addition, the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision, and includes an exposure to the results and applications of our own research. Emotions are central to human intelligence, but little use has been made in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such achievement seems impossible on the basis of present knowledge,and how AI could be improved. Finally, a summary is made of the current state of AI and what to do in the future. In the appendix, we look at the development of AI education, especially from the perspective of contents at our own university.
Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.