In the face of rapidly advancing technologies, evidence of harms they can exacerbate, and insufficient policy to ensure accountability from tech companies, what are HCI opportunities for advancing policymaking of technology? In this paper, we explore challenges and opportunities for tech policymaking through a case study of app-based rideshare driving. We begin with background on rideshare platforms and how they operate. Next, we review literature on algorithmic management about how rideshare drivers actually experience platform features -- often to the detriment of their well-being -- and ways they respond. In light of this, researchers and advocates have called for increased worker protections, thus we turn to rideshare policy and regulation efforts in the U.S. Here, we differentiate the political strategies of platforms with those of drivers to illustrate the conflicting narratives policymakers face when trying to oversee gig work platforms. We reflect that past methods surfacing drivers' experiences may be insufficient for policymaker needs when developing oversight. To address this gap and our original inquiry -- what are HCI opportunities for advancing tech policymaking -- we briefly explore two paths forward for holding tech companies accountable in the rideshare context: (1) data transparency initiatives to enable collective auditing by workers and (2) legal frameworks for holding platforms accountable.
Large Language Models (LLMs) have shown impressive capabilities in complex tasks and interactive environments, yet their creativity remains underexplored. This paper introduces a simulation framework utilizing the game Balderdash to evaluate both the creativity and logical reasoning of LLMs. In Balderdash, players generate fictitious definitions for obscure terms to deceive others while identifying correct definitions. Our framework enables multiple LLM agents to participate in this game, assessing their ability to produce plausible definitions and strategize based on game rules and history. We implemented a centralized game engine featuring various LLMs as participants and a judge LLM to evaluate semantic equivalence. Through a series of experiments, we analyzed the performance of different LLMs, examining metrics such as True Definition Ratio, Deception Ratio, and Correct Guess Ratio. The results provide insights into the creative and deceptive capabilities of LLMs, highlighting their strengths and areas for improvement. Specifically, the study reveals that infrequent vocabulary in LLMs' input leads to poor reasoning on game rules and historical context (//github.com/ParsaHejabi/Simulation-Framework-for-Multi-Agent-Balderdash).
Emotion detection is pivotal in human communication, as it significantly influences behavior, relationships, and decision-making processes. This study concentrates on text-based emotion detection by leveraging the GoEmotions dataset, which annotates Reddit comments with 27 distinct emotions. These emotions are subsequently mapped to Ekman's six basic categories: joy, anger, fear, sadness, disgust, and surprise. We employed a range of models for this task, including six machine learning models, three ensemble models, and a Long Short-Term Memory (LSTM) model to determine the optimal model for emotion detection. Results indicate that the Stacking classifier outperforms other models in accuracy and performance. We also benchmark our models against EmoBERTa, a pre-trained emotion detection model, with our Stacking classifier proving more effective. Finally, the Stacking classifier is deployed via a Streamlit web application, underscoring its potential for real-world applications in text-based emotion analysis.
We explore some of the existing research in HCI around technology for older adults and examine the role of LLMs in enhancing it. We also discuss the digital divide and emphasize the need for inclusive technology design. At the same time, we also surface concerns regarding privacy, security, and the accuracy of information provided by LLMs, alongside the importance of user-centered design to make technology accessible and effective for the elderly. We show the transformative possibilities of LLM-supported interactions at the intersection of aging, technology, and human-computer interaction, advocating for further research and development in this area.
Due to the enormous potential for economic profit offered by artificial intelligence (AI) servers, the field of cybersecurity has the potential to emerge as a prominent arena for competition among corporations and governments on a global scale. One of the prospective applications that stands to gain from the utilization of AI technology is the advancement in the field of cybersecurity. To this end, this paper provides an overview of the possible risks that the physical layer may encounter in the context of 6G Non-Terrestrial Networks (NTN). With the objective of showcasing the effectiveness of cutting-edge AI technologies in bolstering physical layer security, this study reviews the most foreseeable design strategies associated with the integration of edge AI in the realm of 6G NTN. The findings of this paper provide some insights and serve as a foundation for future investigations aimed at enhancing the physical layer security of edge servers/devices in the next generation of trustworthy 6G telecommunication networks.
Social robots are required not only to understand human intentions but also to effectively communicate their intentions or own internal states to users. This study explores the use of sonification to provide explicit auditory feedback, enhancing mutual understanding in HRI. We introduce a novel sonification approach that conveys the robot's internal state, linked to its perception of nearby individuals and their interaction intentions. The approach is evaluated through a two-fold user study: an online video-based survey with $26$ participants and live experiments with $10$ participants. Results indicate that while sonification improves the robot's expressivity and communication effectiveness, the design of the auditory feedback needs refinement to enhance user experience. Participants found the auditory cues useful but described the sounds as uninteresting and unpleasant. These findings underscore the importance of carefully designed auditory feedback in developing more effective and engaging HRI systems.
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-rage R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Over the past few years, the rapid development of deep learning technologies for computer vision has greatly promoted the performance of medical image segmentation (MedISeg). However, the recent MedISeg publications usually focus on presentations of the major contributions (e.g., network architectures, training strategies, and loss functions) while unwittingly ignoring some marginal implementation details (also known as "tricks"), leading to a potential problem of the unfair experimental result comparisons. In this paper, we collect a series of MedISeg tricks for different model implementation phases (i.e., pre-training model, data pre-processing, data augmentation, model implementation, model inference, and result post-processing), and experimentally explore the effectiveness of these tricks on the consistent baseline models. Compared to paper-driven surveys that only blandly focus on the advantages and limitation analyses of segmentation models, our work provides a large number of solid experiments and is more technically operable. With the extensive experimental results on both the representative 2D and 3D medical image datasets, we explicitly clarify the effect of these tricks. Moreover, based on the surveyed tricks, we also open-sourced a strong MedISeg repository, where each of its components has the advantage of plug-and-play. We believe that this milestone work not only completes a comprehensive and complementary survey of the state-of-the-art MedISeg approaches, but also offers a practical guide for addressing the future medical image processing challenges including but not limited to small dataset learning, class imbalance learning, multi-modality learning, and domain adaptation. The code has been released at: //github.com/hust-linyi/MedISeg
With the advent of 5G commercialization, the need for more reliable, faster, and intelligent telecommunication systems are envisaged for the next generation beyond 5G (B5G) radio access technologies. Artificial Intelligence (AI) and Machine Learning (ML) are not just immensely popular in the service layer applications but also have been proposed as essential enablers in many aspects of B5G networks, from IoT devices and edge computing to cloud-based infrastructures. However, most of the existing surveys in B5G security focus on the performance of AI/ML models and their accuracy, but they often overlook the accountability and trustworthiness of the models' decisions. Explainable AI (XAI) methods are promising techniques that would allow system developers to identify the internal workings of AI/ML black-box models. The goal of using XAI in the security domain of B5G is to allow the decision-making processes of the security of systems to be transparent and comprehensible to stakeholders making the systems accountable for automated actions. In every facet of the forthcoming B5G era, including B5G technologies such as RAN, zero-touch network management, E2E slicing, this survey emphasizes the role of XAI in them and the use cases that the general users would ultimately enjoy. Furthermore, we presented the lessons learned from recent efforts and future research directions on top of the currently conducted projects involving XAI.
Small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address it, many efforts have been made on training complex models with small data in an unsupervised and semi-supervised fashion. In this paper, we will review the recent progresses on these two major categories of methods. A wide spectrum of small data models will be categorized in a big picture, where we will show how they interplay with each other to motivate explorations of new ideas. We will review the criteria of learning the transformation equivariant, disentangled, self-supervised and semi-supervised representations, which underpin the foundations of recent developments. Many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs) and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on the unsupervised and semi-supervised methods, we will also provide a broader review of other emerging topics, from unsupervised and semi-supervised domain adaptation to the fundamental roles of transformation equivariance and invariance in training a wide spectrum of deep networks. It is impossible for us to write an exclusive encyclopedia to include all related works. Instead, we aim at exploring the main ideas, principles and methods in this area to reveal where we are heading on the journey towards addressing the small data challenges in this big data era.