We describe the design of an immersive virtual Cyberball task that included avatar customization, and user feedback on this design. We first created a prototype of an avatar customization template and added it to a Cyberball prototype built in the Unity3D game engine. Then, we conducted in-depth user testing and feedback sessions with 15 Cyberball stakeholders: five naive participants with no prior knowledge of Cyberball and ten experienced researchers with extensive experience using the Cyberball paradigm. We report the divergent perspectives of the two groups on the following design insights; designing for intuitive use, inclusivity, and realistic experiences versus minimalism. Participant responses shed light on how system design problems may contribute to or perpetuate negative experiences when customizing avatars. They also demonstrate the value of considering multiple stakeholders' feedback in the design process for virtual reality, presenting a more comprehensive view in designing future Cyberball prototypes and interactive systems for social science research.
We use Markov categories to develop generalizations of the theory of Markov chains and hidden Markov models in an abstract setting. This comprises characterizations of hidden Markov models in terms of local and global conditional independences as well as existing algorithms for Bayesian filtering and smoothing applicable in all Markov categories with conditionals. We show that these algorithms specialize to existing ones such as the Kalman filter, forward-backward algorithm, and the Rauch-Tung-Striebel smoother when instantiated in appropriate Markov categories. Under slightly stronger assumptions, we also prove that the sequence of outputs of the Bayes filter is itself a Markov chain with a concrete formula for its transition maps. There are two main features of this categorical framework. The first is its generality, as it can be used in any Markov category with conditionals. In particular, it provides a systematic unified account of hidden Markov models and algorithms for filtering and smoothing in discrete probability, Gaussian probability, measure-theoretic probability, possibilistic nondeterminism and others at the same time. The second feature is the intuitive visual representation of information flow in these algorithms in terms of string diagrams.
A typical open-plan office layout is unable to optimally host multiple collocated work activities, personal needs, and situational events, as its space exerts a range of environmental demands on workers in terms of maintaining their acoustic, visual or privacy comfort. As we hypothesise that these demands could be coped by optimising the environmental resources of the architectural layout, we deployed a mobile robotic partition that autonomously manoeuvres between predetermined locations. During a five-weeks in-the-wild study within a real-world open-plan office, we studied how 13 workers adopted four distinct adaptation strategies when sharing the spatiotemporal control of the robotic partition. Based on their logged and self-reported reasoning, we present six initiation regulating factors that determine the appropriateness of each adaptation strategy. This study thus contributes to how future human-building interaction could autonomously improve the experience, comfort, performance, and even the health and wellbeing of multiple workers that share the same workplace.
This work describes a Bayesian framework for reconstructing the boundaries that represent targeted features in an image, as well as the regularity (i.e., roughness vs. smoothness) of these boundaries.This regularity often carries crucial information in many inverse problem applications, e.g., for identifying malignant tissues in medical imaging. We represent the boundary as a radial function and characterize the regularity of this function by means of its fractional differentiability. We propose a hierarchical Bayesian formulation which, simultaneously, estimates the function and its regularity, and in addition we quantify the uncertainties in the estimates. Numerical results suggest that the proposed method is a reliable approach for estimating and characterizing object boundaries in imaging applications, as illustrated with examples from X-ray CT and image inpainting. We also show that our method is robust under various noise types, noise levels, and incomplete data.
In current text-based task-oriented dialogue (TOD) systems, user emotion detection (ED) is often overlooked or is typically treated as a separate and independent task, requiring additional training. In contrast, our work demonstrates that seamlessly unifying ED and TOD modeling brings about mutual benefits, and is therefore an alternative to be considered. Our method consists in augmenting SimpleToD, an end-to-end TOD system, by extending belief state tracking to include ED, relying on a single language model. We evaluate our approach using GPT-2 and Llama-2 on the EmoWOZ benchmark, a version of MultiWOZ annotated with emotions. Our results reveal a general increase in performance for ED and task results. Our findings also indicate that user emotions provide useful contextual conditioning for system responses, and can be leveraged to further refine responses in terms of empathy.
My research centers on the development of context-adaptive AI systems to improve end-user adoption through the integration of technical methods. I deploy these AI systems across various interaction modalities, including user interfaces and embodied agents like robots, to expand their practical applicability. My research unfolds in three key stages: design, development, and deployment. In the design phase, user-centered approaches were used to understand user experiences with AI systems and create design tools for user participation in crafting AI explanations. In the ongoing development stage, a safety-guaranteed AI system for a robot agent was created to automatically provide adaptive solutions and explanations for unforeseen scenarios. The next steps will involve the implementation and evaluation of context-adaptive AI systems in various interaction forms. I seek to prioritize human needs in technology development, creating AI systems that tangibly benefit end-users in real-world applications and enhance interaction experiences.
The main objective of an Information Retrieval system is to provide a user with the most relevant documents to the user's query. To do this, modern IR systems typically deploy a re-ranking pipeline in which a set of documents is retrieved by a lightweight first-stage retrieval process and then re-ranked by a more effective but expensive model. However, the success of a re-ranking pipeline is heavily dependent on the performance of the first stage retrieval, since new documents are not usually identified during the re-ranking stage. Moreover, this can impact the amount of exposure that a particular group of documents, such as documents from a particular demographic group, can receive in the final ranking. For example, the fair allocation of exposure becomes more challenging or impossible if the first stage retrieval returns too few documents from certain groups, since the number of group documents in the ranking affects the exposure more than the documents' positions. With this in mind, it is beneficial to predict the amount of exposure that a group of documents is likely to receive in the results of the first stage retrieval process, in order to ensure that there are a sufficient number of documents included from each of the groups. In this paper, we introduce the novel task of query exposure prediction (QEP). Specifically, we propose the first approach for predicting the distribution of exposure that groups of documents will receive for a given query. Our new approach, called GEP, uses lexical information from individual groups of documents to estimate the exposure the groups will receive in a ranking. Our experiments on the TREC 2021 and 2022 Fair Ranking Track test collections show that our proposed GEP approach results in exposure predictions that are up to 40 % more accurate than the predictions of adapted existing query performance prediction and resource allocation approaches.
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.
Existing recommender systems extract the user preference based on learning the correlation in data, such as behavioral correlation in collaborative filtering, feature-feature, or feature-behavior correlation in click-through rate prediction. However, regretfully, the real world is driven by causality rather than correlation, and correlation does not imply causation. For example, the recommender systems can recommend a battery charger to a user after buying a phone, in which the latter can serve as the cause of the former, and such a causal relation cannot be reversed. Recently, to address it, researchers in recommender systems have begun to utilize causal inference to extract causality, enhancing the recommender system. In this survey, we comprehensively review the literature on causal inference-based recommendation. At first, we present the fundamental concepts of both recommendation and causal inference as the basis of later content. We raise the typical issues that the non-causality recommendation is faced. Afterward, we comprehensively review the existing work of causal inference-based recommendation, based on a taxonomy of what kind of problem causal inference addresses. Last, we discuss the open problems in this important research area, along with interesting future works.
Music streaming services heavily rely on recommender systems to improve their users' experience, by helping them navigate through a large musical catalog and discover new songs, albums or artists. However, recommending relevant and personalized content to new users, with few to no interactions with the catalog, is challenging. This is commonly referred to as the user cold start problem. In this applied paper, we present the system recently deployed on the music streaming service Deezer to address this problem. The solution leverages a semi-personalized recommendation strategy, based on a deep neural network architecture and on a clustering of users from heterogeneous sources of information. We extensively show the practical impact of this system and its effectiveness at predicting the future musical preferences of cold start users on Deezer, through both offline and online large-scale experiments. Besides, we publicly release our code as well as anonymized usage data from our experiments. We hope that this release of industrial resources will benefit future research on user cold start recommendation.
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, only resorting to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and obviously facilitates the performances on two VQA tasks that require reading scene texts.