This paper introduces Ethics Pathways, a design activity aimed at understanding HCI and design researchers' ethics engagements and flows during their research process. Despite a strong ethical commitment in these fields, challenges persist in grasping the complexity of researchers' engagement with ethics -- practices conducted to operationalize ethics -- in situated institutional contexts. Ethics Pathways, developed through six playtesting sessions, offers a design approach to understanding the complexities of researchers' past ethics engagements in their work. This activity involves four main tasks: recalling ethical incidents; describing stakeholders involved in the situation; recounting their actions or speculative alternatives; and reflection and emotion walk-through. The paper reflects on the role of design decisions and facilitation strategies in achieving these goals. The design activity contributes to the discourse on ethical HCI research by conceptualizing ethics engagement as a part of ongoing research processing, highlighting connections between individual affective experiences, social interactions across power differences, and institutional goals.
Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and drew critical media coverage. The paper withheld critical methodology steps and most inputs needed to reproduce results. Our meta-analysis shows how two separate evaluations filled in the gaps and demonstrated that Google RL lags behind (i) human designers, (ii) a well-known algorithm (Simulated Annealing), and (iii) generally-available commercial software, while being slower; and in a 2023 open research contest, RL methods weren't in top 5. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in conduct, analysis and reporting. Before publishing, Google rebuffed internal allegations of fraud, which still stand. We note policy implications and conclusions for chip design.
We present a comprehensive study of answer quality evaluation in Retrieval-Augmented Generation (RAG) applications using vRAG-Eval, a novel grading system that is designed to assess correctness, completeness, and honesty. We further map the grading of quality aspects aforementioned into a binary score, indicating an accept or reject decision, mirroring the intuitive "thumbs-up" or "thumbs-down" gesture commonly used in chat applications. This approach suits factual business settings where a clear decision opinion is essential. Our assessment applies vRAG-Eval to two Large Language Models (LLMs), evaluating the quality of answers generated by a vanilla RAG application. We compare these evaluations with human expert judgments and find a substantial alignment between GPT-4's assessments and those of human experts, reaching 83% agreement on accept or reject decisions. This study highlights the potential of LLMs as reliable evaluators in closed-domain, closed-ended settings, particularly when human evaluations require significant resources.
Advancements in generative AI have broadened the potential applications of Large Language Models (LLMs) in the development of autonomous agents. Achieving true autonomy requires accumulating and updating knowledge gained from interactions with the environment and effectively utilizing it. Current LLM-based approaches leverage past experiences using a full history of observations, summarization or retrieval augmentation. However, these unstructured memory representations do not facilitate the reasoning and planning essential for complex decision-making. In our study, we introduce AriGraph, a novel method wherein the agent constructs a memory graph that integrates semantic and episodic memories while exploring the environment. This graph structure facilitates efficient associative retrieval of interconnected concepts, relevant to the agent's current state and goals, thus serving as an effective environmental model that enhances the agent's exploratory and planning capabilities. We demonstrate that our Ariadne LLM agent, equipped with this proposed memory architecture augmented with planning and decision-making, effectively handles complex tasks on a zero-shot basis in the TextWorld environment. Our approach markedly outperforms established methods such as full-history, summarization, and Retrieval-Augmented Generation in various tasks, including the cooking challenge from the First TextWorld Problems competition and novel tasks like house cleaning and puzzle Treasure Hunting.
This paper introduces the concept of Language-Guided World Models (LWMs) -- probabilistic models that can simulate environments by reading texts. Agents equipped with these models provide humans with more extensive and efficient control, allowing them to simultaneously alter agent behaviors in multiple tasks via natural verbal communication. In this work, we take initial steps in developing robust LWMs that can generalize to compositionally novel language descriptions. We design a challenging world modeling benchmark based on the game of MESSENGER (Hanjie et al., 2021), featuring evaluation settings that require varying degrees of compositional generalization. Our experiments reveal the lack of generalizability of the state-of-the-art Transformer model, as it offers marginal improvements in simulation quality over a no-text baseline. We devise a more robust model by fusing the Transformer with the EMMA attention mechanism (Hanjie et al., 2021). Our model substantially outperforms the Transformer and approaches the performance of a model with an oracle semantic parsing and grounding capability. To demonstrate the practicality of this model in improving AI safety and transparency, we simulate a scenario in which the model enables an agent to present plans to a human before execution, and to revise plans based on their language feedback.
Artificial Intelligence (AI) shows promising applications for the perception and planning tasks in autonomous driving (AD) due to its superior performance compared to conventional methods. However, inscrutable AI systems exacerbate the existing challenge of safety assurance of AD. One way to mitigate this challenge is to utilize explainable AI (XAI) techniques. To this end, we present the first comprehensive systematic literature review of explainable methods for safe and trustworthy AD. We begin by analyzing the requirements for AI in the context of AD, focusing on three key aspects: data, model, and agency. We find that XAI is fundamental to meeting these requirements. Based on this, we explain the sources of explanations in AI and describe a taxonomy of XAI. We then identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Finally, we propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.
This paper considers a trajectory planning problem for a robot navigating complex terrains, which arises in applications ranging from autonomous mining vehicles to planetary rovers. The problem seeks to find a low-cost dynamically feasible trajectory for the robot. The problem is challenging as it requires solving a non-linear optimization problem that often has many local minima due to the complex terrain. To address the challenge, we propose a method called Pareto-optimal Warm-started Trajectory Optimization (PWTO) that attempts to combine the benefits of graph search and trajectory optimization, two very different approaches to planning. PWTO first creates a state lattice using simplified dynamics of the robot and leverages a multi-objective graph search method to obtain a set of paths. Each of the paths is then used to warm-start a local trajectory optimization process, so that different local minima are explored to find a globally low-cost solution. In our tests, the solution cost computed by PWTO is often less than half of the costs computed by the baselines. In addition, we verify the trajectories generated by PWTO in Gazebo simulation in complex terrains with both wheeled and quadruped robots. The code of this paper is open sourced and can be found at //github.com/rap-lab-org/public_pwto.
Software developers commonly rely on platforms like Stack Overflow for problem-solving and learning. However, academic research is an untapped resource that could greatly benefit industry practitioners. The challenge lies in connecting the innovative insights from academia to real-world problems faced by developers. This project introduces ResearchBot, a tool designed to bridge this academia-industry gap. ResearchBot employs a modular approach, encompassing understanding questions, curating queries to obtain relevant papers in the CrossRef repository, summarizing paper content and finally answering user questions based on paper summaries. The core objective of ResearchBot is to democratize access to academic knowledge for industry professionals. By providing concise summaries of cutting-edge research directly in response to SE-related questions, ResearchBot facilitates the application of academic insights to practical contexts. Ultimately, it aims to bridge the gap between academia and industry, using research evidence to support learning and decision-making in software development.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (\emph{e.g.,} social network analysis and recommender systems), computer vision (\emph{e.g.,} object detection and point cloud learning), and natural language processing (\emph{e.g.,} relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, \emph{i.e.,} 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.
With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is readily available nowadays, which renders researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, much research on multimodal RS data fusion has made tremendous progress in recent years, yet these developed traditional algorithms inevitably meet the performance bottleneck due to the lack of the ability to comprehensively analyse and interpret these strongly heterogeneous data. Hence, this non-negligible limitation further arouses an intense demand for an alternative tool with powerful processing competence. Deep learning (DL), as a cutting-edge technology, has witnessed remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvement compared with traditional methods. This survey aims to present a systematic overview in DL-based multimodal RS data fusion. More specifically, some essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of this field. Some prevalent sub-fields in the multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, We collect and summarize some valuable resources for the sake of the development in multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.