In the face of uncertainty, the ability to *seek information* is of fundamental importance. In many practical applications, such as medical diagnosis and troubleshooting, the information needed to solve the task is not initially given and has to be actively sought by asking follow-up questions (for example, a doctor asking a patient for more details about their symptoms). In this work, we introduce Uncertainty of Thoughts (UoT), an algorithm to augment large language models with the ability to actively seek information by asking effective questions. UoT combines 1) an *uncertainty-aware simulation approach* which enables the model to simulate possible future scenarios and how likely they are to occur, 2) *uncertainty-based rewards* motivated by information gain which incentivizes the model to seek information, and 3) a *reward propagation scheme* to select the optimal question to ask in a way that maximizes the expected reward. In experiments on medical diagnosis, troubleshooting, and the `20 Questions` game, UoT achieves an average performance improvement of 38.1% in the rate of successful task completion across multiple LLMs compared with direct prompting and also improves efficiency (i.e., the number of questions needed to complete the task). Our code has been released [here](//github.com/zhiyuanhubj/UoT)
Traversability estimation in rugged, unstructured environments remains a challenging problem in field robotics. Often, the need for precise, accurate traversability estimation is in direct opposition to the limited sensing and compute capability present on affordable, small-scale mobile robots. To address this issue, we present a novel method to learn [u]ncertainty-aware [n]avigation features from high-fidelity scans of [real]-world environments (UNRealNet). This network can be deployed on-robot to predict these high-fidelity features using input from lower-quality sensors. UNRealNet predicts dense, metric-space features directly from single-frame lidar scans, thus reducing the effects of occlusion and odometry error. Our approach is label-free, and is able to produce traversability estimates that are robot-agnostic. Additionally, we can leverage UNRealNet's predictive uncertainty to both produce risk-aware traversability estimates, and refine our feature predictions over time. We find that our method outperforms traditional local mapping and inpainting baselines by up to 40%, and demonstrate its efficacy on multiple legged platforms.
Automatic and precise segmentation of vertebrae from CT images is crucial for various clinical applications. However, due to a lack of explicit and strict constraints, existing methods especially for single-stage methods, still suffer from the challenge of intra-vertebrae segmentation inconsistency, which refers to multiple label predictions inside a singular vertebra. For multi-stage methods, vertebrae detection serving as the first step, is affected by the pathology and mental implants. Thus, incorrect detections cause biased patches before segmentation, then lead to inconsistent labeling and segmentation. In our work, motivated by the perspective of instance segmentation, we try to label individual and complete binary masks to address this limitation. Specifically, a contour-based network is proposed based on Structural Low-Rank Descriptors for shape consistency, termed SLoRD. These contour descriptors are acquired in a data-driven manner in advance. For a more precise representation of contour descriptors, we adopt the spherical coordinate system and devise the spherical centroid. Besides, the contour loss is designed to impose explicit consistency constraints, facilitating regressed contour points close to vertebral boundaries. Quantitative and qualitative evaluations on VerSe 2019 demonstrate the superior performance of our framework over other single-stage and multi-stage state-of-the-art (SOTA) methods.
Purpose:Generative Artificial Intelligence (GAI) models, such as ChatGPT, may inherit or amplify societal biases due to their training on extensive datasets. With the increasing usage of GAI by students, faculty, and staff in higher education institutions (HEIs), it is urgent to examine the ethical issues and potential biases associated with these technologies. Design/Approach/Methods:This scoping review aims to elucidate how biases related to GAI in HEIs have been researched and discussed in recent academic publications. We categorized the potential societal biases that GAI might cause in the field of higher education. Our review includes articles written in English, Chinese, and Japanese across four main databases, focusing on GAI usage in higher education and bias. Findings:Our findings reveal that while there is meaningful scholarly discussion around bias and discrimination concerning LLMs in the AI field, most articles addressing higher education approach the issue superficially. Few articles identify specific types of bias under different circumstances, and there is a notable lack of empirical research. Most papers in our review focus primarily on educational and research fields related to medicine and engineering, with some addressing English education. However, there is almost no discussion regarding the humanities and social sciences. Additionally, a significant portion of the current discourse is in English and primarily addresses English-speaking contexts. Originality/Value:To the best of our knowledge, our study is the first to summarize the potential societal biases in higher education. This review highlights the need for more in-depth studies and empirical work to understand the specific biases that GAI might introduce or amplify in educational settings, guiding the development of more ethical AI applications in higher education.
Automated Vehicle (AV) validation based on simulated testing requires unbiased evaluation and high efficiency. One effective solution is to increase the exposure to risky rare events while reweighting the probability measure. However, characterizing the distribution of risky events is particularly challenging due to the paucity of samples and the temporality of continuous scenario variables. To solve it, we devise a method to represent, generate, and reweight the distribution of risky rare events. We decompose the temporal evolution of continuous variables into distribution components based on conditional probability. By introducing the Risk Indicator Function, the distribution of risky rare events is theoretically precipitated out of naturalistic driving distribution. This targeted distribution is practically generated via Normalizing Flow, which achieves exact and tractable probability evaluation of intricate distribution. The rare event distribution is then demonstrated as the advantageous Importance Sampling distribution. We also promote the technique of temporal Importance Sampling. The combined method, named as TrimFlow, is executed to estimate the collision rate of Car-following scenarios as a tentative practice. The results showed that sampling background vehicle maneuvers from rare event distribution could evolve testing scenarios to hazardous states. TrimFlow reduced 86.1% of tests compared to generating testing scenarios according to their exposure in the naturalistic driving environment. In addition, the TrimFlow method is not limited to one specific type of functional scenario.
Large Language Models (LLMs) have demonstrated remarkable capabilities across various applications, fundamentally reshaping the landscape of natural language processing (NLP) research. However, recent evaluation frameworks often rely on the output probabilities of LLMs for predictions, primarily due to computational constraints, diverging from real-world LLM usage scenarios. While widely employed, the efficacy of these probability-based evaluation strategies remains an open research question. This study aims to scrutinize the validity of such probability-based evaluation methods within the context of using LLMs for Multiple Choice Questions (MCQs), highlighting their inherent limitations. Our empirical investigation reveals that the prevalent probability-based evaluation method inadequately aligns with generation-based prediction. Furthermore, current evaluation frameworks typically assess LLMs through predictive tasks based on output probabilities rather than directly generating responses, owing to computational limitations. We illustrate that these probability-based approaches do not effectively correspond with generative predictions. The outcomes of our study can enhance the understanding of LLM evaluation methodologies and provide insights for future research in this domain.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.
Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information using various modalities. Despite the extensive development made for unimodal learning, it still cannot cover all the aspects of human learning. Multimodal learning helps to understand and analyze better when various senses are engaged in the processing of information. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. Detailed analysis of past and current baseline approaches and an in-depth study of recent advancements in multimodal deep learning applications has been provided. A fine-grained taxonomy of various multimodal deep learning applications is proposed, elaborating on different applications in more depth. Architectures and datasets used in these applications are also discussed, along with their evaluation metrics. Last, main issues are highlighted separately for each domain along with their possible future research directions.
Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.