While the excitement around the capabilities of technological advancements is giving rise to new AI-based writing assistants, the overarching ecosystem plays a crucial role in how they are adopted in educational practice. In this paper, we point to key ecological aspects for consideration. We draw insights from extensive research integrated with practice on a writing feedback tool over 9 years at a university, and we highlight potential risks when these are overlooked. It informs the design of educational writing support tools to be better aligned within broader contexts to balance innovation with practical impact.
The importance of addressing fairness and bias in artificial intelligence (AI) systems cannot be over-emphasized. Mainstream media has been awashed with news of incidents around stereotypes and bias in many of these systems in recent years. In this survey, we fill a gap with regards to the minimal study of fairness and bias in Large Multimodal Models (LMMs) compared to Large Language Models (LLMs), providing 50 examples of datasets and models along with the challenges affecting them; we identify a new category of quantifying bias (preuse), in addition to the two well-known ones in the literature: intrinsic and extrinsic; we critically discuss the various ways researchers are addressing these challenges. Our method involved two slightly different search queries on Google Scholar, which revealed that 33,400 and 538,000 links are the results for the terms "Fairness and bias in Large Multimodal Models" and "Fairness and bias in Large Language Models", respectively. We believe this work contributes to filling this gap and providing insight to researchers and other stakeholders on ways to address the challenge of fairness and bias in multimodal A!.
We address the challenges and opportunities in the development of knowledge systems for Sanskrit, with a focus on question answering. By proposing a framework for the automated construction of knowledge graphs, introducing annotation tools for ontology-driven and general-purpose tasks, and offering a diverse collection of web-interfaces, tools, and software libraries, we have made significant contributions to the field of computational Sanskrit. These contributions not only enhance the accessibility and accuracy of Sanskrit text analysis but also pave the way for further advancements in knowledge representation and language processing. Ultimately, this research contributes to the preservation, understanding, and utilization of the rich linguistic information embodied in Sanskrit texts.
We present ConvoCache, a conversational caching system that solves the problem of slow and expensive generative AI models in spoken chatbots. ConvoCache finds a semantically similar prompt in the past and reuses the response. In this paper we evaluate ConvoCache on the DailyDialog dataset. We find that ConvoCache can apply a UniEval coherence threshold of 90% and respond to 89% of prompts using the cache with an average latency of 214ms, replacing LLM and voice synthesis that can take over 1s. To further reduce latency we test prefetching and find limited usefulness. Prefetching with 80% of a request leads to a 63% hit rate, and a drop in overall coherence. ConvoCache can be used with any chatbot to reduce costs by reducing usage of generative AI by up to 89%.
The use of Virtual Reality (VR) technologies has been extensively researched in surgical and anatomical education. VR provides a lifelike and interactive environment where healthcare providers can practice and refresh their skills in a safe environment. VR has been shown to be as effective as traditional medical education teaching methods, with the potential to provide more cost-effective and convenient means of curriculum delivery, especially in rural and remote areas or in environments with limited access to hands-on training. In this sense, VR offers the potential to be used to support resuscitation training for healthcare providers such as the Neonatal Resuscitation Program (NRP). The NRP program is an evidence-based and standardized approach for training healthcare providers on the resuscitation of the newborn. In this article, we describe a VR simulation environment that was designed and developed to refresh the skills of NRP providers. To validate this platform, we compared the VR-NRP simulation with exposure to 360-degree immersive video. We found that both VR technologies were positively viewed by healthcare professionals and performed very similarly to each other. However, the VR simulation provided a significantly increased feeling of presence. Furthermore, participants found the VR simulation more useful, leading to improved experiential learning outcomes. Also, participants using VR simulation reported higher confidence in certain NRP skills, such as proper mask placement and newborn response evaluation. This research represents a step forward in understanding how VR and related extended reality (XR) technologies can be applied for effective, immersive medical education, with potential benefits for remote and rural healthcare providers.
AI assistance tools such as ChatGPT, Copilot, and Gemini have dramatically impacted the nature of software development in recent years. Numerous studies have studied the positive benefits that practitioners have achieved from using these tools in their work. While there is a growing body of knowledge regarding the usability aspects of leveraging AI tools, we still lack concrete details on the issues that organizations and practitioners need to consider should they want to explore increasing adoption or use of AI tools. In this study, we conducted a mixed methods study involving interviews with 26 industry practitioners and 395 survey respondents. We found that there are several motives and challenges that impact individuals and organizations and developed a theory of AI Tool Adoption. For example, we found creating a culture of sharing of AI best practices and tips as a key motive for practitioners' adopting and using AI tools. In total, we identified 2 individual motives, 4 individual challenges, 3 organizational motives, and 3 organizational challenges, and 3 interleaved relationships. The 3 interleaved relationships act in a push-pull manner where motives pull practitioners to increase the use of AI tools and challenges push practitioners away from using AI tools.
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-rage R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.
With the advances of data-driven machine learning research, a wide variety of prediction problems have been tackled. It has become critical to explore how machine learning and specifically deep learning methods can be exploited to analyse healthcare data. A major limitation of existing methods has been the focus on grid-like data; however, the structure of physiological recordings are often irregular and unordered which makes it difficult to conceptualise them as a matrix. As such, graph neural networks have attracted significant attention by exploiting implicit information that resides in a biological system, with interactive nodes connected by edges whose weights can be either temporal associations or anatomical junctions. In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare. We provide an overview of these methods in a systematic manner, organized by their domain of application including functional connectivity, anatomical structure and electrical-based analysis. We also outline the limitations of existing techniques and discuss potential directions for future research.
We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities as well as entity alignments between pairs of KGs. Therefore, multi-relational link prediction and entity matching communities can benefit from this resource. We believe this data set has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs.We validate the utility ofMMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.
Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations. We argue that the explanation for an answer is of the same or even more importance compared with the answer itself, since it makes the question and answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.