In this study, we critically examine the foundational premise of algorithmic recourse, a process of generating counterfactual action plans (i.e., recourses) that assist individuals in reversing adverse decisions made by AI systems. The assumption underlying algorithmic recourse is that individuals accept and act on recourses that minimize the gap between their current and desired states. This assumption, however, remains empirically unverified. To address this issue, we conducted a user study with 362 participants and assessed whether minimizing the distance function, a metric of the gap between the current and desired states, indeed prompts them to accept and act upon suggested recourses. Our findings reveal a nuanced landscape: participants' acceptance of recourses did not correlate with the recourse distance. Moreover, participants' willingness to act upon recourses peaked at the minimal recourse distance but was otherwise constant. These findings cast doubt on the prevailing assumption of algorithmic recourse research and signal the need to rethink the evaluation functions to pave the way for human-centered recourse generation.
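As a rough illustration of what a recourse distance can look like, the sketch below computes a weighted L1 distance between an individual's current feature values and the values a recourse proposes. The abstract does not specify the distance function used in the study, so the weighted L1 form, the function name `recourse_distance`, and the feature names are all assumptions made for this example.

```python
def recourse_distance(current, proposed, weights=None):
    """Weighted L1 gap between current and proposed feature values.

    `current` and `proposed` map feature names to numbers. `weights`
    optionally maps a feature name to a per-unit cost (default 1.0).
    This is an assumed, illustrative cost, not the study's metric.
    """
    if current.keys() != proposed.keys():
        raise ValueError("current and proposed must share the same features")
    weights = weights or {}
    return sum(
        weights.get(name, 1.0) * abs(proposed[name] - current[name])
        for name in current
    )


# Hypothetical loan applicant and a suggested recourse.
current = {"income": 40.0, "debt": 10.0}
proposed = {"income": 45.0, "debt": 8.0}
print(recourse_distance(current, proposed))
print(recourse_distance(current, proposed, weights={"income": 2.0}))
```

A recourse generator built on this premise would prefer the proposal with the smallest such value, which is the assumption the study puts to the test.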
In this research, we examine the minsum flow problem in dynamic path networks where flows are represented as discrete and weighted sets. The minsum flow problem has been widely studied for its relevance in finding evacuation routes during emergencies such as earthquakes. However, previous approaches often assume that individuals are separable and identical, which does not adequately account for the fact that some groups of people, such as families, need to move together and that some groups may be more important than others. To address these limitations, we modify the minsum flow problem to support flows represented as discrete and weighted sets. We also propose a 2-approximation pseudo-polynomial time algorithm to solve this modified problem for path networks with uniform capacity.
In this work, we present a comprehensive three-phase study to examine (1) the effectiveness of large multimodal models (LMMs) in recognizing cultural contexts; (2) the accuracy of their representations of diverse cultures; and (3) their ability to adapt content across cultural boundaries. We first introduce Dalle Street, a large-scale dataset generated by DALL-E 3 and validated by humans, containing 9,935 images of 67 countries and 10 concept classes. We reveal disparities in cultural understanding at the sub-region level with both open-weight (LLaVA) and closed-source (GPT-4V) models on Dalle Street and other existing benchmarks. Next, we assess the models' deeper cultural understanding through an artifact extraction task and identify over 18,000 artifacts associated with different countries. Finally, we propose a highly composable pipeline, CultureAdapt, to adapt images from culture to culture. Our findings reveal a nuanced picture of the cultural competence of LMMs, highlighting the need to develop culture-aware systems. Dataset and code are available at //github.com/iamshnoo/crossroads
In this article we analyse 3D models of cultural heritage with the aim of answering three main questions: What processes can be put in place to create a FAIR-by-design digital twin of a temporary exhibition? What are the main challenges in applying FAIR principles to 3D data in cultural heritage studies and how are they different from other types of data (e.g. images) from a data management perspective? We begin with a comprehensive literature review touching on: FAIR principles applied to cultural heritage data; representation models; Object Provenance Information (OPI), meaning the detailed history and origin of an object, and Metadata Record Provenance Information (MRPI), meaning the detailed history and origin of the metadata itself, which describes the primary object (whether physical or digital); and 3D models as cultural heritage research data and their creation, selection, publication, archival and preservation. We then describe the process of creating the Aldrovandi Digital Twin, by collecting, storing and modelling data about cultural heritage objects and processes. We detail the many steps from the acquisition of the Digital Cultural Heritage Objects (DCHO) through to the upload of the optimised DCHO onto a web-based framework (ATON), with a focus on open technologies and standards for interoperability and preservation. Using the FAIR Principles for Heritage Library, Archive and Museum Collections as a framework, we look in detail at how the Digital Twin implements FAIR principles at the object and metadata level. We then describe the main challenges we encountered and summarise what seem to be the peculiarities of 3D cultural heritage data and the possible directions for further research in this field.
This study evaluates the performance of conventional SyN ANTs and learning-based registration methods in the context of pediatric neuroimaging, specifically focusing on intrasubject deformable registration. The comparison involves three approaches: without (NR), with rigid (RR), and with rigid and affine (RAR) initializations. In addition to initialization, performance is evaluated in terms of accuracy, speed, and the impact of age intervals and sex per pair. The data consist of the publicly available MRI scans from the Calgary Preschool dataset, which includes 63 children aged 2-7 years, allowing for 431 registration pairs. We implemented the unsupervised DL framework with a U-Net architecture using DeepReg and evaluated it with 5-fold cross-validation. Evaluation includes Dice scores for tissue segmentation from 18 smaller regions obtained by SynthSeg, analysis of log Jacobian determinants, and registration pro-rated training and inference times. Learning-based approaches, with or without linear initializations, exhibit slight superiority over SyN ANTs in terms of Dice scores. Indeed, DL-based implementations with RR and RAR initializations significantly outperform SyN ANTs. Both SyN ANTs and DL-based registration involve parameter optimization, but the choice between these methods depends on the scale of registration: network-based for broader coverage or SyN ANTs for specific structures. Both methods face challenges with larger age intervals due to greater growth changes. The main takeaway is that while DL-based methods show promise with faster and more accurate registrations, SyN ANTs remains robust and generalizable without the need for extensive training, highlighting the importance of method selection based on specific registration needs in the pediatric context. Our code is available at //github.com/neuropoly/pediatric-DL-registration
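The Dice score used above as the accuracy measure can be sketched in a few lines. The example below computes the Dice overlap for one label between two flattened label maps, for instance a fixed-image segmentation and a warped moving-image segmentation. The function name `dice_score`, the toy label maps, and the convention of returning 1.0 when a label is absent from both maps are assumptions for illustration. This is not the evaluation code from the linked repository.

```python
def dice_score(seg_a, seg_b, label):
    """Dice overlap for `label` between two equally sized label maps.

    Both inputs are flat sequences of integer labels (one per voxel).
    Dice = 2 * |A and B| / (|A| + |B|), where A and B are the voxel sets
    carrying `label` in each map.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("label maps must have the same number of voxels")
    in_a = sum(1 for v in seg_a if v == label)
    in_b = sum(1 for v in seg_b if v == label)
    both = sum(1 for a, b in zip(seg_a, seg_b) if a == label and b == label)
    if in_a + in_b == 0:
        # Assumed convention: a label missing from both maps counts as perfect.
        return 1.0
    return 2.0 * both / (in_a + in_b)


# Toy 6-voxel label maps (hypothetical values).
fixed_seg = [0, 1, 1, 2, 2, 2]
warped_seg = [0, 1, 2, 2, 2, 0]
for region in (1, 2):
    print(region, dice_score(fixed_seg, warped_seg, region))
```

In a study like the one described, a score of this kind would be computed per region and per registration pair and then aggregated across pairs.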
In this paper, we introduce MeGA, a novel method for merging the weights of multiple pre-trained neural networks using a genetic algorithm. Traditional techniques, such as weight averaging and ensemble methods, often fail to fully harness the capabilities of pre-trained networks. Our approach leverages a genetic algorithm with tournament selection, crossover, and mutation to optimize weight combinations, creating a more effective fusion. This technique allows the merged model to inherit advantageous features from its parent models, resulting in enhanced accuracy and robustness. Through experiments on the CIFAR-10 dataset, we demonstrate that our genetic algorithm-based weight merging method improves test accuracy compared to individual models and conventional methods. This approach provides a scalable solution for integrating multiple pre-trained networks across various deep learning applications. Code is available at: //github.com/YUNBLAK/MeGA-Merging-Multiple-Independently-Trained-Neural-Networks-Based-on-Genetic-Algorithm
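The three operators named above (tournament selection, crossover, and mutation) can be illustrated with a minimal sketch on flat weight vectors. This is not the MeGA implementation. The toy `fitness` function stands in for validation accuracy, and the uniform crossover, Gaussian mutation, elitism step, and all parameter values are assumptions chosen to keep the example small and self-contained.

```python
import random


def fitness(weights, target):
    """Toy stand-in for validation accuracy: higher is better."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Uniform crossover: each weight is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(weights, rng, rate=0.1, scale=0.05):
    """Add small Gaussian noise to a random subset of the weights."""
    return [w + rng.gauss(0.0, scale) if rng.random() < rate else w
            for w in weights]


def merge_weights(parent_a, parent_b, target,
                  pop_size=20, generations=50, seed=0):
    """Evolve a merged weight vector from two parent weight vectors."""
    rng = random.Random(seed)
    # Start from both parents plus random crossovers of them.
    population = [list(parent_a), list(parent_b)]
    while len(population) < pop_size:
        population.append(crossover(parent_a, parent_b, rng))
    for _ in range(generations):
        scores = [fitness(ind, target) for ind in population]
        # Elitism (an assumption here): keep the best individual unchanged.
        best = max(range(pop_size), key=lambda i: scores[i])
        children = [population[best]]
        while len(children) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = children
    return max(population, key=lambda ind: fitness(ind, target))


# Two hypothetical parents, each good on a different half of the weights.
target = [0.5, -0.2, 0.8, 0.1]
parent_a = [0.5, -0.2, 0.0, 0.0]
parent_b = [0.0, 0.0, 0.8, 0.1]
merged = merge_weights(parent_a, parent_b, target)
print(fitness(parent_a, target), fitness(parent_b, target),
      fitness(merged, target))
```

Because both parents are in the initial population and the best individual is always carried over, the merged vector in this sketch never scores below either parent. With real networks, the fitness call would be replaced by an evaluation of the candidate weights on held-out data.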
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning hard to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we summarize the state of this exciting field by comparing the classical distributed deep reinforcement learning methods and studying the important components needed to achieve efficient distributed learning, covering settings from single-player single-agent distributed deep reinforcement learning to the most complex multiple-player multiple-agent distributed deep reinforcement learning. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications to their non-distributed versions. By analyzing their strengths and weaknesses, we develop and release a multi-player multi-agent distributed deep reinforcement learning toolbox, which is further validated on Wargame, a complex environment, showing the usability of the proposed toolbox for multiple-player multiple-agent distributed deep reinforcement learning in complex games. Finally, we point out challenges and future trends, hoping that this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
In contrast to batch learning, where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously from data that arrives in sequential order. Because it resembles the human learning process, with its ability to learn, fuse, and accumulate new knowledge arriving at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularization, knowledge distillation, memory, generative replay, parameter isolation, and combinations of the above techniques. For each category of these techniques, both its characteristics and its applications in computer vision are presented. At the end of this overview, we discuss several subareas where continuous knowledge accumulation is potentially helpful but continual learning has not yet been well studied.
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.
Influenced by the stunning success of deep learning in computer vision and language understanding, research in recommendation has shifted to inventing new recommender models based on neural networks. In recent years, we have witnessed significant progress in developing neural recommender models, which generalize and surpass traditional recommender models owing to the strong representation power of neural networks. In this survey paper, we conduct a systematic review of neural recommender models, aiming to summarize the field to facilitate future progress. Distinct from existing surveys that categorize existing methods based on the taxonomy of deep learning techniques, we instead summarize the field from the perspective of recommendation modeling, which could be more instructive to researchers and practitioners working on recommender systems. Specifically, we divide the work into three types based on the data used for recommendation modeling: 1) collaborative filtering models, which leverage the key source of user-item interaction data; 2) content enriched models, which additionally utilize the side information associated with users and items, such as user profiles and item knowledge graphs; and 3) context enriched models, which account for the contextual information associated with an interaction, such as time, location, and past interactions. After reviewing representative works for each type, we finally discuss some promising directions in this field, including benchmarking recommender systems, graph reasoning based recommendation models, and explainable and fair recommendations for social good.