Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deep fake detectors. However, recent studies have shown that current audio deep fake detectors fall short of this desideratum. In this paper we show that pretrained self-supervised representations followed by a simple logistic regression classifier achieve strong generalisation capabilities, reducing the equal error rate from 30% to 8% on the newly introduced In-the-Wild dataset. Importantly, this approach also produces considerably better calibrated models than previous approaches. This means that we can place more trust in our model's predictions and use them for downstream tasks, such as uncertainty estimation. In particular, we show that the entropy of the estimated probabilities provides a reliable way of rejecting uncertain samples and further improving accuracy.
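To make the rejection rule concrete, below is a minimal sketch of entropy-based rejection over calibrated class probabilities. The threshold, array shapes, and function name are illustrative assumptions, not values from the paper.

```python
import numpy as np

def reject_by_entropy(probs: np.ndarray, threshold: float):
    """Flag samples whose predictive entropy exceeds a threshold.

    probs: (n_samples, n_classes) calibrated probabilities, e.g. from a
    logistic regression head on frozen self-supervised features.
    Returns a boolean mask of samples to keep and the entropies.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return entropy <= threshold, entropy
```

Sweeping the threshold trades coverage against accuracy: discarding the most uncertain samples leaves a subset on which the detector's error rate is lower.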
We propose a method to detect model misspecifications in nonlinear causal additive, and potentially heteroscedastic, noise models. We aim to identify predictor variables for which we can infer the causal effect even in cases of such misspecification. We develop a general framework based on knowledge of the multivariate observational data distribution; we then propose an algorithm for finite-sample data, discuss its asymptotic properties, and illustrate its performance on simulated and real data.
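For readers unfamiliar with the model classes named here, the standard textbook forms are as follows (the paper's precise assumptions may be more general):

```latex
% Nonlinear additive noise model (ANM): effect equals a function of the
% cause plus independent noise
Y = f(X) + \varepsilon, \qquad \varepsilon \perp\!\!\!\perp X
% Heteroscedastic extension: the noise scale may depend on the predictors
Y = f(X) + g(X)\,\varepsilon, \qquad \varepsilon \perp\!\!\!\perp X, \quad g > 0
```

Misspecification then means that the data were not generated by any model of this form for the chosen predictors.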
Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important features for predicting an outcome. The history of feature attribution methods can be traced back to Generalized Additive Models (GAMs), which extend linear regression models by incorporating non-linear relationships between the dependent and independent variables. In recent years, gradient-based methods and surrogate models have been applied to unravel complex Artificial Intelligence (AI) systems, but these methods have limitations: GAMs tend to achieve lower accuracy, gradient-based methods can be difficult to interpret, and surrogate models often suffer from stability and fidelity issues. Furthermore, most existing methods do not consider users' contexts, which can significantly influence their preferences. To address these limitations and advance the current state of the art, we define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA). Our framework harnesses the power of argumentation by treating each feature as an argument that can either support, attack, or neutralize a prediction. Additionally, CA-FATA formulates feature attribution as an argumentation procedure in which each computation has explicit semantics, making it inherently interpretable. CA-FATA also easily integrates side information, such as users' contexts, resulting in more accurate predictions.
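As a toy illustration of the argumentative reading (the scoring rule, threshold, and names below are hypothetical placeholders, not CA-FATA's actual semantics), a signed feature contribution can be mapped to an argumentative role:

```python
from enum import Enum

class Role(Enum):
    SUPPORT = "supports the prediction"
    ATTACK = "attacks the prediction"
    NEUTRAL = "neutralizes (no net effect on the prediction)"

def feature_role(contribution: float, tau: float = 0.05) -> Role:
    """Map a signed feature contribution to an argumentative role.

    `contribution` and `tau` are illustrative placeholders; the actual
    framework defines its own semantics for these roles.
    """
    if contribution > tau:
        return Role.SUPPORT
    if contribution < -tau:
        return Role.ATTACK
    return Role.NEUTRAL
```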
We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use. On a text summarization task where workers were not directed in any way regarding their LLM use, the estimated prevalence of LLM use was around 30%, but was reduced by about half by asking workers to not use LLMs and by raising the cost of using them, e.g., by disabling copy-pasting. Secondary analyses give further insight into LLM use and its prevention: LLM use yields high-quality but homogeneous responses, which may harm research concerned with human (rather than model) behavior and degrade future models trained with crowdsourced data. At the same time, preventing LLM use may be at odds with obtaining high-quality responses; e.g., when requesting workers not to use LLMs, summaries contained fewer keywords carrying essential information. Our estimates will likely change as LLMs increase in popularity or capabilities, and as norms around their usage change. Yet, understanding the co-evolution of LLM-based tools and users is key to maintaining the validity of research done using crowdsourcing, and we provide a critical baseline before widespread adoption ensues.
Spinal cord segmentation is clinically relevant and is notably used to compute the spinal cord cross-sectional area (CSA) for the diagnosis and monitoring of cord compression or neurodegenerative diseases such as multiple sclerosis. While several semi-automatic and automatic methods exist, one key limitation remains: the segmentation depends on the MRI contrast, resulting in different CSA values across contrasts. This is partly due to the varying appearance of the boundary between the spinal cord and the cerebrospinal fluid, which depends on the sequence and acquisition parameters. This contrast-sensitive CSA adds variability in multi-center studies where protocols can vary, reducing the sensitivity to detect subtle atrophies. Moreover, existing methods exacerbate CSA variability by training one model per contrast, while also producing binary masks that do not account for partial volume effects. In this work, we present a deep learning-based method that produces soft segmentations of the spinal cord. Using the Spine Generic Public Database of healthy participants ($\text{n}=267$; $\text{contrasts}=6$), we first generated participant-wise soft ground truth (GT) by averaging the binary segmentations across all 6 contrasts. These soft GT, along with a regression-based loss function, were then used to train a UNet model for spinal cord segmentation. We evaluated our model against state-of-the-art methods and performed ablation studies involving different GT mask types, loss functions, and contrast-specific models. Our results show that using the soft average segmentations along with a regression loss function reduces CSA variability ($p < 0.05$, Wilcoxon signed-rank test). The proposed spinal cord segmentation model generalizes better than state-of-the-art contrast-specific methods across unseen datasets, vendors, contrasts, and pathologies (compression, lesions), while accounting for partial volume effects.
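A minimal sketch of the soft ground-truth construction and one plausible regression-based loss (MSE here; the paper's exact loss may differ), assuming the per-contrast masks are already co-registered to a common space:

```python
import numpy as np

def soft_ground_truth(binary_masks: np.ndarray) -> np.ndarray:
    """Average co-registered binary segmentations (one per contrast)
    into a single soft mask with values in [0, 1]."""
    return binary_masks.mean(axis=0)

def regression_loss(pred: np.ndarray, soft_gt: np.ndarray) -> float:
    """Mean-squared error against the soft mask."""
    return float(np.mean((pred - soft_gt) ** 2))
```

Because voxels at the cord/CSF boundary average to intermediate values, the soft mask encodes the partial volume information that a binary mask discards.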
Software design debt aims to elucidate attempts to rectify existing design flaws and studies their influence on the cost and schedule of software. Design smells are a key cause of incurring design debt. Although the impact of design smells on design debt has been predominantly considered in the current literature, how design smells arise from not following software engineering best practices requires more exploration. This research provides a tool for design smell detection in Java software by analyzing large volumes of source code. More specifically, 409,539 Lines of Code (LoC) and 17,760 class files of open-source Java software are analyzed here. The obtained results show desirable precision values ranging from 81.01\% to 93.43\%. Based on the output of the tool, a study is conducted to relate the causes of the detected design smells to two software engineering challenges, namely "irregular team meetings" and "scope creep". The resulting information will give software engineers insight into the necessary design remediation actions.
Increasing individuals' awareness of their own body signals can lead to improved interoception, enabling the brain to estimate current body states more accurately and in a timely manner. However, certain body signals, such as eye movements, often go unnoticed by individuals themselves. This study aimed to test the hypothesis that providing eye-movement-correlated tactile feedback on the body enhances individuals' awareness of their attentive states, subsequently improving attention. Our results demonstrate the effectiveness of such feedback in redirecting and enhancing attention, particularly in the presence of distractions during long-duration tasks. Additionally, we observed that people's gaze behaviors changed in response to the tactile feedback, suggesting an increased self-awareness of current eye movements and attentive states. Ultimately, these changes in gaze behaviors contribute to the modulation of attentive states. Our findings highlight the potential of eye-movement-correlated bodily tactile feedback to increase individuals' self-awareness of their eye movements and attentive states. By providing real-time feedback through tactile stimuli, we can actively engage individuals in regulating their attention and enhancing their overall performance.
We observe a large variety of robots in terms of their bodies, sensors, and actuators. Given the commonalities in their skill sets, teaching each skill to each robot independently is inefficient and not scalable given the large variety in the robotic landscape. If we can learn the correspondences between the sensorimotor spaces of different robots, we can expect a skill learned in one robot to be transferred to other robots more directly and easily. In this paper, we propose a method to learn correspondences between robots that have significant differences in their morphologies: a fixed-base manipulator robot with joint control and a differential-drive mobile robot. For this, both robots are first given demonstrations that achieve the same tasks. A common latent representation is formed while learning the corresponding policies. After this initial learning stage, observing one robot execute a new task becomes sufficient to generate a latent space representation for the other robot to achieve the same task. We verified our system in a set of experiments where the correspondence between two simulated robots is learned (1) when the robots need to follow the same paths to achieve the same task, (2) when the robots need to follow different trajectories to achieve the same task, and (3) when the complexities of the required sensorimotor trajectories differ between the robots. We also provide a proof-of-concept realization of correspondence learning between a real manipulator robot and a simulated mobile robot.
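A schematic sketch of the shared-latent idea follows; the architecture, dimensions, and use of PyTorch are assumptions for illustration, not the paper's actual networks.

```python
import torch.nn as nn

def _mlp(dim_in: int, dim_out: int) -> nn.Sequential:
    """Small two-layer MLP used for each encoder/decoder."""
    return nn.Sequential(nn.Linear(dim_in, 32), nn.ReLU(), nn.Linear(32, dim_out))

class CorrespondenceModel(nn.Module):
    """Robot-specific encoders into one shared latent space, with
    robot-specific decoders. Trained on paired demonstrations of the
    same task so that paired latents align (e.g. via an MSE term plus
    per-robot reconstruction losses)."""

    def __init__(self, dim_arm: int = 14, dim_mobile: int = 5, dim_latent: int = 8):
        super().__init__()
        self.enc_arm, self.dec_arm = _mlp(dim_arm, dim_latent), _mlp(dim_latent, dim_arm)
        self.enc_mobile, self.dec_mobile = _mlp(dim_mobile, dim_latent), _mlp(dim_latent, dim_mobile)

    def arm_to_mobile(self, x_arm):
        # Observe the manipulator's execution, decode the mobile robot's
        # corresponding state through the shared latent space.
        return self.dec_mobile(self.enc_arm(x_arm))
```

After training, transferring a newly observed task reduces to encoding one robot's trajectory and decoding it for the other.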
A new mechanical model of noncircular shallow tunnelling considering the initial stress field is proposed in this paper by constraining the far-field ground surface to eliminate the displacement singularity at infinity, and the originally unbalanced tunnel excavation problem in existing solutions is turned into an equilibrium one with mixed boundaries. By applying analytic continuation, the mixed boundaries are transformed into a homogeneous Riemann-Hilbert problem, which is subsequently solved via an efficient and accurate iterative method with boundary conditions of static equilibrium, displacement single-valuedness, and traction along the tunnel periphery. The Lanczos filtering technique is used in the final stress and displacement solution to reduce the Gibbs phenomenon caused by the constrained far-field ground surface, yielding more accurate results. Several numerical cases are conducted to thoroughly verify the proposed solution by examining boundary conditions and comparing with existing solutions, and all results are in good agreement. Further numerical cases are then conducted to investigate the stress and deformation distributions along the ground surface and tunnel periphery, and several engineering recommendations are given. The limitations of the proposed solution are also discussed for objectivity.
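For reference, the textbook form of Lanczos sigma-filtering applied to a truncated Fourier series is shown below; the paper applies the same idea to its own series coefficients.

```latex
f_N^{\sigma}(\theta) = \frac{a_0}{2} + \sum_{k=1}^{N-1}
\operatorname{sinc}\!\left(\frac{k\pi}{N}\right)
\left(a_k \cos k\theta + b_k \sin k\theta\right),
\qquad
\operatorname{sinc}(x) = \frac{\sin x}{x}
```

The sigma factors damp the high-order coefficients responsible for the Gibbs oscillations near the constrained boundary.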
Graph-based two-sample tests and graph-based change-point detection that utilize a similarity graph provide a powerful tool for analyzing high-dimensional and non-Euclidean data, as these methods do not impose distributional assumptions on the data and perform well across various scenarios. Current graph-based tests that deliver efficacy across a broad spectrum of alternatives typically rely on the $K$-nearest neighbor graph or the $K$-minimum spanning tree. However, these graphs can be unreliable for high-dimensional data due to the curse of dimensionality. To mitigate this issue, we propose to use a robust graph that is considerably less influenced by the curse of dimensionality. We also establish a theoretical foundation for graph-based methods utilizing this proposed robust graph and demonstrate its consistency under fixed alternatives for both low-dimensional and high-dimensional data.
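For orientation, a classic graph-based two-sample statistic on the pooled sample is sketched below using a minimum spanning tree (the Friedman-Rafsky edge count); the proposed robust graph would replace the MST, and this sketch does not implement it.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def cross_edge_count(X: np.ndarray, Y: np.ndarray) -> int:
    """Count MST edges joining points from different samples; an
    unusually small count is evidence against equal distributions."""
    Z = np.vstack([X, Y])                      # pooled sample
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    mst = minimum_spanning_tree(squareform(pdist(Z))).tocoo()
    return int(np.sum(labels[mst.row] != labels[mst.col]))
```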
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work has been addressing speaker diarization as a frame-wise multi-label classification problem with permutation-invariant training. Despite EEND showing great promise, a few recent works took a step back and studied the possible combination of (local) supervised EEND diarization with (global) unsupervised clustering. Yet, these hybrid contributions did not question the original multi-label formulation. We propose to switch from multi-label (where any two speakers can be active at the same time) to powerset multi-class classification (where dedicated classes are assigned to pairs of overlapping speakers). Through extensive experiments on 9 different benchmarks, we show that this formulation leads to significantly better performance (mostly on overlapping speech) and robustness to domain mismatch, while eliminating the detection threshold hyperparameter, critical for the multi-label formulation.
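To make the powerset formulation concrete, here is a minimal sketch of the class enumeration for frames with at most two simultaneous speakers; the speaker count and overlap cap are illustrative.

```python
from itertools import combinations

def powerset_classes(num_speakers: int = 3, max_overlap: int = 2):
    """Enumerate mutually exclusive classes: silence, each single
    speaker, and each overlapping pair of speakers."""
    classes = [frozenset()]  # silence
    for k in range(1, max_overlap + 1):
        classes += [frozenset(c) for c in combinations(range(num_speakers), k)]
    return classes

# 3 speakers -> 1 + 3 + 3 = 7 classes: each frame gets a single softmax
# label instead of per-speaker sigmoids, removing the detection threshold.
```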