The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) methods 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet, a lack of formal underpinning leaves it unclear as to what conclusions can safely be drawn from the results of a given XAI method and has also so far hindered the theoretical verification and empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for three different non-linear classification scenarios, in which the important class-conditional features are known by design, serving as ground truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different; thus, prone to misinterpretation even under controlled conditions.
We study the problem of enumerating Tarski fixed points, focusing on the relational lattices of equivalences, quasiorders and binary relations. We present a polynomial space enumeration algorithm for Tarski fixed points on these lattices and other lattices of polynomial height. It achieves polynomial delay when enumerating fixed points of increasing isotone maps on all three lattices, as well as decreasing isotone maps on the lattice of binary relations. In those cases in which the enumeration algorithm does not guarantee polynomial delay on the three relational lattices on the other hand, we prove exponential lower bounds for deciding the existence of three fixed points when the isotone map is given as an oracle, and that it is NP-hard to find three or more Tarski fixed points. More generally, we show that any deterministic or bounded-error randomized algorithm must perform a number of queries asymptotically at least as large as the lattice width to decide the existence of three fixed points when the isotone map is given as an oracle. Finally, we demonstrate that our findings yield a polynomial delay and space algorithm for listing bisimulations and instances of some related models of behavioral or role equivalence.
Collaborative filtering (CF) has become a popular method for developing recommender systems (RSs) where ratings of a user for new items are predicted based on her past preferences and available preference information of other users. Despite the popularity of CF-based methods, their performance is often greatly limited by the sparsity of observed entries. In this study, we explore the data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating predictions, which has not been investigated before. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach for rating augmentation based on self-training. We hypothesize that any CF algorithm's predictions with low confidence are due to some deficiency in the training data and hence, the performance of the algorithm can be improved by adopting a systematic data augmentation strategy. We iteratively use some of the ratings predicted with high confidence to augment the training data and remove low-confidence entries through a refinement process. By repeating this process, the system learns to improve prediction accuracy. Our method is experimentally evaluated on several state-of-the-art CF algorithms and leads to informative rating augmentation, improving the performance of the baseline approaches.
We discuss the problem of estimating Radon-Nikodym derivatives. This problem appears in various applications, such as covariate shift adaptation, likelihood-ratio testing, mutual information estimation, and conditional probability estimation. To address the above problem, we employ the general regularization scheme in reproducing kernel Hilbert spaces. The convergence rate of the corresponding regularized algorithm is established by taking into account both the smoothness of the derivative and the capacity of the space in which it is estimated. This is done in terms of general source conditions and the regularized Christoffel functions. We also find that the reconstruction of Radon-Nikodym derivatives at any particular point can be done with high order of accuracy. Our theoretical results are illustrated by numerical simulations.
Recent research has demonstrated impressive results in video-to-speech synthesis which involves reconstructing speech solely from visual input. However, previous works have struggled to accurately synthesize speech due to a lack of sufficient guidance for the model to infer the correct content with the appropriate sound. To resolve the issue, they have adopted an extra speaker embedding as a speaking style guidance from a reference auditory information. Nevertheless, it is not always possible to obtain the audio information from the corresponding video input, especially during the inference time. In this paper, we present a novel vision-guided speaker embedding extractor using a self-supervised pre-trained model and prompt tuning technique. In doing so, the rich speaker embedding information can be produced solely from input visual information, and the extra audio information is not necessary during the inference time. Using the extracted vision-guided speaker embedding representations, we further develop a diffusion-based video-to-speech synthesis model, so called DiffV2S, conditioned on those speaker embeddings and the visual representation extracted from the input video. The proposed DiffV2S not only maintains phoneme details contained in the input video frames, but also creates a highly intelligible mel-spectrogram in which the speaker identities of the multiple speakers are all preserved. Our experimental results show that DiffV2S achieves the state-of-the-art performance compared to the previous video-to-speech synthesis technique.
Bayesian linear mixed-effects models and Bayesian ANOVA are increasingly being used in the cognitive sciences to perform null hypothesis tests, where a null hypothesis that an effect is zero is compared with an alternative hypothesis that the effect exists and is different from zero. While software tools for Bayes factor null hypothesis tests are easily accessible, how to specify the data and the model correctly is often not clear. In Bayesian approaches, many authors use data aggregation at the by-subject level and estimate Bayes factors on aggregated data. Here, we use simulation-based calibration for model inference applied to several example experimental designs to demonstrate that, as with frequentist analysis, such null hypothesis tests on aggregated data can be problematic in Bayesian analysis. Specifically, when random slope variances differ (i.e., violated sphericity assumption), Bayes factors are too conservative for contrasts where the variance is small and they are too liberal for contrasts where the variance is large. Running Bayesian ANOVA on aggregated data can - if the sphericity assumption is violated - likewise lead to biased Bayes factor results. Moreover, Bayes factors for by-subject aggregated data are biased (too liberal) when random item slope variance is present but ignored in the analysis. These problems can be circumvented or reduced by running Bayesian linear mixed-effects models on non-aggregated data such as on individual trials, and by explicitly modeling the full random effects structure. Reproducible code is available from \url{//osf.io/mjf47/}.
The work of Kalman and Bucy has established a duality between filtering and optimal estimation in the context of time-continuous linear systems. This duality has recently been extended to time-continuous nonlinear systems in terms of an optimization problem constrained by a backward stochastic partial differential equation. Here we revisit this problem from the perspective of appropriate forward-backward stochastic differential equations. This approach sheds new light on the estimation problem and provides a unifying perspective. It is also demonstrated that certain formulations of the estimation problem lead to deterministic formulations similar to the linear Gaussian case as originally investigated by Kalman and Bucy. Finally, optimal control of partially observed diffusion processes is discussed as an application of the proposed estimators.
The dual consistency is an important issue in developing stable DWR error estimation towards the goal-oriented mesh adaptivity. In this paper, such an issue is studied in depth based on a Newton-GMG framework for the steady Euler equations. Theoretically, the numerical framework is redescribed using the Petrov-Galerkin scheme, based on which the dual consistency is depicted. A boundary modification technique is discussed for preserving the dual consistency within the Newton-GMG framework. Numerically, a geometrical multigrid is proposed for solving the dual problem, and a regularization term is designed to guarantee the convergence of the iteration. The following features of our method can be observed from numerical experiments, i). a stable numerical convergence of the quantity of interest can be obtained smoothly for problems with different configurations, and ii). towards accurate calculation of quantity of interest, mesh grids can be saved significantly using the proposed dual-consistent DWR method, compared with the dual-inconsistent one.
In this work we present a new WENO b-spline based quasi-interpolation algorithm. The novelty of this construction resides in the application of the WENO weights to the b-spline functions, that are a partition of unity, instead to the coefficients that multiply the b-spline functions of the spline. The result obtained conserves the smoothness of the original spline and presents adaption to discontinuities in the function. Another new idea that we introduce in this work is the use of different base weight functions from those proposed in classical WENO algorithms. Apart from introducing the construction of the new algorithms, we present theoretical results regarding the order of accuracy obtained at smooth zones and close to the discontinuity, as well as theoretical considerations about how to design the new weight functions. Through a tensor product strategy, we extend our results to several dimensions. In order to check the theoretical results obtained, we present an extended battery of numerical experiments in one, two and tree dimensions that support our conclussions.
A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach is based on the assumption that the classes stay as clear clusters in the original high-dimensional space. However, in reality, this assumption can be violated; a single class can be fragmented into multiple separated clusters, and multiple classes can be merged into a single cluster. We thus cannot always assure the credibility of the evaluation using class labels. In this paper, we introduce two novel quality measures -- Label-Trustworthiness and Label-Continuity (Label-T&C) -- advancing the process of DR evaluation based on class labels. Instead of assuming that classes are well-clustered in the original space, Label-T&C work by (1) estimating the extent to which classes form clusters in the original and embedded spaces and (2) evaluating the difference between the two. A quantitative evaluation showed that Label-T&C outperform widely used DR evaluation measures (e.g., Trustworthiness and Continuity, Kullback-Leibler divergence) in terms of the accuracy in assessing how well DR embeddings preserve the cluster structure, and are also scalable. Moreover, we present case studies demonstrating that Label-T&C can be successfully used for revealing the intrinsic characteristics of DR techniques and their hyperparameters.
As the key to realizing aBCIs, EEG emotion recognition has been widely studied by many researchers. Previous methods have performed well for intra-subject EEG emotion recognition. However, the style mismatch between source domain (training data) and target domain (test data) EEG samples caused by huge inter-domain differences is still a critical problem for EEG emotion recognition. To solve the problem of cross-dataset EEG emotion recognition, in this paper, we propose an EEG-based Emotion Style Transfer Network (E2STN) to obtain EEG representations that contain the content information of source domain and the style information of target domain, which is called stylized emotional EEG representations. The representations are helpful for cross-dataset discriminative prediction. Concretely, E2STN consists of three modules, i.e., transfer module, transfer evaluation module, and discriminative prediction module. The transfer module encodes the domain-specific information of source and target domains and then re-constructs the source domain's emotional pattern and the target domain's statistical characteristics into the new stylized EEG representations. In this process, the transfer evaluation module is adopted to constrain the generated representations that can more precisely fuse two kinds of complementary information from source and target domains and avoid distorting. Finally, the generated stylized EEG representations are fed into the discriminative prediction module for final classification. Extensive experiments show that the E2STN can achieve the state-of-the-art performance on cross-dataset EEG emotion recognition tasks.