Defect detection aims to detect and localize regions out of the normal distribution. Previous approaches model normality and compare it with the input to identify defective regions, potentially limiting their generalizability. This paper proposes a one-stage framework that detects defective patterns directly without the modeling process. This ability is achieved through the joint efforts of three parties: a generative adversarial network (GAN), a newly proposed scaled pattern loss, and a dynamic masked cycle-consistent auxiliary network. Explicit information that could indicate the position of defects is intentionally excluded to avoid learning any direct mapping. Experimental results on the texture class of the challenging MVTec AD dataset show that the proposed method is 2.9% higher than the SOTA methods in F1-Score, while substantially outperforming SOTA methods in generalizability.
Gibbs posteriors are proportional to a prior distribution multiplied by an exponentiated loss function, with a key tuning parameter weighting information in the loss relative to the prior and providing control of posterior uncertainty. Gibbs posteriors provide a principled framework for likelihood-free Bayesian inference, but in many situations, including only a single tuning parameter inevitably leads to poor uncertainty quantification. In particular, regardless of the value of the parameter, credible regions have far from the nominal frequentist coverage even in large samples. We propose a sequential extension to Gibbs posteriors to address this problem. We prove that the proposed sequential posterior exhibits concentration and a Bernstein-von Mises theorem, which holds under easy-to-verify conditions in Euclidean space and on manifolds. As a byproduct, we obtain the first Bernstein-von Mises theorem for traditional likelihood-based Bayesian posteriors on manifolds. All methods are illustrated with an application to principal component analysis.
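To make the construction concrete, a Gibbs posterior of the kind described here typically takes the form
\[
  \pi_{n,\omega}(\theta \mid X_{1:n}) \;\propto\; \pi(\theta)\,\exp\Bigl\{-\,\omega \sum_{i=1}^{n} \ell(\theta; X_i)\Bigr\},
\]
where $\pi$ is the prior, $\ell$ is the loss, and $\omega > 0$ is the tuning parameter (learning rate) weighting the loss relative to the prior; this generic notation is illustrative and not taken from the paper.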
Pretrained language models are expected to effectively map input text to a set of vectors while preserving the inherent relationships within the text. Consequently, designing a white-box model to compute metrics that reflect the presence of specific internal relations in these vectors has become a common approach for post-hoc interpretability analysis of pretrained language models. However, achieving interpretability in white-box models and ensuring the rigor of metric computation become challenging when the source model lacks inherent interpretability. Therefore, in this paper, we discuss how to strike a balance in this trade-off and propose a novel line of inquiry for constructing metrics to understand the mechanisms of pretrained language models. We have specifically designed a family of metrics along this line of investigation, and the model used to compute these metrics is referred to as the tree topological probe. We conducted measurements on BERT-large using these metrics. Based on the experimental results, we propose a speculation regarding the working mechanism of BERT-like pretrained language models, as well as a strategy for enhancing fine-tuning performance by leveraging the topological probe to improve specific submodules.
We commonly encounter the problem of identifying an optimally weight-adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight-adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing nonparametrically imbued distributional constraints on the weights and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight-adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
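A schematic version of the optimization problem implied by this description is
\[
  \max_{w \in \Delta_n} \; -\sum_{i=1}^{n} w_i \log w_i
  \quad \text{subject to} \quad
  W\Bigl(\sum_{i=1}^{n} w_i\,\delta_{x_i},\, Q\Bigr) \le \varepsilon,
\]
where $\Delta_n$ is the probability simplex, $Q$ is the pre-specified distribution, $W$ is an optimal transport (Wasserstein) distance, and $\varepsilon > 0$ permits the subtle departures mentioned above; the symbols here are illustrative placeholders rather than the paper's notation.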
We propose tests of fit for classes of distributions that include the Weibull, the Pareto, and the Fr\'echet distributions. The new tests employ the novel tool of the min--characteristic function and are based on an L2--type weighted distance between this function and its empirical counterpart applied to suitably standardized data. If data--standardization is performed using the MLE of the distributional parameters, then the method reduces to testing for the standard member of the family, with parameter values known and set equal to one. We investigate asymptotic properties of the tests, while a Monte Carlo study is presented that includes the new procedure as well as competitors for the purpose of specification testing with three extreme value distributions. The new tests are also applied to a few real--data sets.
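The generic shape of such an L2--type weighted-distance statistic is
\[
  T_{n,w} \;=\; n \int_{0}^{\infty} \bigl| M_n(t) - M_0(t) \bigr|^{2}\, w(t)\, \mathrm{d}t,
\]
where $M_0$ denotes the min--characteristic function of the standard family member, $M_n$ its empirical counterpart computed from the standardized data, and $w(\cdot)$ a positive weight function; large values of $T_{n,w}$ indicate departure from the hypothesized class. These symbols are placeholders for illustration and not necessarily the paper's notation.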
Behavioural metrics provide a quantitative refinement of classical two-valued behavioural equivalences on systems with quantitative data, such as metric or probabilistic transition systems. In analogy to the linear-time/branching-time spectrum of two-valued behavioural equivalences on transition systems, behavioural metrics vary in granularity. We provide a unifying treatment of spectra of behavioural metrics in the emerging framework of graded monads, working in coalgebraic generality, that is, parametrically in the system type. In the ensuing development of quantitative graded semantics, we introduce algebraic presentations of graded monads on the category of metric spaces. Moreover, we obtain a canonical generic notion of invariant real-valued modal logic, and provide criteria for such logics to be expressive in the sense that logical distance coincides with behavioural distance. We present positive examples based on this criterion, covering both known and new expressiveness results; in particular, we show that expressiveness holds essentially always for Eilenberg-Moore type trace semantics, and we obtain a new expressiveness result for trace semantics of fuzzy transition systems. As a negative result, we show that trace distance on probabilistic metric transition systems does not admit any characteristic real-valued modal logic, even in a more broadly understood sense.
Explainable recommender systems (RS) have traditionally followed a one-size-fits-all approach, delivering explanations at the same level of detail to each user, without considering their individual needs and goals. Further, explanations in RS have so far been presented mostly in a static and non-interactive manner. To fill these research gaps, we aim in this paper to adopt a user-centered, interactive explanation model that provides explanations with different levels of detail and empowers users to interact with, control, and personalize the explanations based on their needs and preferences. We followed a user-centered approach to design interactive explanations with three levels of detail (basic, intermediate, and advanced) and implemented them in the transparent Recommendation and Interest Modeling Application (RIMA). We conducted a qualitative user study (N=14) to investigate the impact of providing interactive explanations with varying levels of detail on users' perception of the explainable RS. Our study showed qualitative evidence that fostering interaction and giving users control in deciding which explanations they would like to see can meet the demands of users with different needs, preferences, and goals, and consequently can have positive effects on crucial aspects of explainable recommendation, including transparency, trust, satisfaction, and user experience.
We investigate the randomized and quantum communication complexities of the well-studied Equality function with small error probability $\epsilon$, getting optimal constant factors in the leading terms in a number of different models. In the randomized model, 1) we give a general technique to convert public-coin protocols to private-coin protocols by incurring a small multiplicative error, at a small additive cost. This is an improvement over Newman's theorem [Inf. Proc. Let.'91] in the dependence on the error parameter. 2) Using this we obtain a $(\log(n/\epsilon^2)+4)$-cost private-coin communication protocol that computes the $n$-bit Equality function, to error $\epsilon$. This improves upon the $\log(n/\epsilon^3)+O(1)$ upper bound implied by Newman's theorem, and matches the best known lower bound, which follows from Alon [Comb. Prob. Comput.'09], up to an additive $\log\log(1/\epsilon)+O(1)$. In the quantum model, 1) we exhibit a one-way protocol of cost $\log(n/\epsilon)+4$, that uses only pure states and computes the $n$-bit Equality function to error $\epsilon$. This bound was implicitly already shown by Nayak [PhD thesis'99]. 2) We show that any $\epsilon$-error one-way protocol for $n$-bit Equality that uses only pure states communicates at least $\log(n/\epsilon)-\log\log(1/\epsilon)-O(1)$ qubits. 3) We exhibit a one-way protocol of cost $\log(\sqrt{n}/\epsilon)+3$, that uses mixed states and computes the $n$-bit Equality function to error $\epsilon$. This is also tight up to an additive $\log\log(1/\epsilon)+O(1)$, which follows from Alon's result. 4) We study the number of EPR pairs required to be shared in an entanglement-assisted one-way protocol. Our upper bounds also yield upper bounds on the approximate rank and related measures of the Identity matrix.
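For intuition about the setting, the sketch below implements the textbook public-coin randomized protocol for $n$-bit Equality via shared random parity checks, which achieves error $\epsilon$ with roughly $\log(1/\epsilon)$ bits of communication; it is a standard baseline for illustration only, not the improved private-coin or quantum protocols described above.

```python
import math
import random

def equality_public_coin(x: str, y: str, eps: float, seed: int = 0) -> bool:
    """Textbook public-coin protocol for n-bit Equality with error at most eps.

    Alice and Bob share random n-bit strings (the public coins). Alice sends the
    parity <x, r> mod 2 for each shared string r; Bob accepts iff every parity
    matches his own. If x == y all parities match; if x != y each parity matches
    with probability 1/2, so k = ceil(log2(1/eps)) checks give error <= 2**-k <= eps.
    """
    assert len(x) == len(y)
    k = max(1, math.ceil(math.log2(1.0 / eps)))   # number of parity bits Alice sends
    rng = random.Random(seed)                     # models the shared public randomness
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in x]
        if sum(int(a) & c for a, c in zip(x, r)) % 2 != sum(int(b) & c for b, c in zip(y, r)) % 2:
            return False                          # a parity mismatch certifies x != y
    return True                                   # equal, or unequal with probability <= eps

print(equality_public_coin("1011", "1011", eps=0.01))  # True
print(equality_public_coin("1011", "1001", eps=0.01))  # False with probability >= 0.99
```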
Graphs are used widely to model complex systems, and detecting anomalies in a graph is an important task in the analysis of complex systems. Graph anomalies are patterns in a graph that do not conform to the normal patterns expected of the graph's attributes and/or structures. In recent years, graph neural networks (GNNs) have been studied extensively and have successfully performed difficult machine learning tasks in node classification, link prediction, and graph classification, thanks to their highly expressive capability, via message passing, to learn graph representations effectively. To solve the graph anomaly detection problem, GNN-based methods leverage information about the graph attributes (or features) and/or structures to learn to score anomalies appropriately. In this survey, we review the recent advances made in detecting graph anomalies using GNN models. Specifically, we summarize GNN-based methods according to the graph type (i.e., static and dynamic), the anomaly type (i.e., node, edge, subgraph, and whole graph), and the network architecture (e.g., graph autoencoder, graph convolutional network). To the best of our knowledge, this survey is the first comprehensive review of graph anomaly detection methods based on GNNs.
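As a minimal illustration of the graph-autoencoder idea mentioned above, the NumPy sketch below scores nodes by how poorly their adjacency rows are reconstructed from already-learned node embeddings; it is a toy example for intuition, not a specific method from the surveyed literature.

```python
import numpy as np

def gae_node_anomaly_scores(adj: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    """Score each node by the reconstruction error of its adjacency row.

    A graph autoencoder decodes an adjacency estimate A_hat = sigmoid(Z Z^T) from
    node embeddings Z; nodes whose observed connectivity deviates most from the
    reconstruction receive the highest anomaly scores.
    """
    a_hat = 1.0 / (1.0 + np.exp(-embeddings @ embeddings.T))  # decoded edge probabilities
    return ((adj - a_hat) ** 2).mean(axis=1)                  # per-node mean squared error

# Toy usage: 5 nodes, 2-dimensional embeddings (e.g., from a trained GNN encoder).
rng = np.random.default_rng(0)
adj = np.triu((rng.random((5, 5)) > 0.5).astype(float), 1)
adj = adj + adj.T                                             # symmetric, no self-loops
z = rng.normal(size=(5, 2))
print(gae_node_anomaly_scores(adj, z))                        # higher = more anomalous
```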
The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, at the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets, including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
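Adversarial domain classifiers of this kind are commonly trained through a gradient reversal layer, which passes features forward unchanged but flips the gradient sign so the backbone is pushed toward domain-invariant features; the PyTorch sketch below shows this generic mechanism and is not the authors' implementation (module names and dimensions are illustrative).

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, features, lambd):
        ctx.lambd = lambd
        return features.view_as(features)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the backbone so it is
        # trained to confuse the domain classifier (the adversarial objective).
        return -ctx.lambd * grad_output, None

class ImageLevelDomainClassifier(torch.nn.Module):
    """Small domain classifier placed on top of pooled backbone features."""

    def __init__(self, in_dim: int = 256, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 1),  # logit: source (0) vs. target (1) domain
        )

    def forward(self, features):
        return self.net(GradientReversal.apply(features, self.lambd))

# Usage sketch: pooled features of shape (batch, 256); the domain loss is binary
# cross-entropy against per-image domain labels (all-source here for illustration).
feats = torch.randn(8, 256, requires_grad=True)
logits = ImageLevelDomainClassifier()(feats).squeeze(1)
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, torch.zeros(8))
loss.backward()
```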