Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper, we investigate the probably approximately correct (PAC) learning theory of OOD detection, which is proposed by researchers as an open problem. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we also offer theoretical supports for several representative OOD detection works based on our OOD theory.
Deep Learning (DL) models tend to perform poorly when the data comes from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection helps to identify such data samples, increasing the model's reliability. Recent works have developed DL-based OOD detection that achieves promising results on 2D medical images. However, scaling most of these approaches on 3D images is computationally intractable. Furthermore, the current 3D solutions struggle to achieve acceptable results in detecting even synthetic OOD samples. Such limited performance might indicate that DL often inefficiently embeds large volumetric images. We argue that using the intensity histogram of the original CT or MRI scan as embedding is descriptive enough to run OOD detection. Therefore, we propose a histogram-based method that requires no DL and achieves almost perfect results in this domain. Our proposal is supported two-fold. We evaluate the performance on the publicly available datasets, where our method scores 1.0 AUROC in most setups. And we score second in the Medical Out-of-Distribution challenge without fine-tuning and exploiting task-specific knowledge. Carefully discussing the limitations, we conclude that our method solves the sample-level OOD detection on 3D medical images in the current setting.
Polynomials are common algebraic structures, which are often used to approximate functions including probability distributions. This paper proposes to directly define polynomial distributions in order to describe stochastic properties of systems rather than to assume polynomials for only approximating known or empirically estimated distributions. Polynomial distributions offer a great modeling flexibility, and often, also mathematical tractability. However, unlike canonical distributions, polynomial functions may have non-negative values in the interval of support for some parameter values, the number of their parameters is usually much larger than for canonical distributions, and the interval of support must be finite. In particular, polynomial distributions are defined here assuming three forms of polynomial function. The transformation of polynomial distributions and fitting a histogram to a polynomial distribution are considered. The key properties of polynomial distributions are derived in closed-form. A piecewise polynomial distribution construction is devised to ensure that it is non-negative over the support interval. Finally, the problems of estimating parameters of polynomial distributions and generating polynomially distributed samples are also studied.
The field of adversarial textual attack has significantly grown over the last few years, where the commonly considered objective is to craft adversarial examples (AEs) that can successfully fool the target model. However, the imperceptibility of attacks, which is also essential for practical attackers, is often left out by previous studies. In consequence, the crafted AEs tend to have obvious structural and semantic differences from the original human-written texts, making them easily perceptible. In this work, we advocate leveraging multi-objectivization to address such issue. Specifically, we formulate the problem of crafting AEs as a multi-objective optimization problem, where the imperceptibility of attacks is considered as auxiliary objectives. Then, we propose a simple yet effective evolutionary algorithm, dubbed HydraText, to solve this problem. To the best of our knowledge, HydraText is currently the only approach that can be effectively applied to both score-based and decision-based attack settings. Exhaustive experiments involving 44237 instances demonstrate that HydraText consistently achieves competitive attack success rates and better attack imperceptibility than the recently proposed attack approaches. A human evaluation study also shows that the AEs crafted by HydraText are more indistinguishable from human-written texts. Finally, these AEs exhibit good transferability and can bring notable robustness improvement to the target model by adversarial training.
We investigate whether three types of post hoc model explanations--feature attribution, concept activation, and training point ranking--are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test-time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test-time especially for non-visible artifacts like a background blur. Further, we find that feature attribution methods are susceptible to erroneously indicating dependence on spurious signals even when the model being explained does not rely on spurious artifacts. This finding casts doubt on the utility of these approaches, in the hands of a practitioner, for detecting a model's reliance on spurious signals.
Contrastive learning has emerged as a competitive pretraining method for object detection. Despite this progress, there has been minimal investigation into the robustness of contrastively pretrained detectors when faced with domain shifts. To address this gap, we conduct an empirical study of contrastive learning and out-of-domain object detection, studying how contrastive view design affects robustness. In particular, we perform a case study of the detection-focused pretext task Instance Localization (InsLoc) and propose strategies to augment views and enhance robustness in appearance-shifted and context-shifted scenarios. Amongst these strategies, we propose changes to cropping such as altering the percentage used, adding IoU constraints, and integrating saliency based object priors. We also explore the addition of shortcut-reducing augmentations such as Poisson blending, texture flattening, and elastic deformation. We benchmark these strategies on abstract, weather, and context domain shifts and illustrate robust ways to combine them, in both pretraining on single-object and multi-object image datasets. Overall, our results and insights show how to ensure robustness through the choice of views in contrastive learning.
We introduce a deep learning model that can universally approximate regular conditional distributions (RCDs). The proposed model operates in three phases: first, it linearizes inputs from a given metric space $\mathcal{X}$ to $\mathbb{R}^d$ via a feature map, then a deep feedforward neural network processes these linearized features, and then the network's outputs are then transformed to the $1$-Wasserstein space $\mathcal{P}_1(\mathbb{R}^D)$ via a probabilistic extension of the attention mechanism of Bahdanau et al.\ (2014). Our model, called the \textit{probabilistic transformer (PT)}, can approximate any continuous function from $\mathbb{R}^d $ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on compact sets, quantitatively. We identify two ways in which the PT avoids the curse of dimensionality when approximating $\mathcal{P}_1(\mathbb{R}^D)$-valued functions. The first strategy builds functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ which can be efficiently approximated by a PT, uniformly on any given compact subset of $\mathbb{R}^d$. In the second approach, given any function $f$ in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$, we build compact subsets of $\mathbb{R}^d$ whereon $f$ can be efficiently approximated by a PT.
Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. In future, learning on small data is one of the ultimate purposes of AI, which requires machines to recognize objectives and scenarios relying on small data as humans. A series of machine learning models is going on this way such as active learning, few-shot learning, deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows the agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data using a supervised and unsupervised fashion. With these theoretical analyses, we categorize the small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, where their optimization solutions are also presented and discussed. Later, some potential learning scenarios that may benefit from small data learning are then summarized, and their potential learning scenarios are also analyzed. Finally, some challenging applications such as computer vision, natural language processing that may benefit from learning on small data are also surveyed.
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision. This problem first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.
Meta-learning has gained wide popularity as a training framework that is more data-efficient than traditional machine learning methods. However, its generalization ability in complex task distributions, such as multimodal tasks, has not been thoroughly studied. Recently, some studies on multimodality-based meta-learning have emerged. This survey provides a comprehensive overview of the multimodality-based meta-learning landscape in terms of the methodologies and applications. We first formalize the definition of meta-learning and multimodality, along with the research challenges in this growing field, such as how to enrich the input in few-shot or zero-shot scenarios and how to generalize the models to new tasks. We then propose a new taxonomy to systematically discuss typical meta-learning algorithms combined with multimodal tasks. We investigate the contributions of related papers and summarize them by our taxonomy. Finally, we propose potential research directions for this promising field.
Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. However, in real scenarios, the $i.i.d.$ assumption can hardly be satisfied, rendering the sharp drop of classic machine learning algorithms' performances under distributional shifts, which indicates the significance of investigating the Out-of-Distribution generalization problem. Out-of-Distribution (OOD) generalization problem addresses the challenging setting where the testing distribution is unknown and different from the training. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from the definition, methodology, evaluation to the implications and future directions. Firstly, we provide the formal definition of the OOD generalization problem. Secondly, existing methods are categorized into three parts based on their positions in the whole learning pipeline, namely unsupervised representation learning, supervised model learning and optimization, and typical methods for each category are discussed in detail. We then demonstrate the theoretical connections of different categories, and introduce the commonly used datasets and evaluation metrics. Finally, we summarize the whole literature and raise some future directions for OOD generalization problem. The summary of OOD generalization methods reviewed in this survey can be found at //out-of-distribution-generalization.com.