Detecting out-of-distribution (OOD) inputs during the inference stage is crucial for deploying neural networks in the real world. Previous methods commonly relied on the output of a network derived from the highly activated feature map. In this study, we first reveal that the norm of the feature map obtained from a block other than the last block can be a better indicator for OOD detection. Motivated by this, we propose a simple framework consisting of FeatureNorm, the norm of the feature map, and NormRatio, the ratio of FeatureNorm for ID to FeatureNorm for OOD, which measures the OOD detection performance of each block. In particular, to select the block that provides the largest difference between the FeatureNorm of ID and OOD inputs, we create Jigsaw puzzle images as pseudo-OOD samples from ID training samples, calculate NormRatio for each block, and select the block with the largest value. Once a suitable block is selected, OOD detection with FeatureNorm outperforms other OOD detection methods, reducing FPR95 by up to 52.77% on the CIFAR10 benchmark and by up to 48.53% on the ImageNet benchmark. We demonstrate that our framework generalizes to various architectures and that block selection is important, as it can improve previous OOD detection methods as well.
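The block-selection procedure can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: FeatureNorm is taken here to be the per-channel L2 norm of the ReLU-activated feature map averaged over channels, the Jigsaw grid is 3x3, and every function name is hypothetical. Extracting per-block feature maps from a real network (for example with forward hooks) is left out.

```python
import numpy as np


def feature_norm(fmap):
    """FeatureNorm per sample for a block's feature map of shape (N, C, H, W).

    Assumption: ReLU the activations, take the L2 norm of each channel's
    H x W map, then average over channels.
    """
    fmap = np.maximum(np.asarray(fmap, dtype=float), 0.0)
    per_channel = np.sqrt((fmap ** 2).sum(axis=(2, 3)))  # (N, C)
    return per_channel.mean(axis=1)  # (N,)


def jigsaw(images, grid=3, rng=None):
    """Pseudo-OOD images: cut each (C, H, W) image into grid x grid patches and shuffle them."""
    rng = np.random.default_rng() if rng is None else rng
    n, c, h, w = images.shape
    assert h % grid == 0 and w % grid == 0, "H and W must be divisible by grid"
    ph, pw = h // grid, w // grid
    out = np.empty_like(images)
    for k in range(n):
        patches = [images[k, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
                   for i in range(grid) for j in range(grid)]
        for pos, src in enumerate(rng.permutation(len(patches))):
            i, j = divmod(pos, grid)
            out[k, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = patches[src]
    return out


def norm_ratio(id_fmap, pseudo_ood_fmap):
    """Mean FeatureNorm on ID inputs divided by mean FeatureNorm on pseudo-OOD inputs."""
    return feature_norm(id_fmap).mean() / feature_norm(pseudo_ood_fmap).mean()


def select_block(id_fmaps, ood_fmaps):
    """Pick the block index with the largest NormRatio; inputs are per-block lists of maps."""
    ratios = [norm_ratio(a, b) for a, b in zip(id_fmaps, ood_fmaps)]
    return int(np.argmax(ratios)), ratios
```

Under this sketch, the FeatureNorm of the selected block would then serve as the per-sample OOD score at test time, with larger values suggesting ID inputs.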
The effectiveness of machine learning models is significantly affected by the size of the dataset and the quality of features, as redundant and irrelevant features can radically degrade performance. This paper proposes IGRF-RFE, a hybrid feature selection method for multi-class network anomaly detection using a multilayer perceptron (MLP) network. IGRF-RFE is a feature reduction technique that combines a filter feature selection method with a wrapper feature selection method. In our proposed method, we first use a filter method that combines Information Gain and Random Forest Importance to reduce the feature subset search space. Then, we apply recursive feature elimination (RFE) as a wrapper method to recursively eliminate further redundant features from the reduced feature subsets. Our experimental results on the UNSW-NB15 dataset confirm that the proposed method improves the accuracy of anomaly detection while reducing the feature dimension: the feature dimension is reduced from 42 to 23, while the multi-class classification accuracy of the MLP improves from 82.25% to 84.24%.
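A minimal sketch of the filter stage follows. Information gain is computed for discrete (or pre-binned) features; the Random Forest importances are taken as an input, e.g., from scikit-learn's `RandomForestClassifier.feature_importances_`. How the paper combines the two scores is not stated here, so the min-max-normalize-and-average rule in `filter_stage` is an assumption, as are the function names.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X."""
    feature, labels = np.asarray(feature), np.asarray(labels)
    h_cond = 0.0
    for v in np.unique(feature):
        mask = feature == v
        h_cond += mask.mean() * entropy(labels[mask])  # P(X=v) * H(Y | X=v)
    return entropy(labels) - h_cond


def filter_stage(ig_scores, rf_scores, keep):
    """Return indices of the `keep` best features under an assumed combination rule."""
    def minmax(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)

    combined = (minmax(ig_scores) + minmax(rf_scores)) / 2.0
    return np.argsort(combined)[::-1][:keep]  # highest combined score first
```

The surviving columns would then go to the wrapper stage, e.g., scikit-learn's `RFE`, which needs an estimator exposing `coef_` or `feature_importances_` (or an `importance_getter`).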
Physics-informed neural networks (PINNs) have proven to be an efficient tool for representing problems for which measured data are available and whose dynamics are expected to follow some physical laws. In this paper, we suggest a multiobjective perspective on training PINNs by treating the data loss and the residual loss as two individual objective functions in a truly biobjective optimization approach. As a showcase example, we consider COVID-19 predictions in Germany and build an extended susceptible-infected-recovered (SIR) model that additionally considers leaky-vaccinated and hospitalized populations (SVIHR model) to model the transition rates and to predict future infections. SIR-type models are expressed as systems of ordinary differential equations (ODEs). We investigate the suitability of the generated PINN for COVID-19 predictions and compare the resulting predicted curves with those obtained by applying the method of non-standard finite differences to the system of ODEs and initial data. The approach is applicable to various systems of ODEs that define dynamical regimes. These regimes need not be SIR-type models, and the corresponding underlying data sets need not be associated with COVID-19.
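For reference, a non-standard finite difference (NSFD) scheme in the style of Mickens is sketched below for the plain SIR model. It is a simplification: the SVIHR compartments and the exact discretization used in the paper are not reproduced, and the parameter values are illustrative. The point of the denominators is that every update stays positive and S + I + R is conserved for any step size h.

```python
def nsfd_sir(s0, i0, r0, beta, gamma, h, steps):
    """Mickens-style NSFD scheme for dS = -beta*S*I/N, dI = beta*S*I/N - gamma*I, dR = gamma*I.

    Returns a list of (S, I, R) tuples, including the initial state.
    """
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    traj = [(s, i, r)]
    for _ in range(steps):
        # Loss terms go into denominators, so each update is a positive quantity.
        s_next = s / (1.0 + h * beta * i / n)
        i_next = (i + h * beta * s_next * i / n) / (1.0 + h * gamma)
        r_next = r + h * gamma * i_next
        s, i, r = s_next, i_next, r_next
        traj.append((s, i, r))
    return traj
```

Summing the three updates shows S + I + R at step n + 1 equals its value at step n, which an explicit Euler step only guarantees approximately once positivity breaks down for large h.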
In machine learning, selecting a promising model from a potentially large number of competing models and assessing its generalization performance are critical tasks that need careful consideration. Typically, model selection and evaluation are strictly separated endeavors: the sample at hand is split into a training, a validation, and an evaluation set, and only a single confidence interval is computed for the prediction performance of the final selected model. We instead propose an algorithm to compute valid lower confidence bounds for multiple models that have been selected based on their prediction performance in the evaluation set, by interpreting the selection problem as a simultaneous inference problem. We use bootstrap tilting and a maxT-type multiplicity correction. The approach is universally applicable to any combination of prediction models, any model selection strategy, and any prediction performance measure that accepts weights. Various simulation experiments show that our proposed approach yields lower confidence bounds that are at least as good as bounds from standard approaches and that reliably reach the nominal coverage probability. In addition, especially when the sample size is small, our proposed approach yields better-performing prediction models than the default selection of only one model for evaluation.
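The maxT idea can be sketched with a plain nonparametric bootstrap on per-sample correctness. This swaps bootstrap tilting for an ordinary bootstrap and fixes accuracy as the performance measure, so it illustrates the multiplicity correction rather than the proposed algorithm; names and defaults are assumptions.

```python
import numpy as np


def maxt_lower_bounds(correct, alpha=0.05, n_boot=2000, seed=0):
    """Simultaneous lower confidence bounds for the accuracies of several models.

    correct: (n_samples, n_models) 0/1 matrix of per-sample correctness on the
    evaluation set. Uses a plain bootstrap (not bootstrap tilting) and a
    maxT-type critical value shared by all models.
    """
    correct = np.asarray(correct, dtype=float)
    n, _ = correct.shape
    rng = np.random.default_rng(seed)
    acc = correct.mean(axis=0)  # observed accuracy per model
    idx = rng.integers(0, n, size=(n_boot, n))  # bootstrap resamples of rows
    boot_acc = correct[idx].mean(axis=1)  # (n_boot, n_models)
    se = boot_acc.std(axis=0, ddof=1)
    se = np.where(se > 0, se, np.finfo(float).eps)  # guard constant columns
    # Max over models of the standardized deviation in each bootstrap replicate.
    t_max = np.max((boot_acc - acc) / se, axis=1)
    crit = np.quantile(t_max, 1 - alpha)  # one critical value for all models
    return acc - crit * se
```

Because all models share one critical value taken from the maximum statistic, the bound for any single model is never higher than it would be if that model were evaluated alone with the same bootstrap resamples; that is the price of simultaneous coverage.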
There is growing interest in intelligent systems for numerous medical imaging, image processing, and computer vision applications, such as face recognition, medical diagnosis, character recognition, and self-driving cars. These applications usually require solving complex classification problems involving complex images with unknown data-generating processes. Despite the recent successes of current classification approaches based on feature engineering and deep learning, several shortcomings, such as a lack of robustness, generalizability, and interpretability, have also been observed. These methods often require extensive training data, are computationally expensive, and are vulnerable to out-of-distribution samples, e.g., adversarial attacks. Recently, an accurate, data-efficient, computationally efficient, and robust transport-based classification approach has been proposed, which offers a generative model-based problem formulation and closed-form solution for a specific category of classification problems. However, all these approaches lack mechanisms to detect test samples outside the class distributions used during training. In real-world settings, where the collected training samples cannot cover all classes, traditional classification schemes are unable to handle unseen classes effectively, which is an especially important issue for safety-critical systems such as self-driving and medical imaging diagnosis. In this work, we propose a method for detecting out-of-class distributions based on the distribution of sliced-Wasserstein distances from the Radon Cumulative Distribution Transform (R-CDT) subspace. We tested our method on MNIST and two medical image datasets and report better accuracy than state-of-the-art methods that lack an out-of-class distribution detection procedure.
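The building block of the method, a sliced-Wasserstein distance, can be sketched for two equally sized 2-D point clouds: project onto evenly spaced directions, sort each projection, and average the per-slice squared 1-D Wasserstein-2 distances. This generic estimator on point samples is an assumption for illustration; it does not implement the R-CDT subspace or the paper's detection rule.

```python
import numpy as np


def sliced_wasserstein(x, y, n_angles=64):
    """Sliced Wasserstein-2 distance between two (N, 2) point clouds of equal size."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    assert x.shape == y.shape and x.shape[1] == 2
    thetas = np.arange(n_angles) * np.pi / n_angles  # evenly spaced directions in [0, pi)
    dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # (K, 2)
    # Sorting each 1-D projection gives the optimal 1-D coupling (quantile matching).
    px = np.sort(x @ dirs.T, axis=0)  # (N, K)
    py = np.sort(y @ dirs.T, axis=0)
    w2_sq = ((px - py) ** 2).mean(axis=0)  # squared W2 per slice
    return float(np.sqrt(w2_sq.mean()))
```

As a sanity check, translating a cloud by a vector v gives a distance of |v| / sqrt(2) with these evenly spaced directions. A hypothetical out-of-class rule would then flag a test sample whose distance to every class model exceeds a threshold calibrated on training data.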
Frequent false alarms impede the adoption of unsupervised anomaly detection algorithms in industrial applications. By investigating the probability density distributions of prediction scores in out-of-distribution anomaly detection tasks, we reveal potential characteristics of false alarms that depend on the trained detector. An SVM-based classifier is used as a post-processing module to identify false alarms in the anomaly map at the object level. In addition, a sample synthesis strategy is devised to incorporate fuzzy prior knowledge about the specific application into the anomaly-free training dataset. Experimental results show that the proposed method comprehensively improves the performance of two segmentation models at both the image and pixel levels in two industrial applications.
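Object-level post-processing first needs objects. The sketch below thresholds an anomaly map, extracts 4-connected regions, and computes a small feature vector per region (area, mean score, max score). The feature choice and threshold are hypothetical; in the described pipeline, such per-region features would be fed to an SVM (for instance scikit-learn's `SVC`) trained to separate true defects from false alarms.

```python
from collections import deque

import numpy as np


def region_features(anomaly_map, threshold):
    """Label 4-connected regions with score >= threshold and describe each one.

    Returns (labels, feats): labels is an int map (0 = background) and feats has
    one row [area, mean_score, max_score] per region, in row-major discovery order.
    """
    amap = np.asarray(anomaly_map, dtype=float)
    mask = amap >= threshold
    labels = np.zeros(amap.shape, dtype=int)
    h, w = amap.shape
    feats, current = [], 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                current += 1
                labels[sy, sx] = current
                queue, scores = deque([(sy, sx)]), []
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    scores.append(amap[y, x])
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
                feats.append([len(scores), float(np.mean(scores)), float(np.max(scores))])
    return labels, np.array(feats, dtype=float).reshape(-1, 3)
```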
We expect the generalization error to improve with more samples from a similar task and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task first improves and then deteriorates beyond a threshold. In other words, there is value in training on small amounts of OOD data. We use Fisher's Linear Discriminant on synthetic datasets and deep networks on computer vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS, and DomainNet to demonstrate and analyze this phenomenon. In the idealized setting where we know which samples are OOD, we show that these non-monotonic trends can be exploited using an appropriately weighted objective of the target and OOD empirical risks. While its practical utility is limited, this suggests that if we can detect OOD samples, there may be ways to benefit from them. When we do not know which samples are OOD, we show that a number of go-to strategies, such as data augmentation, hyper-parameter optimization, and pre-training, are not enough to ensure that the target generalization error does not deteriorate with the number of OOD samples in the dataset.
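A toy analogue of the non-monotonic trend can be worked out in closed form for Gaussian mean estimation, which is not the paper's Fisher's Linear Discriminant setting. Take n target samples from N(mu, sigma^2) and m OOD samples from N(mu + delta, sigma^2), and estimate mu with (1 - alpha) * target mean + alpha * OOD mean, the minimizer of the alpha-weighted squared-error empirical risk. Its mean squared error is (alpha * delta)^2 + (1 - alpha)^2 * sigma^2 / n + alpha^2 * sigma^2 / m; naive pooling corresponds to alpha = m / (n + m).

```python
def weighted_mse(alpha, n, m, delta, sigma=1.0):
    """MSE of (1 - alpha) * target_mean + alpha * ood_mean as an estimate of the target mean."""
    var_t = sigma ** 2 / n
    var_o = sigma ** 2 / m if m > 0 else 0.0
    bias_sq = (alpha * delta) ** 2  # OOD shift leaks in proportionally to alpha
    return bias_sq + (1 - alpha) ** 2 * var_t + alpha ** 2 * var_o


def pooled_mse(n, m, delta, sigma=1.0):
    """Naive pooling: every sample gets equal weight, so alpha = m / (n + m)."""
    return weighted_mse(m / (n + m), n, m, delta, sigma)


def optimal_alpha(n, m, delta, sigma=1.0):
    """Alpha minimizing weighted_mse (set its derivative in alpha to zero)."""
    if m == 0:
        return 0.0
    var_t, var_o = sigma ** 2 / n, sigma ** 2 / m
    return var_t / (delta ** 2 + var_t + var_o)
```

With n = 10 and delta = 0.5, the pooled error dips below the target-only value of 0.1 at m = 2 and rises well above it at m = 100, while the optimally weighted estimator stays below 0.1.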
In this paper, we propose an end-to-end framework that jointly learns keypoint detection, descriptor representation, and cross-frame matching for the task of image-based 3D localization. Prior art has tackled each of these components individually, purportedly to alleviate the difficulty of effectively training a holistic network. We design a self-supervised image-warping correspondence loss for both feature detection and matching, a weakly supervised epipolar constraint loss for relative camera pose learning, and a directional matching scheme that detects keypoint features in a source image and performs a coarse-to-fine correspondence search in the target image. We leverage this framework to enforce cycle consistency in our matching module. In addition, we propose a new loss to robustly handle both definite inlier/outlier matches and less certain matches. The integration of these learning mechanisms enables end-to-end training of a single network performing all three localization components. Benchmarking our approach on public datasets shows that such an end-to-end framework yields more accurate localization, outperforming both traditional methods and state-of-the-art weakly supervised methods.
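One common form of an epipolar constraint loss is the algebraic residual x2^T E x1 with essential matrix E = [t]_x R. The sketch below evaluates it for matched points in normalized camera coordinates; whether the paper uses this residual, a Sampson distance, or another variant is not specified here, so treat the choice and the names as assumptions.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def epipolar_loss(x1, x2, R, t):
    """Mean absolute algebraic epipolar residual |x2^T E x1| with E = [t]_x R.

    x1, x2: (N, 2) matched points in normalized camera coordinates;
    R, t: relative pose mapping camera-1 points to camera 2 (X2 = R X1 + t).
    """
    E = skew(t) @ R
    h1 = np.hstack([x1, np.ones((len(x1), 1))])  # homogeneous coordinates
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    residuals = np.einsum("ni,ij,nj->n", h2, E, h1)
    return float(np.mean(np.abs(residuals)))
```

Since the residual scales with the norm of t, a training loss would typically normalize t to unit length.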
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over control to a human when it detects unusual scenes or objects that it has never seen before and on which it cannot make a safe decision. This problem first emerged in 2017 and has since received increasing attention from the research community, leading to a plethora of methods ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems are often confused by readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.
Benefiting from the rapid development of deep learning techniques, salient object detection has made remarkable progress recently. However, two major challenges still hinder its application on embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very few convolutional parameters while maintaining accuracy. Second, we propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the currently predicted salient regions from side-output features, the network can explore the missing object parts and details, which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in simplicity, efficiency (45 FPS), and model size (81 MB).
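A single refinement step with reverse attention can be sketched in NumPy. Nearest-neighbour upsampling stands in for bilinear interpolation, and `residual_head` is a hypothetical callable standing in for the learned side-output convolutions; the structure (erase confident salient regions with 1 - sigmoid, learn a residual from what remains, add it to the upsampled coarse prediction) follows the description above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the last two axes (stand-in for bilinear)."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)


def reverse_attention_step(side_feat, coarse_logits, residual_head):
    """One top-down refinement step.

    side_feat: (C, H, W) side-output features; coarse_logits: (H/2, W/2)
    saliency logits from the deeper stage; residual_head: hypothetical
    callable mapping (C, H, W) -> (H, W), standing in for learned convs.
    """
    up = upsample2x(coarse_logits)
    weight = 1.0 - sigmoid(up)  # reverse attention: near 0 where already salient
    attended = side_feat * weight[None, :, :]
    residual = residual_head(attended)  # residual learned from the non-salient remainder
    return up + residual  # refined, higher-resolution logits
```

In regions the coarse map already marks as salient, the attended features are nearly zero, so the step leaves them unchanged and spends its capacity on missed parts.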
It is a common paradigm in object detection frameworks to treat all samples equally and aim to maximize the average performance. In this work, we revisit this paradigm through a careful study of how different samples contribute to the overall performance measured in terms of mAP. Our study suggests that the samples in each mini-batch are neither independent nor equally important, and therefore a better classifier on average does not necessarily mean higher mAP. Motivated by this study, we propose the notion of Prime Samples, those that play a key role in driving detection performance. We further develop a simple yet effective sampling and learning strategy called PrIme Sample Attention (PISA) that directs the focus of the training process towards such samples. Our experiments demonstrate that it is often more effective to focus on prime samples than on hard samples when training a detector. In particular, on the MSCOCO dataset, PISA consistently outperforms the random sampling baseline and hard mining schemes, e.g., OHEM and Focal Loss, by more than 1% on both single-stage and two-stage detectors, with a strong ResNeXt-101 backbone.
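The ranking behind prime samples can be sketched in plain Python. Following PISA's IoU hierarchical local rank (IoU-HLR) as we understand it, positives are first ranked within each ground-truth group by IoU, then all top-1 samples are ordered before all top-2 samples, and so on. The rank-to-weight mapping in `prime_weights` and its defaults (`beta`, `gamma`) are assumptions for illustration.

```python
def iou_hlr(ious, gt_ids):
    """IoU hierarchical local rank (IoU-HLR) for positive samples.

    ious[i]: IoU of sample i with its assigned ground truth gt_ids[i].
    Returns a rank per sample (0 = most important).
    """
    n = len(ious)
    local_rank = [0] * n
    groups = {}
    for i, g in enumerate(gt_ids):
        groups.setdefault(g, []).append(i)
    for members in groups.values():
        # Rank samples within each ground-truth group by descending IoU.
        members.sort(key=lambda i: -ious[i])
        for r, i in enumerate(members):
            local_rank[i] = r
    # All top-1 samples first (ordered by IoU), then all top-2 samples, and so on.
    order = sorted(range(n), key=lambda i: (local_rank[i], -ious[i]))
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    return rank


def prime_weights(ranks, beta=0.2, gamma=2.0):
    """Map ranks to loss weights; the exact formula and defaults are assumptions."""
    n = len(ranks)
    w = [((1.0 - beta) * (n - r) / n + beta) ** gamma for r in ranks]
    scale = n / sum(w)  # keep the total loss weight unchanged
    return [x * scale for x in w]
```

For example, with IoUs [0.9, 0.8, 0.7, 0.95] assigned to ground truths [0, 0, 1, 1], the two group leaders (0.95 and 0.9) outrank the 0.8 sample even though 0.8 exceeds the other group's runner-up.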