Deep learning-based edge detectors heavily rely on pixel-wise labels which are often provided by multiple annotators. Existing methods fuse multiple annotations using a simple voting process, ignoring the inherent ambiguity of edges and labeling bias of annotators. In this paper, we propose a novel uncertainty-aware edge detector (UAED), which employs uncertainty to investigate the subjectivity and ambiguity of diverse annotations. Specifically, we first convert the deterministic label space into a learnable Gaussian distribution, whose variance measures the degree of ambiguity among different annotations. Then we regard the learned variance as the estimated uncertainty of the predicted edge maps, and pixels with higher uncertainty are likely to be hard samples for edge detection. Therefore we design an adaptive weighting loss to emphasize the learning from those pixels with high uncertainty, which helps the network to gradually concentrate on the important pixels. UAED can be combined with various encoder-decoder backbones, and the extensive experiments demonstrate that UAED achieves superior performance consistently across multiple edge detection benchmarks. The source code is available at \url{//github.com/ZhouCX117/UAED}
The use of deep learning approaches for image reconstruction is of contemporary interest in radiology, especially for approaches that solve inverse problems associated with imaging. In deployment, these models may be exposed to input distributions that are widely shifted from training data, due in part to data biases or drifts. We propose a metric based on local Lipschitz determined from a single trained model that can be used to estimate the model uncertainty for image reconstructions. We demonstrate a monotonic relationship between the local Lipschitz value and Mean Absolute Error and show that this method can be used to provide a threshold that determines whether a given DL reconstruction approach was well suited to the task. Our uncertainty estimation method can be used to identify out-of-distribution test samples, relate information regarding epistemic uncertainties, and guide proper data augmentation. Quantifying uncertainty of learned reconstruction approaches is especially pertinent to the medical domain where reconstructed images must remain diagnostically accurate.
Depth perception is considered an invaluable source of information in the context of 3D mapping and various robotics applications. However, point cloud maps acquired using consumer-level light detection and ranging sensors (lidars) still suffer from bias related to local surface properties such as measuring beam-to-surface incidence angle, distance, texture, reflectance, or illumination conditions. This fact has recently motivated researchers to exploit traditional filters, as well as the deep learning paradigm, in order to suppress the aforementioned depth sensors error while preserving geometric and map consistency details. Despite the effort, depth correction of lidar measurements is still an open challenge mainly due to the lack of clean 3D data that could be used as ground truth. In this paper, we introduce two novel point cloud map consistency losses, which facilitate self-supervised learning on real data of lidar depth correction models. Specifically, the models exploit multiple point cloud measurements of the same scene from different view-points in order to learn to reduce the bias based on the constructed map consistency signal. Complementary to the removal of the bias from the measurements, we demonstrate that the depth correction models help to reduce localization drift. Additionally, we release a data set that contains point cloud data captured in an indoor corridor environment with precise localization and ground truth mapping information.
Wildfire propagation is a highly stochastic process where small changes in environmental conditions (such as wind speed and direction) can lead to large changes in observed behaviour. A traditional approach to quantify uncertainty in fire-front progression is to generate probability maps via ensembles of simulations. However, use of ensembles is typically computationally expensive, which can limit the scope of uncertainty analysis. To address this, we explore the use of a spatio-temporal neural-based modelling approach to directly estimate the likelihood of fire propagation given uncertainty in input parameters. The uncertainty is represented by deliberately perturbing the input weather forecast during model training. The computational load is concentrated in the model training process, which allows larger probability spaces to be explored during deployment. Empirical evaluations indicate that the proposed model achieves comparable fire boundaries to those produced by the traditional SPARK simulation platform, with an overall Jaccard index (similarity score) of 67.4% on a set of 35 simulated fires. When compared to a related neural model (emulator) which was employed to generate probability maps via ensembles of emulated fires, the proposed approach produces competitive Jaccard similarity scores while being approximately an order of magnitude faster.
This paper introduces a novel Bayesian approach to detect changes in the variance of a Gaussian sequence model, focusing on quantifying the uncertainty in the change point locations and providing a scalable algorithm for inference. Such a measure of uncertainty is necessary when change point methods are deployed in sensitive applications, for example, when one is interested in determining whether an organ is viable for transplant. The key of our proposal is framing the problem as a product of multiple single changes in the scale parameter. We fit the model through an iterative procedure similar to what is done for additive models. The novelty is that each iteration returns a probability distribution on time instances, which captures the uncertainty in the change point location. Leveraging a recent result in the literature, we can show that our proposal is a variational approximation of the exact model posterior distribution. We study the algorithm's convergence and the change point localization rate. Extensive experiments in simulation studies illustrate the performance of our method and the possibility of generalizing it to more complex data-generating mechanisms. We apply the new model to an experiment involving a novel technique to assess the viability of a liver and oceanographic data.
High complexity models are notorious in machine learning for overfitting, a phenomenon in which models well represent data but fail to generalize an underlying data generating process. A typical procedure for circumventing overfitting computes empirical risk on a holdout set and halts once (or flags that/when) it begins to increase. Such practice often helps in outputting a well-generalizing model, but justification for why it works is primarily heuristic. We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data. We then proceed to introduce and argue for a hypothesis test by means of which both model performance may be evaluated using training data, and overfitting quantitatively defined and detected. We rely on said concentration bounds which guarantee that empirical means should, with high probability, approximate their true mean to conclude that they should approximate each other. We stipulate conditions under which this test is valid, describe how the test may be used for identifying overfitting, articulate a further nuance according to which distributional shift may be flagged, and highlight an alternative notion of learning which usefully captures generalization in the absence of uniform PAC guarantees.
An effective and efficient architecture performance evaluation scheme is essential for the success of Neural Architecture Search (NAS). To save computational cost, most of existing NAS algorithms often train and evaluate intermediate neural architectures on a small proxy dataset with limited training epochs. But it is difficult to expect an accurate performance estimation of an architecture in such a coarse evaluation way. This paper advocates a new neural architecture evaluation scheme, which aims to determine which architecture would perform better instead of accurately predict the absolute architecture performance. Therefore, we propose a \textbf{relativistic} architecture performance predictor in NAS (ReNAS). We encode neural architectures into feature tensors, and further refining the representations with the predictor. The proposed relativistic performance predictor can be deployed in discrete searching methods to search for the desired architectures without additional evaluation. Experimental results on NAS-Bench-101 dataset suggests that, sampling 424 ($0.1\%$ of the entire search space) neural architectures and their corresponding validation performance is already enough for learning an accurate architecture performance predictor. The accuracies of our searched neural architectures on NAS-Bench-101 and NAS-Bench-201 datasets are higher than that of the state-of-the-art methods and show the priority of the proposed method.
A key requirement for the success of supervised deep learning is a large labeled dataset - a condition that is difficult to meet in medical image analysis. Self-supervised learning (SSL) can help in this regard by providing a strategy to pre-train a neural network with unlabeled data, followed by fine-tuning for a downstream task with limited annotations. Contrastive learning, a particular variant of SSL, is a powerful technique for learning image-level representations. In this work, we propose strategies for extending the contrastive learning framework for segmentation of volumetric medical images in the semi-supervised setting with limited annotations, by leveraging domain-specific and problem-specific cues. Specifically, we propose (1) novel contrasting strategies that leverage structural similarity across volumetric medical images (domain-specific cue) and (2) a local version of the contrastive loss to learn distinctive representations of local regions that are useful for per-pixel segmentation (problem-specific cue). We carry out an extensive evaluation on three Magnetic Resonance Imaging (MRI) datasets. In the limited annotation setting, the proposed method yields substantial improvements compared to other self-supervision and semi-supervised learning techniques. When combined with a simple data augmentation technique, the proposed method reaches within 8% of benchmark performance using only two labeled MRI volumes for training, corresponding to only 4% (for ACDC) of the training data used to train the benchmark.
User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement by using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, {\em predictive engagement}, for automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human annotators have high agreement on assessing utterance-level engagement scores; (2) conversation-level engagement scores can be predicted from properly aggregated utterance-level engagement scores. Furthermore, we show that the utterance-level engagement scores can be learned from data. These scores can improve automatic evaluation metrics for open-domain dialogue systems, as shown by correlation with human judgements. This suggests that predictive engagement can be used as a real-time feedback for training better dialogue models.
Deep neural network architectures have traditionally been designed and explored with human expertise in a long-lasting trial-and-error process. This process requires huge amount of time, expertise, and resources. To address this tedious problem, we propose a novel algorithm to optimally find hyperparameters of a deep network architecture automatically. We specifically focus on designing neural architectures for medical image segmentation task. Our proposed method is based on a policy gradient reinforcement learning for which the reward function is assigned a segmentation evaluation utility (i.e., dice index). We show the efficacy of the proposed method with its low computational cost in comparison with the state-of-the-art medical image segmentation networks. We also present a new architecture design, a densely connected encoder-decoder CNN, as a strong baseline architecture to apply the proposed hyperparameter search algorithm. We apply the proposed algorithm to each layer of the baseline architectures. As an application, we train the proposed system on cine cardiac MR images from Automated Cardiac Diagnosis Challenge (ACDC) MICCAI 2017. Starting from a baseline segmentation architecture, the resulting network architecture obtains the state-of-the-art results in accuracy without performing any trial-and-error based architecture design approaches or close supervision of the hyperparameters changes.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.